
Dissertations / Theses on the topic 'Natural language processing (Computer science) – Research'


Consult the top 50 dissertations / theses for your research on the topic 'Natural language processing (Computer science) – Research.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Ramachandran, Venkateshwaran. "A temporal analysis of natural language narrative text." Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.

2

Imam, Md Kaisar. "Improvements to the complex question answering models." Thesis, Lethbridge, Alta. : University of Lethbridge, c2011, 2011. http://hdl.handle.net/10133/3214.

Abstract:
In recent years the amount of information on the web has increased dramatically. As a result, it has become a challenge for researchers to find effective ways to query and extract meaning from these large repositories. Standard document search engines try to address the problem by presenting users a ranked list of relevant documents. In most cases this is not enough, as the end-user has to go through the entire document to find the answer he is looking for. Question answering, the retrieval of answers to natural language questions from a document collection, tries to remove this onus from the end-user by providing direct access to relevant information. This thesis is concerned with open-domain complex question answering. Unlike simple questions, complex questions cannot be answered easily, as they often require inferencing and synthesizing information from multiple documents. Hence, we treated the task of complex question answering as query-focused multi-document summarization. In this thesis, to improve complex question answering we experimented with both empirical and machine learning approaches. We extracted several features of different types (i.e. lexical, lexical semantic, syntactic and semantic) for each sentence in the document collection in order to measure its relevance to the user query. We formulated the task of complex question answering within a reinforcement framework, which to the best of our knowledge has not been applied to this task before and has the potential to improve itself by fine-tuning the feature weights from user feedback. We also used unsupervised machine learning techniques (random walk, manifold ranking) and augmented them with semantic and syntactic information. Finally, we experimented with question decomposition: instead of trying to answer the complex question directly, we decomposed it into a set of simple questions and synthesized their answers to get the final result.
x, 128 leaves : ill. ; 29 cm
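The sentence-ranking step described in this abstract, scoring each sentence with weighted lexical, syntactic and semantic features and tuning the weights from feedback, can be sketched minimally as follows (hypothetical feature values; a plain logistic update stands in for the thesis's reinforcement formulation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rank_sentences(features, weights):
    # One row of features per candidate sentence (e.g. lexical overlap
    # with the query, named-entity overlap, syntactic similarity, ...).
    return np.argsort(-(features @ weights))

def update_from_feedback(weights, features, feedback, lr=0.1):
    # feedback[i] = 1 if the user judged sentence i relevant, else 0.
    # A single logistic-regression-style gradient step on the weights.
    grad = features.T @ (feedback - sigmoid(features @ weights))
    return weights + lr * grad

# Toy run: 4 candidate sentences, 3 feature types, all values hypothetical.
feats = np.array([[0.9, 0.1, 0.3],
                  [0.2, 0.8, 0.5],
                  [0.1, 0.1, 0.1],
                  [0.7, 0.6, 0.9]])
w = np.zeros(3)
w = update_from_feedback(w, feats, np.array([1, 0, 0, 1]))
print(rank_sentences(feats, w))  # sentence indices, best first
```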
3

Shivade, Chaitanya P. "How sick are you? Methods for extracting textual evidence to expedite clinical trial screening." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1462810822.

4

Hale, Scott A. "Global connectivity, information diffusion, and the role of multilingual users in user-generated content platforms." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:3040a250-c526-4f10-aa9b-25117fd4dea2.

Abstract:
Internet content and Internet users are becoming more linguistically diverse as more people speaking different languages come online and produce content on user-generated content platforms. Several platforms have emerged as truly global platforms with users speaking many different languages and coming from around the world. It is now possible to study human behavior on these platforms using the digital trace data the platforms make available about the content people are authoring. Network literature suggests that people cluster together by language, but also that there is a small average path length between any two people on most Internet platforms (including two speakers of different languages). If so, multilingual users may play critical roles as bridges or brokers on these platforms by connecting clusters of monolingual users together across languages. The large differences in the content available in different languages online underscore the importance of such roles. This thesis studies the roles of multilingual users and platform design on two large, user-generated content platforms: Wikipedia and Twitter. It finds that language plays a strong role in structuring each platform, that multilingual users do act as linguistic bridges subject to certain limitations, that the size of a language correlates with the roles its speakers play in cross-language connections, and that there is a correlation between activity and multilingualism. In contrast to the general understanding in linguistics of high levels of multilingualism offline, this thesis finds relatively low levels of multilingualism on Twitter (11%) and Wikipedia (15%). The findings have implications for both platform design and social network theory. The findings suggest design strategies to increase multilingualism online through the identification and promotion of multilingual starter tasks, the discovery of related other-language information, and the promotion of user choice in linguistic filtering. While weak ties have received much attention in the social networks literature, cross-language ties are often not distinguished from same-language weak ties. This thesis finds that cross-language ties are similar to same-language weak ties in that both connect distant parts of the network, have limited bandwidth, and yet transfer a non-trivial amount of information when considered in aggregate. At the same time, cross-language ties are distinct from same-language weak ties for the purposes of information diffusion. In general, cross-language ties are fewer in number than same-language ties, but each cross-language tie may convey more diverse information given the large differences in the content available in different languages and the relative ease with which a multilingual speaker may access content in multiple languages compared to a monolingual speaker.
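The bridging role described here can be made concrete on a toy graph (Python with networkx; the edge list and language sets are hypothetical, not the thesis's data):

```python
import networkx as nx

# Hypothetical follow graph: each node carries the set of languages the user posts in.
G = nx.Graph()
users = {"a": {"en"}, "b": {"en"}, "c": {"en", "es"}, "d": {"es"}, "e": {"es"}}
for u, langs in users.items():
    G.add_node(u, langs=langs)
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])

def is_cross_language(u, v):
    # A tie is cross-language if the endpoints share no language.
    return not (G.nodes[u]["langs"] & G.nodes[v]["langs"])

multilinguals = [u for u, d in G.nodes(data=True) if len(d["langs"]) > 1]
cross = [e for e in G.edges if is_cross_language(*e)]
print(multilinguals)  # ['c']
print(cross)          # [] -- the multilingual 'c' shares a language with both sides
# Betweenness centrality shows 'c' bridging the en and es clusters.
print(nx.betweenness_centrality(G)["c"])
```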
5

Zhu, Haoyu. "The state of network research." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289170.

Abstract:
Over the past decades, networking research has changed greatly. Becoming familiar with the development of networking research is the first step for most scholars starting their work, and the targeted areas, useful documents, and active institutions are helpful when setting up new research. This project focused on developing an assistant tool, based on publicly accessible papers and information on the Internet, that allows researchers to view the most cited papers in networking conferences and journals. NLP tools are applied to the crawled full text in order to classify the papers and extract keywords. Papers are located based on their authors to show the most active countries around the world working in this area. References are analyzed to view the most cited topics and detailed paper information. We draw some interesting conclusions from our system, showing that some topics have attracted more attention over the past decades.
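The system's two core steps, counting citations and extracting keywords from crawled text, can be sketched with the standard library alone (hypothetical records, not the project's actual pipeline):

```python
from collections import Counter
import re

# Hypothetical crawled records: title, reference list, and full text.
papers = [
    {"title": "P1", "refs": ["P3"], "text": "congestion control in datacenter networks"},
    {"title": "P2", "refs": ["P3", "P1"], "text": "software defined networking controllers"},
    {"title": "P3", "refs": [], "text": "routing protocols for wireless networks"},
]

# Most-cited papers: count how often each title appears in reference lists.
citations = Counter(ref for p in papers for ref in p["refs"])
print(citations.most_common(2))  # [('P3', 2), ('P1', 1)]

# Crude keyword extraction: frequency of non-stopword terms across full text.
stop = {"in", "for", "the", "of", "and"}
terms = Counter(
    w for p in papers for w in re.findall(r"[a-z]+", p["text"].lower()) if w not in stop
)
print(terms.most_common(3))
```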
6

Wijeratne, Sanjaya. "A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation of Emoji using EmojiNet." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1547506375922938.

7

Naphtal, Rachael (Rachael M.). "Natural language processing based nutritional application." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100640.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 67-68).
The ability to accurately and efficiently track nutritional intake is a powerful tool in combating obesity and other food-related diseases. Currently, many methods used for this task are time consuming or easily abandoned; however, a natural language based application that converts spoken text to nutritional information could be a convenient and effective solution. This thesis describes the creation of an application that translates spoken food diaries into nutritional database entries. It explores different methods for solving the problem of converting brands, descriptions and food item names into entries in nutritional databases. Specifically, we constructed a cache of over 4,000 food items, and also created a variety of methods to allow refinement of database mappings. We also explored methods of dealing with ambiguous quantity descriptions and the mapping of spoken quantity values to numerical units. When assessed by 500 users entering their daily meals on Amazon Mechanical Turk, the system was able to map 83.8% of the correctly interpreted spoken food items to relevant nutritional database entries. It was also able to find a logical quantity for 92.2% of the correct food entries. Overall, this system shows a significant step towards the intelligent conversion of spoken food diaries to actual nutritional feedback.
by Rachael Naphtal. M. Eng.
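The central mapping step, from a transcribed food mention to a cached database entry, can be approximated with simple fuzzy string matching (a sketch with a hypothetical slice of the food cache; the thesis's refinement methods are more involved):

```python
import difflib

# Hypothetical slice of the cached nutrition table: name -> calories per serving.
food_cache = {
    "greek yogurt, plain": 100,
    "yogurt, vanilla": 150,
    "granola": 210,
    "banana, raw": 105,
}

def map_food_item(spoken, cutoff=0.4):
    """Map a transcribed food mention to the closest cached database entry."""
    match = difflib.get_close_matches(spoken.lower(), food_cache, n=1, cutoff=cutoff)
    return (match[0], food_cache[match[0]]) if match else None

print(map_food_item("plain greek yogurt"))  # ('greek yogurt, plain', 100)
print(map_food_item("a banana"))            # ('banana, raw', 105)
```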
8

張少能 and Siu-nang Bruce Cheung. "A concise framework of natural language processing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1989. http://hub.hku.hk/bib/B31208563.

9

Cheung, Siu-nang Bruce. "A concise framework of natural language processing /." [Hong Kong : University of Hong Kong], 1989. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12432544.

10

Lei, Tao Ph D. Massachusetts Institute of Technology. "Interpretable neural models for natural language processing." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108990.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 109-119).
The success of neural network models often comes at a cost of interpretability. This thesis addresses the problem by providing justifications behind the model's structure and predictions. In the first part of this thesis, we present a class of sequence operations for text processing. The proposed component generalizes from convolution operations and gated aggregations. As justifications, we relate this component to string kernels, i.e. functions measuring the similarity between sequences, and demonstrate how it encodes the efficient kernel computing algorithm into its structure. The proposed model achieves state-of-the-art or competitive results compared to alternative architectures (such as LSTMs and CNNs) across several NLP applications. In the second part, we learn rationales behind the model's prediction by extracting input pieces as supporting evidence. Rationales are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, generator and encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by the desiderata for rationales. We demonstrate the effectiveness of this learning framework in applications such as multi-aspect sentiment analysis. Our method achieves a performance over 90% evaluated against manually annotated rationales.
by Tao Lei. Ph. D.
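The string kernels the first part relates to are easy to state concretely; below is a textbook character n-gram kernel (not the thesis's component, just the classic object it generalizes from):

```python
from collections import Counter

def ngram_kernel(s, t, n=3):
    """Textbook string kernel: inner product of character n-gram counts."""
    cs = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    ct = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return sum(cs[g] * ct[g] for g in cs.keys() & ct.keys())

print(ngram_kernel("natural language", "language model"))  # 6 shared trigrams
print(ngram_kernel("natural language", "convolution"))     # 0: no shared trigrams
```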
11

Grinman, Alex J. "Natural language processing on encrypted patient data." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/113438.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 85-86).
While many industries can benefit from machine learning techniques for data analysis, they often do not have the technical expertise nor the computational power to do so. Therefore, many organizations would benefit from outsourcing their data analysis. Yet stringent data privacy policies prevent outsourcing sensitive data and may stop the delegation of data analysis in its tracks. In this thesis, we put forth a two-party system where one party capable of powerful computation can run certain machine learning algorithms from the natural language processing domain on the second party's data, where the first party is limited to learning only specific functions of the second party's data and nothing else. Our system provides simple cryptographic schemes for locating keywords, matching approximate regular expressions, and computing frequency analysis on encrypted data. We present a full implementation of this system in the form of an extensible software library and a command line interface. Finally, we discuss a medical case study where we used our system to run a suite of unmodified machine learning algorithms on encrypted free-text patient notes.
by Alex J. Grinman. M. Eng.
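The abstract does not spell out the schemes; as a generic illustration of keyword search over encrypted records, here is the classic searchable-index toy in which the server holds only HMAC tokens (standard library only; this is not the thesis's construction):

```python
import hmac, hashlib, secrets

key = secrets.token_bytes(32)  # held by the data owner, never by the server

def token(word: bytes) -> bytes:
    # Deterministic keyword token: the server can match it but not invert it.
    return hmac.new(key, word, hashlib.sha256).digest()

# The owner builds an index from keyword tokens to record ids; the record
# bodies themselves would be encrypted separately (omitted here).
notes = {1: b"patient reports chest pain", 2: b"normal gait and posture"}
index = {}
for rid, text in notes.items():
    for w in text.split():
        index.setdefault(token(w), set()).add(rid)

# To search, the owner sends token(b"pain"); the server matches it blindly.
print(sorted(index.get(token(b"pain"), set())))  # [1]
```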
12

Bigert, Johnny. "Automatic and unsupervised methods in natural language processing." Doctoral thesis, Stockholm, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-156.

13

Xiao, Min. "Generalized Domain Adaptation for Sequence Labeling in Natural Language Processing." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/391382.

Abstract:
Computer and Information Science. Ph.D.
Sequence labeling tasks, such as part-of-speech tagging, syntactic chunking, and dependency parsing, have been widely studied in the natural language processing area. Most of those systems are developed on a large amount of labeled training data via supervised learning. However, manually collecting labeled training data is too time-consuming and expensive. As an alternative, to alleviate the issue of label scarcity, domain adaptation has recently been proposed to train a statistical machine learning model in a target domain, where there is not enough labeled training data, by exploiting existing free labeled training data in a different but related source domain. The natural language processing community has witnessed the success of domain adaptation in a variety of sequence labeling tasks. Though the labeled training data in the source domain are available and free, they are not the same as, and can be very different from, the test data in the target domain. Thus, simply applying naive supervised machine learning algorithms without considering domain differences may not fulfill the purpose. In this dissertation, we developed several novel representation learning approaches to address domain adaptation for sequence labeling in natural language processing. Those representation learning techniques aim to induce latent generalizable features to bridge domain divergence and enable cross-domain prediction. We first tackle a semi-supervised domain adaptation scenario, where the target domain has a small amount of labeled training data, and propose a distributed representation learning approach based on a probabilistic neural language model. We then relax the assumption of the availability of labeled training data in the target domain and study an unsupervised domain adaptation scenario, where the target domain has only unlabeled training data, and give a task-informative representation learning approach based on dynamic dependency networks. Both works are developed in the setting where different domains contain sentences in different genres. We then extend and generalize domain adaptation into a more challenging scenario, where different domains contain sentences in different languages, and propose two cross-lingual representation learning approaches: one based on deep neural networks with auxiliary bilingual word pairs, the other based on annotation projection with auxiliary parallel sentences. All four learning scenarios are extensively evaluated with different sequence labeling tasks. The empirical results demonstrate the effectiveness of those generalized domain adaptation techniques for sequence labeling in natural language processing.
Temple University--Theses
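As a point of reference for what a simple domain-adaptation baseline looks like, here is the classic feature-augmentation trick of Daume (2007), which duplicates each feature into shared and domain-specific copies; the dissertation's representation learners are considerably richer:

```python
def augment(features, domain):
    """'Frustratingly easy' domain adaptation (Daume, 2007): copy every
    feature into a shared version and a domain-specific version, so a linear
    model can learn which feature behaviours transfer across domains."""
    out = {}
    for name, value in features.items():
        out[f"shared::{name}"] = value
        out[f"{domain}::{name}"] = value
    return out

# Token features from a source-domain (news) and a target-domain (web) token.
print(augment({"word=bank": 1.0, "suffix=nk": 1.0}, "news"))
print(augment({"word=lol": 1.0}, "web"))
```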
14

Walker, Alden. "Natural language interaction with robots." Diss., Connect to the thesis, 2007. http://hdl.handle.net/10066/1275.

15

Cosh, Kenneth John. "Supporting organisational semiotics with natural language processing techniques." Thesis, Lancaster University, 2003. http://eprints.lancs.ac.uk/12351/.

16

Cline, Ben E. "Knowledge intensive natural language generation with revision." Diss., This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-09092008-063657/.

17

Chen, Michelle W. M. Eng Massachusetts Institute of Technology. "Comparison of natural language processing algorithms for medical texts." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100298.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Title as it appears in MIT Commencement Exercises program, June 5, 2015: Comparison of NLP systems for medical text. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 57-58).
With the large corpora of clinical texts, natural language processing (NLP) is growing to be a field that people are exploring to extract useful patient information. NLP applications in clinical medicine are especially important in domains where the clinical observations are crucial to define and diagnose the disease. There are a variety of different systems that attempt to match words and word phrases to medical terminologies. Because of the differences in annotation datasets and lack of common conventions, many of the systems yield conflicting results. The purpose of this thesis project is (1) to create a visual representation of how different concepts compare to each other when using various annotators and (2) to improve upon the NLP methods to yield terms with better fidelity to what the clinicians are trying to express.
by Michelle W. Chen. M. Eng.
18

Chien, Isabel. "Natural language processing for precision clinical diagnostics and treatment." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119754.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 61-65).
In this thesis, I focus upon the application of natural language processing to clinical diagnostics and treatment within the palliative care and serious illness field. I explore a variety of natural language processing methods, including deep learning, rule-based, and classic machine learning approaches, applied to the identification of documentation reflecting advance care planning measures, serious illnesses, and serious illness symptoms. I introduce two tools that can be used to analyze clinical notes from electronic health records: ClinicalRegex, a regular expression interface, and PyCCI, a clinical text annotation tool. Additionally, I discuss a palliative care-focused research project in which I apply machine learning natural language processing methods to identifying clinical documentation in the palliative care and serious illness field. Advance care planning, which includes clarifying and documenting goals of care and preferences for future care, is essential for achieving end-of-life care that is consistent with the preferences of dying patients and their families. Physicians document their communication about these preferences as unstructured free text in clinical notes; as a result, routine assessment of this quality indicator is time consuming and costly. Integrating goals-of-care conversations and advance care planning into decision-making about palliative surgery has been shown to result in less invasive care near the time of death and to improve clinical outcomes for both the patient and surviving family members. Natural language processing methods offer an efficient and scalable way to improve the visibility of documented serious illness conversations within electronic health record data, helping to improve quality of care.
by Isabel Chien. M. Eng.
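A ClinicalRegex-style pass over notes reduces to compiled regular expressions; here is a sketch with hypothetical advance-care-planning patterns (the tool's real pattern lists are not reproduced here):

```python
import re

# Hypothetical patterns for advance-care-planning language.
ACP_PATTERNS = [
    r"\bgoals of care\b",
    r"\bcode status\b",
    r"\b(DNR|do not resuscitate)\b",
    r"\bhealth ?care proxy\b",
]
acp_re = re.compile("|".join(ACP_PATTERNS), re.IGNORECASE)

def flag_note(note: str) -> list[str]:
    """Return the advance-care-planning phrases found in a clinical note."""
    return [m.group(0) for m in acp_re.finditer(note)]

note = "Discussed goals of care with family; code status confirmed DNR."
print(flag_note(note))  # ['goals of care', 'code status', 'DNR']
```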
19

Shepherd, David. "Natural language program analysis combining natural language processing with program analysis to improve software maintenance tools /." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 176 p, 2007. http://proquest.umi.com/pqdweb?did=1397920371&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.

20

Indovina, Donna Blodgett. "A natural language interface to MS-DOS /." Online version of thesis, 1989. http://hdl.handle.net/1850/10548.

21

Shah, Aalok Bipin 1977. "Interactive design and natural language processing in the WISE Project." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80118.

Abstract:
Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 55-57).
by Aalok Bipin Shah. S.B. and M.Eng.
22

Bajwa, Imran Sarwar. "A natural language processing approach to generate SBVR and OCL." Thesis, University of Birmingham, 2014. http://etheses.bham.ac.uk//id/eprint/4890/.

Abstract:
The Object Constraint Language (OCL) is a declarative language used to make Unified Modeling Language (UML) models well-defined by specifying a set of constraints. However, the syntactic complexity of OCL makes writing OCL code difficult. A natural language based interface can make the process of writing OCL expressions easy and simple. The translation of natural language (NL) text to OCL code is nevertheless a challenging task: the informal nature of natural languages, with their various syntactic and semantic ambiguities, makes translation to formal languages complex. In our approach, the use of SBVR (Semantics of Business Vocabulary and Business Rules) not only gives natural language a formal abstract syntax representation but is also close to OCL syntax. In this thesis, a framework is presented to let users of UML tools write invariants and pre/post conditions in English. The results of the case studies show that a natural language based approach to generating OCL constraints can not only significantly improve the usability of OCL but also outperforms the most closely related techniques in terms of effectiveness and the effort required to generate OCL.
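The overall NL-to-OCL goal can be illustrated with a toy pattern-to-constraint mapping (the SBVR intermediate representation of the thesis is skipped, and the pattern is hypothetical):

```python
import re

def english_to_ocl(sentence):
    # Toy mapping from one controlled-English constraint shape to an OCL invariant.
    m = re.match(r"the (\w+) of an? (\w+) must be greater than (\d+)", sentence.lower())
    if m:
        attr, cls, value = m.groups()
        return f"context {cls.capitalize()} inv: self.{attr} > {value}"
    raise ValueError("pattern not recognized")

print(english_to_ocl("The age of a Customer must be greater than 17"))
# context Customer inv: self.age > 17
```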
23

Li, Wenhui. "Sentiment analysis: Quantitative evaluation of subjective opinions using natural language processing." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/28000.

Abstract:
Sentiment analysis consists of recognizing sentiment orientation towards specific subjects within natural language texts. Most research in this area focuses on classifying documents as positive or negative. The purpose of this thesis is to quantitatively evaluate subjective opinions of customer reviews using a five-star rating system, which is widely used on online review web sites, and to make the predicted score as accurate as possible. Firstly, this thesis presents two methods for rating reviews: classifying reviews by supervised learning methods, as in multi-class classification, or rating reviews using association scores of sentiment terms with a set of seed words extracted from the corpus, i.e. the unsupervised learning method. We extend the feature selection approach used in Turney's PMI-IR estimation by introducing semantic relatedness measures based upon the content of WordNet. This thesis reports on experiments using the two methods mentioned above for rating reviews using the combined feature set enriched with WordNet-selected sentiment terms. The results of these experiments suggest ways in which incorporating WordNet relatedness measures into feature selection may yield improvements over classification and unsupervised learning methods which do not use them. Furthermore, via ordinal meta-classifiers, we utilize the ordering information contained in the scores of bank reviews to improve performance, we explore the effectiveness of re-sampling for reducing the problem of skewed data, and we check whether discretization benefits the ordinal meta-learning process. Finally, we combine the unsupervised and supervised meta-learning methods to optimize performance on our sentiment prediction task.
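A hedged sketch of the seed-word idea follows: rate a review by how close its terms sit to positive versus negative seeds under a WordNet relatedness measure (here nltk's path similarity, standing in for the thesis's PMI-IR and WordNet measures):

```python
# Assumes nltk is installed and the WordNet corpus has been downloaded,
# e.g. via nltk.download("wordnet").
from nltk.corpus import wordnet as wn

POS_SEEDS, NEG_SEEDS = ["good", "excellent"], ["bad", "poor"]

def relatedness(w1, w2):
    # Best path similarity over all synset pairs; 0.0 if none comparable.
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def rate(review_terms):
    # Positive minus negative seed association, averaged over terms.
    per_term = [
        max(relatedness(t, s) for s in POS_SEEDS)
        - max(relatedness(t, s) for s in NEG_SEEDS)
        for t in review_terms
    ]
    return sum(per_term) / len(per_term)

print(rate(["superb", "friendly"]))  # expected > 0 for a positive review
print(rate(["terrible", "rude"]))    # expected < 0 for a negative one
```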
24

Pham, Son Bao (Computer Science & Engineering, Faculty of Engineering, UNSW). "Incremental knowledge acquisition for natural language processing." Awarded by: University of New South Wales, School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/26299.

Abstract:
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather some statistics based on a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem that occurs with manual approaches is that the combination of multiple patterns, possibly being used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty in choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort in creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE, therefore, makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
25

Jarmasz, Mario. ""Roget's Thesaurus" as a lexical resource for natural language processing." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26493.

Abstract:
This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material, the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's. We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined.
26

O'Sullivan, John J. D. "Teach2Learn : gamifying education to gather training data for natural language processing." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/117320.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 65-66).
Teach2Learn is a website which crowd-sources the problem of labeling natural text samples, using gamified education as an incentive. Students assign labels to text samples from an unlabeled data set, thereby teaching supervised machine learning algorithms how to interpret new samples. In return, students can learn how that algorithm works by unlocking lessons written by researchers. This aligns the incentives of researchers and learners, helping both achieve their goals. The application used current best practices in gamification to create a motivating structure around the labeling task. Testing showed that 27.7% of the user base (5/18 users) engaged with the content and labeled enough samples to unlock all of the lessons, suggesting that learning modules are sufficient motivation for the right users. Attempts to grow the platform through paid social media advertising were unsuccessful, likely because users aren't looking for a class when they browse those sites. Unpaid posts on subreddits discussing related topics, where users were more likely to be searching for learning opportunities, were more successful. Future research should seek users through comparable sites and explore how Teach2Learn can be used as an additional learning resource in classrooms.
by John J.D. O'Sullivan. M. Eng.
27

Forsyth, Alexander William. "Improving clinical decision making with natural language processing and machine learning." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/112847.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 49-53).
This thesis focused on two tasks of applying natural language processing (NLP) and machine learning to electronic health records (EHRs) to improve clinical decision making. The first task was to predict cardiac resynchronization therapy (CRT) outcomes with better precision than the current physician guidelines for recommending the procedure. We combined NLP features from free-text physician notes with structured data to train a supervised classifier to predict CRT outcomes. While our results gave a slight improvement over the current baseline, we were not able to predict CRT outcome with both high precision and high recall. These results limit the clinical applicability of our model, and reinforce previous work, which also could not find accurate predictors of CRT response. The second task in this thesis was to extract breast cancer patient symptoms during chemotherapy from free-text physician notes. We manually annotated about 10,000 sentences, and trained a conditional random field (CRF) model to predict whether a word indicated a symptom (positive label), specifically indicated the absence of a symptom (negative label), or was neutral. Our final model achieved 0.66, 1.00, and 0.77 F1 scores for predicting positive, neutral, and negative labels respectively. While the F1 scores for positive and negative labels are not extremely high, with the current performance our model could be applied, for example, to gather better statistics about what symptoms breast cancer patients experience during chemotherapy and at what time points during treatment they experience them.
by Alexander William Forsyth. M. Eng.
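Token-level CRF labeling of this kind is usually set up with per-word feature dictionaries; here is a minimal sketch (hypothetical features and data, using the third-party sklearn-crfsuite package, which may differ from the toolkit used in the thesis):

```python
# pip install sklearn-crfsuite
import sklearn_crfsuite

def word2features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_negation": w.lower() in {"no", "denies", "without"},
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
    }

# Tiny hypothetical training set: one label per word.
sents = [["Patient", "reports", "nausea"], ["Denies", "vomiting", "today"]]
labels = [["neutral", "neutral", "positive"], ["neutral", "negative", "neutral"]]
X = [[word2features(s, i) for i in range(len(s))] for s in sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict([[word2features(["no", "fever"], i) for i in range(2)]]))
```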
28

Manek, Meenakshi. "Natural language interface to a VHDL modeling tool." Thesis, This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-06232009-063212/.

29

Keller, Thomas Anderson. "Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing." Thesis, University of California, San Diego, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599339.

Abstract:
Most machine learning algorithms require a fixed length input to be able to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed length representations. Although today there is a general consensus on how to generate fixed length representations of individual words which preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed length representations of natural language sequences, and analyze their effectiveness across a variety of high and low level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.
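The simplest encoder in such comparisons is worth seeing in full: a fixed-length representation by mean-pooling word vectors (random toy embeddings; the Skip-Thought and memory-network encoders studied in the thesis are far richer):

```python
import numpy as np

def bag_of_vectors(tokens, emb, dim=4):
    """Fixed-length sequence representation by mean-pooling word vectors,
    a common baseline among sequence encoders."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=4) for w in ["the", "cat", "sat"]}
print(bag_of_vectors(["the", "cat", "sat"], emb))  # one 4-d vector per sentence
```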
30

Cohn, Trevor A. "Scaling conditional random fields for natural language processing /." Connect to thesis, 2007. http://eprints.unimelb.edu.au/archive/00002874.

31

Schäfer, Ulrich. "Integrating deep and shallow natural language processing components : representations and hybrid architectures /." Saarbrücken : German Research Center for Artificial Intelligence : Saarland University, Dept. of Computational Linguistics and Phonetics, 2007. http://www.loc.gov/catdir/toc/fy1001/2008384333.html.

32

Berman, Lucy. "Lewisian Properties and Natural Language Processing: Computational Linguistics from a Philosophical Perspective." Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/cmc_theses/2200.

Abstract:
Nothing seems more obvious than that our words have meaning. When people speak to each other, they exchange information through the use of a particular set of words. The words they say to each other, moreover, are about something. Yet this relation of “aboutness,” known as “reference,” is not quite as simple as it appears. In this thesis I will present two opposing arguments about the nature of our words and how they relate to the things around us. First, I will present Hilary Putnam’s argument, in which he examines the indeterminacy of reference, forcing us to conclude that we must abandon metaphysical realism. While Putnam considers his argument to be a refutation of non-epistemicism, David Lewis takes it to be a reductio, claiming Putnam’s conclusion is incredible. I will present Lewis’s response to Putnam, in which he accepts the challenge of demonstrating how Putnam’s argument fails and rescuing us from the abandonment of realism. In order to explain the determinacy of reference, Lewis introduces the concept of “natural properties.” In the final chapter of this thesis, I will propose another use for Lewisian properties. Namely, that of helping to minimize the gap between natural language processing and human communication.
33

Linckels, Serge, and Christoph Meinel. "An e-librarian service : natural language interface for an efficient semantic search within multimedia resources." Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2009/3308/.

Abstract:
1 Introduction: 1.1 Project formulation; 1.2 Our contribution.
2 Pedagogical Aspect: 2.1 Modern teaching; 2.2 Our Contribution: 2.2.1 Autonomous and exploratory learning; 2.2.2 Human machine interaction; 2.2.3 Short multimedia clips.
3 Ontology Aspect: 3.1 Ontology driven expert systems; 3.2 Our contribution: 3.2.1 Ontology language; 3.2.2 Concept Taxonomy; 3.2.3 Knowledge base annotation; 3.2.4 Description Logics.
4 Natural language approach: 4.1 Natural language processing in computer science; 4.2 Our contribution: 4.2.1 Explored strategies; 4.2.2 Word equivalence; 4.2.3 Semantic interpretation; 4.2.4 Various problems.
5 Information Retrieval Aspect: 5.1 Modern information retrieval; 5.2 Our contribution: 5.2.1 Semantic query generation; 5.2.2 Semantic relatedness.
6 Implementation: 6.1 Prototypes; 6.2 Semantic layer architecture; 6.3 Development.
7 Experiments: 7.1 Description of the experiments; 7.2 General characteristics of the three sessions, instructions and procedure; 7.3 First Session; 7.4 Second Session; 7.5 Third Session; 7.6 Discussion and conclusion.
8 Conclusion and future work: 8.1 Conclusion; 8.2 Open questions.
Appendices: A Description Logics; B Probabilistic context-free grammars.
34

Huber, Bernard J. Jr. "A knowledge-based approach to understanding natural language. /." Online version of thesis, 1991. http://hdl.handle.net/1850/11053.

35

Thompson, Cynthia Ann. "Semantic lexicon acquisition for learning natural language interfaces /." 1998. Digital version accessible at http://wwwlib.umi.com/cr/utexas/main.

36

Custy, E. John. "An architecture for the semantic processing of natural language input to a policy workbench." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FCusty.pdf.

Abstract:
Thesis (M.S. in Software Engineering)--Naval Postgraduate School, March 2003. Thesis advisor(s): James Bret Michael, Neil C. Rowe. Includes bibliographical references (p. 91-92). Also available online.
37

Dulle, John David. "A caption-based natural-language interface handling descriptive captions for a multimedia database system." Thesis, Monterey, California : Naval Postgraduate School, 1990. http://handle.dtic.mil/100.2/ADA236533.

Abstract:
Thesis (M.S. in Computer Science)--Naval Postgraduate School, June 1990. Thesis Advisor(s): Lum, Vincent Y.; Rowe, Neil C. "June 1990." Description based on signature page. DTIC Identifiers: Interfaces, natural language, databases, theses. Author(s) subject terms: Natural language processing, multimedia database system, natural language interface, descriptive captions. Includes bibliographical references (p. 27).
38

Watanabe, Kiyoshi. "Visible language : repetition and its artistic presentation with the computers." Thesis, Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/17664.

39

Califf, Mary Elaine. "Relational learning techniques for natural language information extraction /." 1998. Digital version accessible at http://wwwlib.umi.com/cr/utexas/main.

40

Chandra, Yohan. "Natural Language Interfaces to Databases." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5474/.

Abstract:
Natural language interfaces to databases (NLIDB) are systems that aim to bridge the gap between the languages used by humans and computers, and automatically translate natural language sentences to database queries. This thesis proposes a novel approach to NLIDB, using graph-based models. The system starts by collecting as much information as possible from existing databases and sentences, and transforms this information into a knowledge base for the system. Given a new question, the system will use this knowledge to analyze and translate the sentence into its corresponding database query statement. The graph-based NLIDB system uses English as the natural language, a relational database model, and SQL as the formal query language. In experiments performed with natural language questions run against a large database containing information about U.S. geography, the system showed good performance compared to the state-of-the-art in the field.
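The end-to-end goal, translating an English question into a database query, can be illustrated with a toy pattern-based mapper over the kind of geography table mentioned above (sqlite3; a graph-based NLIDB derives such mappings from the schema rather than hard-coding them):

```python
import re, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO states VALUES (?, ?)",
                 [("texas", 29145505), ("vermont", 643077)])

def to_sql(question):
    # Toy pattern-to-SQL mapping for one question shape.
    m = re.match(r"what is the population of (\w+)\??", question.lower())
    if m:
        return "SELECT population FROM states WHERE name = ?", (m.group(1),)
    raise ValueError("unsupported question")

sql, params = to_sql("What is the population of Texas?")
print(conn.execute(sql, params).fetchone()[0])  # 29145505
```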
41

Dua, Smrite. "Introducing Semantic Role Labels and Enhancing Dependency Parsing to Compute Politeness in Natural Language." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1430876809.

42

Das, Dipanjan. "Semi-Supervised and Latent-Variable Models of Natural Language Semantics." Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/342.

Abstract:
This thesis focuses on robust analysis of natural language semantics. A primary bottleneck for semantic processing of text lies in the scarcity of high-quality and large amounts of annotated data that provide complete information about the semantic structure of natural language expressions. In this dissertation, we study statistical models tailored to solve problems in computational semantics, with a focus on modeling structure that is not visible in annotated text data. We first investigate supervised methods for modeling two kinds of semantic phenomena in language. First, we focus on the problem of paraphrase identification, which attempts to recognize whether two sentences convey the same meaning. Second, we concentrate on shallow semantic parsing, adopting the theory of frame semantics (Fillmore, 1982). Frame semantics offers deep linguistic analysis that exploits the use of lexical semantic properties and relationships among semantic frames and roles. Unfortunately, the datasets used to train our paraphrase and frame-semantic parsing models are too small to lead to robust performance. Therefore, a common trait in our methods is the hypothesis of hidden structure in the data. To this end, we employ conditional log-linear models over structures, which are firstly capable of incorporating a wide variety of features gathered from the data as well as various lexica, and secondly use latent variables to model missing information in annotated data. Our approaches towards solving these two problems achieve state-of-the-art accuracy on standard corpora. For the frame-semantic parsing problem, we present fast inference techniques for jointly modeling the semantic roles of a given predicate. We experiment with linear program formulations, and use a commercial solver as well as an exact dual decomposition technique that breaks the role labeling problem into several overlapping components. Continuing with the theme of hypothesizing hidden structure in data for modeling natural language semantics, we present methods to leverage large volumes of unlabeled data to improve upon the shallow semantic parsing task. We work within the framework of graph-based semi-supervised learning, a powerful method that associates similar natural language types and helps propagate supervised annotations to unlabeled data. We use this framework to improve frame-semantic parsing performance on unknown predicates that are absent in annotated data. We also present a family of novel objective functions for graph-based learning that result in sparse probability measures over graph vertices, a desirable property for natural language types. Not only are these objectives easier to numerically optimize, but they also result in smoothed distributions over predicates that are smaller in size. The experiments presented in this dissertation empirically demonstrate that missing information in text corpora contains considerable semantic information that can be incorporated into structured models for semantics, to significant benefit over the current state of the art. The methods in this thesis were originally presented by Das and Smith (2009, 2011, 2012), and Das et al. (2010, 2012). The thesis gives a more thorough exposition, relating and comparing the methods, and also presents several extensions of the aforementioned papers.
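The graph-based semi-supervised step can be illustrated with the classic label-propagation recipe (numpy, hypothetical similarity graph; the dissertation's sparse objectives are more sophisticated):

```python
import numpy as np

# Hypothetical similarity graph over 5 "types" (e.g. predicates); entries
# are symmetric similarity weights.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Label distributions: vertices 0 and 4 are labeled with classes 0 and 1.
Y = np.zeros((5, 2))
Y[0, 0] = 1.0
Y[4, 1] = 1.0
labeled = np.array([True, False, False, False, True])

F = Y.copy()
D_inv = 1.0 / W.sum(axis=1)
for _ in range(50):
    F = D_inv[:, None] * (W @ F)   # average the neighbours' distributions
    F[labeled] = Y[labeled]        # clamp the labeled vertices
print(F.round(2))                  # soft labels for the unlabeled vertices
```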
43

Ramos, Brás Juan Ariel. "Natural language processing and translation using augmented transition networks and semantic networks." Diss., Connect to the thesis, 2003. http://hdl.handle.net/10066/1480.

44

Achananuparp, Palakorn Hu Xiaohua. "Similarity measures and diversity rankings for query-focused sentence extraction /." Philadelphia, Pa. : Drexel University, 2010. http://hdl.handle.net/1860/3245.

45

Mahamood, Saad Ali. "Generating affective natural language for parents of neonatal infants." Thesis, University of Aberdeen, 2010. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=158569.

Abstract:
The thesis presented here describes original research in the field of Natural Language Generation (NLG). NLG is the subfield of artificial intelligence concerned with the automatic production of documents from underlying data. This thesis in particular focuses on developing new and novel methods for generating text that takes the recipient's level of stress into consideration as a factor for adapting the resultant textual output. This consideration was particularly salient given the domain in which this research was conducted: providing information for parents of pre-term infants during neonatal intensive care (NICU), a highly technical and stressful environment for parents, where emotional sensitivity must be shown in how information is presented. We investigated the emotional and informational needs of these parents through an extensive review of the past literature and two separate research studies with former and current NICU parents. The NLG system built for this research is called BabyTalk Family (BT-Family), a system that can produce for parents a textual summary of the medical events that have occurred for a baby in NICU in the last twenty-four hours. The novelty of this system is that it is capable of estimating the level of stress of the recipient and, by using several affective NLG strategies, tailoring its output for a stressed audience, unlike traditional NLG systems, where the output remains unchanged regardless of the emotional state of the recipient. The key innovation in this system is the integration of several affective strategies in the Document Planner for tailoring textual output for stressed recipients. BT-Family's output was evaluated with thirteen parents who had previously had a baby in neonatal care. We developed a methodology for an evaluation that involved a direct comparison between stressed and unstressed text for the same given medical scenario for variables such as preference, understandability, helpfulness, and emotional appropriateness. The results obtained showed the parents overwhelmingly preferred the stressed text for all of the variables measured.
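The core idea, realizing the same event differently depending on estimated recipient stress, can be sketched as a tiny choice rule (all phrasings hypothetical; not BT-Family's actual strategies):

```python
# Hypothetical stress-adapted realization: the same event gets a different
# surface form depending on the estimated stress level of the recipient.
MESSAGES = {
    "oxygen_increase": {
        "neutral": "FiO2 was increased to 45% overnight following two desaturation episodes.",
        "stressed": ("Your baby needed a little more oxygen overnight. "
                     "This is common in premature babies, and the team is watching her closely."),
    },
}

def realize(event, stress_level):
    variant = "stressed" if stress_level > 0.5 else "neutral"
    return MESSAGES[event][variant]

print(realize("oxygen_increase", stress_level=0.8))
```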
46

Mitchell, Margaret. "Generating reference to visible objects." Thesis, University of Aberdeen, 2013. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=201692.

Abstract:
In this thesis, I examine human-like language generation from a visual input head-on, exploring how people refer to visible objects in the real world. Using previous work and the studies from this thesis, I propose an algorithm that generates human-like reference to visible objects. Rather than introduce a general-purpose REG algorithm, as is tradition, I address the sorts of properties that visual domains in particular make available, and the ways that these must be processed in order to be used in a referring expression algorithm. This method uncovers several issues in generating human-like language that have not been thoroughly studied before. I focus on the properties of color, size, shape, and material, and address the issues of algorithm determinism and how speaker variation may be generated; unique identification of objects and whether this is an appropriate goal for generating human-like reference; atypicality and the role it plays in reference; and multi-featured values for visual attributes. Technical contributions from this thesis include (1) an algorithm for generating size modifiers from features in a visual scene; and (2) a referring expression generation algorithm that generates structures for varied, human-like reference.
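Contribution (1) can be pictured as a relative-size rule: emit a size modifier only when it separates the target from its distractors (a hedged sketch with hypothetical scene data, not the thesis's algorithm):

```python
def size_modifier(target, distractors, margin=1.2):
    """Suggest 'big'/'small' only if the target's size clearly separates it
    from every distractor in the scene; otherwise omit the modifier."""
    if all(target["size"] >= margin * d["size"] for d in distractors):
        return "big"
    if all(target["size"] * margin <= d["size"] for d in distractors):
        return "small"
    return None  # size is not a distinguishing property here

scene = [{"type": "ball", "size": 4.0}, {"type": "ball", "size": 1.5}]
target, distractors = scene[0], scene[1:]
mod = size_modifier(target, distractors)
print(" ".join(filter(None, ["the", mod, target["type"]])))  # the big ball
```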
47

Venour, Chris. "A computational model of lexical incongruity in humorous text." Thesis, University of Aberdeen, 2013. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=201735.

Abstract:
Many theories of humour claim that incongruity is an essential ingredient of humour. However, this idea is poorly understood and little work has been done in computational humour to quantify it. For example, classifiers which attempt to distinguish jokes from regular texts tend to look for secondary features of humorous texts rather than for incongruity. Similarly, most joke generators attempt to recreate structural patterns found in example jokes but do not deliberately endeavour to create incongruity. As in previous research, this thesis develops classifiers and a joke generator which attempt to automatically recognize and generate a type of humour. However, the systems described here differ from previous programs because they implement a model of a certain type of humorous incongruity. We focus on a type of register humour we call lexical register jokes, in which the tones of individual words are in conflict with each other. Our goal is to create a semantic space that reflects the kind of tone at play in lexical register jokes, so that words that are far apart in the space are not simply different but exhibit the kinds of incongruities seen in lexical jokes. This thesis attempts to develop such a space, and various classifiers are implemented to use it to distinguish lexical register jokes from regular texts. The best of these classifiers achieved high levels of accuracy when distinguishing between a test set of lexical register jokes and 4 different kinds of regular text. A joke generator which makes use of the semantic space to create original lexical register jokes is also implemented and described in this thesis. In a test of the generator, texts that were generated by the system were evaluated by volunteers, who considered them not as humorous as human-made lexical register jokes but significantly more humorous than a set of control (i.e. non-joke) texts. This was an encouraging result, which suggests that the vector space is somewhat successful in discovering lexical differences in tone and in modelling lexical register jokes.
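The space's intended behaviour, register clashes showing up as large distances, suggests a simple incongruity score: the largest pairwise distance between word-tone vectors (toy vectors below; the thesis's space and classifiers are learned, not hand-set):

```python
import numpy as np

# Hypothetical tone vectors for words; in the thesis these come from a
# semantic space built to reflect register/tone.
tone = {"thy": np.array([0.9, 0.1]), "visage": np.array([0.8, 0.2]),
        "dude": np.array([0.1, 0.9])}

def incongruity(words):
    """Score a text by the largest pairwise distance between word tones."""
    vs = [tone[w] for w in words]
    return max(np.linalg.norm(a - b) for a in vs for b in vs)

print(round(incongruity(["thy", "visage", "dude"]), 2))  # high: register clash
print(round(incongruity(["thy", "visage"]), 2))          # low: consistent tone
```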
48

Thomson, Blaise Roger Marie. "Statistical methods for spoken dialogue management." Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609054.

49

Oldham, Joseph Dowell. "Generating documents by means of computational registers." Lexington, Ky. : [University of Kentucky Libraries], 2000. http://lib.uky.edu/ETD/ukycosc2000d00006/oldham.pdf.

Abstract:
Thesis (Ph. D.)--University of Kentucky, 2000. Title from document title page. Document formatted into pages; contains ix, 169 p. : ill. Includes abstract. Includes bibliographical references (p. 160-167).
50

Buys, Jan Moolman. "Incremental generative models for syntactic and semantic natural language processing." Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:a9a7b5cf-3bb1-4e08-b109-de06bf387d1d.

Abstract:
This thesis investigates the role of linguistically-motivated generative models of syntax and semantic structure in natural language processing (NLP). Syntactic well-formedness is crucial in language generation, but most statistical models do not account for the hierarchical structure of sentences. Many applications exhibiting natural language understanding rely on structured semantic representations to enable querying, inference and reasoning. Yet most semantic parsers produce domain-specific or inadequately expressive representations. We propose a series of generative transition-based models for dependency syntax which can be applied as both parsers and language models while being amenable to supervised or unsupervised learning. Two models are based on Markov assumptions commonly made in NLP: The first is a Bayesian model with hierarchical smoothing, the second is parameterised by feed-forward neural networks. The Bayesian model enables careful analysis of the structure of the conditioning contexts required for generative parsers, but the neural network is more accurate. As a language model the syntactic neural model outperforms both the Bayesian model and n-gram neural networks, pointing to the complementary nature of distributed and structured representations for syntactic prediction. We propose approximate inference methods based on particle filtering. The third model is parameterised by recurrent neural networks (RNNs), dropping the Markov assumptions. Exact inference with dynamic programming is made tractable here by simplifying the structure of the conditioning contexts. We then shift the focus to semantics and propose models for parsing sentences to labelled semantic graphs. We introduce a transition-based parser which incrementally predicts graph nodes (predicates) and edges (arguments). This approach is contrasted against predicting top-down graph traversals. RNNs and pointer networks are key components in approaching graph parsing as an incremental prediction problem. The RNN architecture is augmented to condition the model explicitly on the transition system configuration. We develop a robust parser for Minimal Recursion Semantics, a linguistically-expressive framework for compositional semantics which has previously been parsed only with grammar-based approaches. Our parser is much faster than the grammar-based model, while the same approach improves the accuracy of neural Abstract Meaning Representation parsing.
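The transition-based setup is easy to picture with a toy arc-standard derivation (a plain oracle over gold heads; the thesis's models are generative and neural, which this sketch omits):

```python
def arc_standard_oracle(words, heads):
    """Replay arc-standard transitions (SHIFT / LEFT-ARC / RIGHT-ARC) for a
    sentence whose gold head indices are known. heads[i] is the index of
    word i's head, with -1 marking the root."""
    buffer = list(range(len(words)))
    stack, arcs, moves = [], [], []
    def done(i):  # are all of i's dependents already attached?
        return all(h != i or (h, d) in arcs for d, h in enumerate(heads))
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            if heads[s1] == s0:               # s0 is the head of s1
                arcs.append((s0, s1)); stack.pop(-2); moves.append("LEFT-ARC"); continue
            if heads[s0] == s1 and done(s0):  # s1 is the head of s0
                arcs.append((s1, s0)); stack.pop(); moves.append("RIGHT-ARC"); continue
        stack.append(buffer.pop(0)); moves.append("SHIFT")
    return moves, arcs

# "She reads books": reads(1) is the root; she(0) and books(2) attach to it.
print(arc_standard_oracle(["She", "reads", "books"], [1, -1, 1]))
```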