Dissertations on the topic "Events in natural language processing"

Browse the top 50 dissertations for research on the topic "Events in natural language processing".

1

Patil, Supritha Basavaraj. "Analysis of Moving Events Using Tweets." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/90884.

Abstract:
The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), the Integrated Digital Event Archiving and Library (IDEAL), and the Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attack, solar eclipse, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses. Such naturally occurring events are distributed across space and time. It would benefit researchers if we could perform a spatial-temporal analysis to test hypotheses and to find trends that tweets reveal about such events. I apply an existing algorithm to detect locations from tweets, modifying it to work better with the type of datasets I work with. I use the time captured in tweets and also identify the tense of the sentences in tweets to perform the temporal analysis. I build a rule-based model for obtaining the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study whether tweets can be a relevant source of information for understanding the movement of the event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results of the analysis, I noted that Twitter can be a reliable source for identifying places affected by moving events almost immediately. The locations obtained are more detailed than those in newswires. We can also identify, by date, when an event affected a particular region.
Master of Science
News now travels faster on social media than through news channels. Information from social media can help retrieve minute details that might not be emphasized in the news. People tend to describe their actions or sentiments in tweets. I aim to study whether such collections of tweets are dependable sources for identifying the paths of moving events. In events like hurricanes, Twitter can help in analyzing people's reactions to such moving events, including actions such as dislocation or emotions during different phases of the event. The results obtained in the experiments agree with the actual paths of the events with respect to the regions affected and time. The frequency of tweets increases during event peaks. Significantly more affected locations are identified than in newswires.
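The abstract mentions a rule-based model for obtaining the tense of a tweet but does not list its rules. As a hypothetical illustration of what such a model can look like, the Python sketch below applies ordered rules over three invented cue sets (future markers, past markers, and a crude `-ed` suffix check); none of these cues are taken from the thesis.

```python
import re

# Hypothetical cue lists for illustration only; the thesis's actual rules are not
# given in the abstract.
FUTURE_CUES = {"will", "shall", "gonna", "tomorrow"}
FUTURE_PHRASES = ("going to",)
PAST_CUES = {"was", "were", "had", "did", "went", "saw", "yesterday"}
# Crude regular-past heuristic: a word of at least five letters ending in "-ed".
# It also misfires on words such as "speed" or "indeed".
PAST_SUFFIX = re.compile(r"^[a-z]{3,}ed$")


def tweet_tense(text: str) -> str:
    """Label a tweet as 'future', 'past' or 'present' with ordered cue rules."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    # Rule 1: an explicit future marker takes precedence.
    if any(t in FUTURE_CUES for t in tokens) or any(p in lowered for p in FUTURE_PHRASES):
        return "future"
    # Rule 2: past auxiliaries, irregular past forms, or regular "-ed" forms.
    if any(t in PAST_CUES or PAST_SUFFIX.match(t) for t in tokens):
        return "past"
    # Rule 3: fall back to present tense.
    return "present"


print(tweet_tense("The eclipse will pass over Oregon"))  # future
print(tweet_tense("Hurricane Irma flooded our street"))  # past
print(tweet_tense("Watching the eclipse right now"))     # present
```

Ordering the rules this way means a tweet such as "the storm was bad and will get worse" is labeled future; a real model would need finer rules for mixed-tense tweets.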
2

Huang, Yin Jou. "Event Centric Approaches in Natural Language Processing." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/265210.

3

Nothman, Joel. "Grounding event references in news." Thesis, The University of Sydney, 2013. http://hdl.handle.net/2123/10609.

Abstract:
Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking aims to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished by considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.
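Event linking resolves a reference to an archive article, and the abstract says the retrieval system learns weights for temporal and term-overlap features. The Python sketch below shows one minimal way to frame that as ranking, assuming Jaccard term overlap, a log-scaled publication gap as the temporal feature, and hand-picked weights in place of learned ones. The feature forms, weights and article ids are all assumptions for illustration, not the system described in the thesis.

```python
import math
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    article_id: str   # hypothetical archive identifier
    published: date
    text: str


def terms(text: str) -> set:
    """Lower-cased alphanumeric tokens as a set (a bag-of-words without counts)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(reference: str, ref_date: date, article: Article,
          w_term: float = 1.0, w_gap: float = 0.05) -> float:
    """Weighted term-overlap feature minus a weighted time-gap feature.

    Assumes the article was published on or before ref_date. The weights are
    hand-picked placeholders standing in for learned ones.
    """
    ref_terms, art_terms = terms(reference), terms(article.text)
    union = ref_terms | art_terms
    overlap = len(ref_terms & art_terms) / len(union) if union else 0.0  # Jaccard
    gap_days = (ref_date - article.published).days
    return w_term * overlap - w_gap * math.log1p(gap_days)


def link_event(reference: str, ref_date: date, archive: list):
    """Return the id of the best-scoring candidate, or None if there is none."""
    # An article published after the referring text cannot report the event.
    candidates = [a for a in archive if a.published <= ref_date]
    best = max(candidates, key=lambda a: score(reference, ref_date, a), default=None)
    return None if best is None else best.article_id


archive = [
    Article("smh-2009-02-08", date(2009, 2, 8), "Bushfires sweep across Victoria on Black Saturday"),
    Article("smh-2007-11-25", date(2007, 11, 25), "Rudd wins federal election"),
]
print(link_event("the Black Saturday bushfires in Victoria", date(2010, 2, 7), archive))
```

Filtering out later articles before ranking keeps the sketch consistent with linking to an article that reported the event; in a learned system the sign and size of the time-gap weight would come from training data rather than being fixed by hand.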
4

Lindén, Johannes. "Understand and Utilise Unformatted Text Documents by Natural Language Processing algorithms." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-31043.

Abstract:
News companies need to automate and streamline editors' process of writing about hot and new events. Current technologies involve robotic programs that fill in values in templates and website listeners that notify editors when changes are made, so that the editor can read up on the change at the actual website. Editors can provide news faster and better if they are directly provided with abstracts of the external sources. This study applies deep learning algorithms to automatically formulate abstracts and tag sources with appropriate tags based on the context. The study is a full-stack solution, which manages both the editors' need for speed and the training, testing and validation of the algorithms. Decision Tree, Random Forest, Multi-Layer Perceptron and phrase document vectors are used to evaluate the categorisation, and Recurrent Neural Networks are used to paraphrase unformatted texts. In the evaluation, models trained by the algorithms with varying parameters are compared based on the F-score. The results show that F-scores increase with the number of training documents and decrease with the number of categories the algorithm needs to consider. The Multi-Layer Perceptron performs best, followed by Random Forest and finally Decision Tree. Document length matters: when larger documents are used during training, the score increases considerably. A user survey about the paraphrase algorithms shows that the paraphrase results are insufficient to satisfy editors' needs. It confirms a need for more memory to conduct longer experiments.
5

Sanagavarapu, Krishna Chaitanya. "Determining Whether and When People Participate in the Events They Tweet About." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984235/.

Abstract:
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past and present events and events that may occur in the future. We define an event participant as a person directly involved in an event, regardless of whether they are the agent, the recipient, or play another role. We present an annotation effort, guidelines and a quality analysis covering 1,096 event mentions. We discuss the label distributions and event behavior in the annotated corpus. We also explain several features used and a standard supervised machine learning approach to automatically determine if and when the author is a participant in the event in the tweet. We discuss trends in the results obtained and draw important conclusions.
6

Sakaguchi, Tomohiro. "Anchoring Events to the Time Axis toward Storyline Construction." Kyoto University, 2019. http://hdl.handle.net/2433/242437.

Abstract:
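Appended degree program: Collaborative Graduate Program in Design (デザイン学大学院連携プログラム)
Kyoto University (京都大学)
0048
New system, doctorate by coursework (新制・課程博士)
Doctor of Informatics (博士(情報学))
Degree No. Kō 21912 (甲第21912号)
Doctor of Informatics No. 695 (情博第695号)
Library call number: 新制||情||119 (University Library)
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University (京都大学大学院情報学研究科知能情報学専攻)
(Chief examiner) Professor Sadao Kurohashi (黒橋 禎夫); Professor Toyoaki Nishida (西田 豊明); Professor Takashi Kusumi (楠見 孝)
Conferred under Article 4, Paragraph 1 of the Degree Regulations (学位規則第4条第1項該当)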
7

Baier, Thomas, Claudio Di Ciccio, Jan Mendling, and Mathias Weske. "Matching events and activities by integrating behavioral aspects and label analysis." Springer Berlin Heidelberg, 2018. http://dx.doi.org/10.1007/s10270-017-0603-z.

Abstract:
Nowadays, business processes are increasingly supported by IT services that produce massive amounts of event data during the execution of a process. These event data can be used to analyze the process using process mining techniques to discover the real process, measure conformance to a given process model, or to enhance existing models with performance information. Mapping the produced events to activities of a given process model is essential for conformance checking, annotation and understanding of process mining results. In order to accomplish this mapping with low manual effort, we developed a semi-automatic approach that maps events to activities using insights from behavioral analysis and label analysis. The approach extracts Declare constraints from both the log and the model to build matching constraints to efficiently reduce the number of possible mappings. These mappings are further reduced using techniques from natural language processing, which allow for a matching based on labels and external knowledge sources. The evaluation with synthetic and real-life data demonstrates the effectiveness of the approach and its robustness toward non-conforming execution logs.
8

Mills, Michael Thomas. "Natural Language Document and Event Association Using Stochastic Petri Net Modeling." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1369408524.

9

Mehta, Sneha. "Towards Explainable Event Detection and Extraction." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/104359.

Abstract:
Event extraction refers to extracting specific knowledge of incidents from natural language text and consolidating it into a structured form. Important applications of event extraction include search, retrieval, question answering and event forecasting. However, before events can be extracted, it is imperative to detect them, i.e., to identify which documents in a large collection contain events of interest and, from those, to extract the sentences that might contain event-related information. This task is challenging because it is easier to obtain labels at the document level than fine-grained annotations at the sentence level. Current approaches for this task are suboptimal because they directly aggregate sentence probabilities estimated by a classifier to obtain document probabilities, resulting in error propagation. To alleviate this problem, we propose to leverage recent advances in representation learning by using attention mechanisms. Specifically, for event detection we propose a method that computes document embeddings from sentence embeddings by leveraging attention, and trains a document classifier on those embeddings to mitigate the error propagation problem. However, we find that existing attention mechanisms are ill-suited to this task, because they are either suboptimal or use a large number of parameters. To address this problem, we propose a lean attention mechanism that is effective for event detection. Current approaches for event extraction rely on fine-grained labels in specific domains. Extending extraction to new domains is challenging because of the difficulty of collecting fine-grained data. Machine reading comprehension (MRC)-based approaches, which enable zero-shot extraction, struggle with syntactically complex sentences and long-range dependencies. To mitigate this problem, we propose a syntactic sentence simplification approach, guided by the MRC model, that improves its performance on event extraction.
Doctor of Philosophy
Event extraction is the task of extracting events of societal importance from natural language texts. The task has a wide range of applications, from search, retrieval and question answering to forecasting population-level events such as civil unrest and disease occurrences with reasonable accuracy. Before events can be extracted, it is imperative to identify the documents that are likely to contain the events of interest and extract the sentences that mention those events. This is termed event detection. Current approaches for event detection are suboptimal. They assume that events are neatly partitioned into sentences and obtain document-level event probabilities directly from predicted sentence-level probabilities. In this dissertation, under the same assumption, we leverage representation learning to mitigate some of the shortcomings of previous event detection methods. Current approaches to event extraction are limited to restricted domains and require fine-grained labeled corpora for their training. One way to extend event extraction to new domains is by enabling zero-shot extraction. The machine reading comprehension (MRC)-based approach provides a promising way forward for zero-shot extraction. However, this approach suffers from the long-range dependency problem and has difficulty handling syntactically complex sentences with multiple clauses. To mitigate this problem, we propose a syntactic sentence simplification algorithm that is guided by the MRC system to improve its performance.
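The event detection idea above, building a document embedding from sentence embeddings with attention rather than aggregating sentence probabilities, can be illustrated with generic dot-product attention pooling. This is a sketch under stated assumptions and not the lean attention mechanism from the dissertation, whose details the abstract does not give: a single query vector (normally learned, here passed in by hand) scores each sentence, a softmax turns the scores into weights, and the weighted sum becomes the document embedding that a document classifier would then consume.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sentence_embs, query):
    """Combine sentence embeddings into one document embedding.

    Each sentence is scored by its dot product with the query vector; softmax
    turns the scores into weights, and the document embedding is the
    weighted sum of the sentence embeddings. Returns (embedding, weights).
    """
    scores = [sum(q * x for q, x in zip(query, s)) for s in sentence_embs]
    weights = softmax(scores)
    dim = len(sentence_embs[0])
    doc = [sum(w * s[d] for w, s in zip(weights, sentence_embs)) for d in range(dim)]
    return doc, weights


sentences = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-dimensional sentence embeddings
doc, weights = attention_pool(sentences, query=[10.0, 0.0])
print(doc, weights)
```

With the query `[10.0, 0.0]`, nearly all of the weight goes to the first sentence, so the document embedding is close to that sentence's embedding. Because the classifier sees one pooled vector per document, a single sentence-level misclassification is not propagated directly into the document label.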
10

Veladas, Rute Gomes. "Classificação automática de eventos na linha de saúde SNS24." Master's thesis, Universidade de Évora, 2021. http://hdl.handle.net/10174/29055.

Abstract:
Automatic event classification on the health phone line SNS24. In this dissertation we present a new decision support tool to be implemented in the Screening, Counseling and Referral Service (TAE) of the Contact Center of the National Health Service, SNS24. Currently, the nurse who answers a call manually selects the most appropriate clinical algorithm for each situation from a set of 59 clinical algorithms. The tool responds to the need to reduce call duration and consequently maximize the number of calls answered per unit of time. The model is based on Artificial Intelligence methodologies, focusing on Machine Learning and Natural Language Processing approaches. The model presented is an initial model developed on a dataset of three months of call records (about 270,000 records); the final model will be developed from a dataset of about 4 million calls recorded by the health line over three years. The initial model reached an accuracy of 78.80% and an F-measure of 78.45% for top-1 classification, while top-3 and top-5 classification reached accuracy above 90%.
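The top-1, top-3 and top-5 figures in this abstract are top-k accuracies. As an illustration of the metric only (not code from the thesis), a call counts as correct at k when the clinical algorithm actually chosen appears among the model's k highest-ranked suggestions; the label names below are invented stand-ins for the 59 clinical algorithms.

```python
def top_k_accuracy(ranked_predictions, gold_labels, k):
    """Fraction of items whose gold label is among the first k ranked predictions."""
    if not gold_labels or len(ranked_predictions) != len(gold_labels):
        raise ValueError("need one non-empty ranking per gold label")
    hits = sum(gold in ranking[:k] for ranking, gold in zip(ranked_predictions, gold_labels))
    return hits / len(gold_labels)


# Hypothetical rankings for three calls, best suggestion first.
rankings = [["fever", "cough", "rash"],
            ["cough", "fever", "rash"],
            ["rash", "cough", "fever"]]
gold = ["fever", "fever", "fever"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, gold, k))
```

Top-k accuracy can only grow as k increases, which is consistent with the top-3 and top-5 scores exceeding the top-1 score.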
11

Murugan, Srikala. "Determining Event Outcomes from Social Media." Thesis, University of North Texas, 2020. https://digital.library.unt.edu/ark:/67531/metadc1703427/.

Abstract:
An event is something that happens at a time and location. Events include major life events such as graduating college or getting married, and also simple day-to-day activities such as commuting to work or eating lunch. Most work on event extraction detects events and the entities involved in events. For example, cooking events will usually involve a cook, some utensils and appliances, and a final product. In this work, we target the task of determining whether events result in their expected outcomes. Specifically, we target cooking and baking events, and characterize event outcomes into two categories. First, we distinguish whether something edible resulted from the event. Second, if something edible resulted, we distinguish between perfect, partial and alternative outcomes. The main contributions of this thesis are a corpus of 4,000 tweets annotated with event outcome information and experimental results showing that the task can be automated. The corpus includes tweets that have only text as well as tweets that have text and an image.
12

Tang, Huaxiu. "Detecting Adverse Drug Reactions in Electronic Health Records by using the Food and Drug Administration’s Adverse Event Reporting System." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1470753258.

13

Carlassare, Giulio. "Similarità semantica e clustering di concetti della letteratura medica rappresentati con language model e knowledge graph di eventi." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23138/.

Abstract:
The web holds a large amount of information, mainly in textual form, and the spread of social networks has increased its production. Its lack of structure makes it difficult to use the knowledge it contains, which is generally expressed by facts that can be represented as relations (two entities linked by a predicate) or events (in which a word expresses semantics that may involve many entities). Research interest has recently shifted toward knowledge graphs, which encode knowledge in a graph whose nodes represent entities and whose edges indicate the relations between them. Although building them currently requires a great deal of manual work, recent advances in Natural Language Understanding offer increasingly sophisticated tools; in particular, transformer-based language models underpin many solutions for automatically extracting knowledge from text. The topics addressed in this thesis apply directly to rare diseases: the scarcity of available information has led to online patient communities in which members exchange highly relevant opinions about their own experience. Capturing the "voice of the patients" can be very important for making doctors aware of how those directly affected view the disease. The case study concerns a specific rare disease, esophageal achalasia, and a dataset of posts published in a Facebook group dedicated to it. A modular reference architecture is proposed and then implemented with previously analyzed methodologies. Finally, a solution is presented in which interactions in the form of events, extracted partly with the help of a language model, are represented effectively in a vector space that reflects their semantic content; in this space the events can be clustered and their similarity computed, so that they can be aggregated into a single knowledge graph.
14

Balzani, Lorenzo. "Verbalizzazione di eventi biomedici espressi nella letteratura scientifica: generazione controllata di linguaggio naturale da grafi di conoscenza mediante transformer text-to-text." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24286/.

Abstract:
We are living at the peak of a strong and rapid evolution in natural language understanding, achieved mainly through the development of neural models. In information extraction, these advances have recently made it possible to recognize complex semantic relations between entities mentioned in text, such as proteins, symptoms and drugs. This task, made possible by event-based modeling, is fundamental in biomedicine, where the exponential growth in the number of scientific publications further increases the need for systems that automatically extract the interactions contained in text documents. Combining symbolic and sub-symbolic AI can allow known structured knowledge to be introduced into language models, making them more robust, factual and interpretable. In this context, graph verbalization is one of the tasks on which the greatest expectations are placed. Despite the importance of such contributions (from chatbot development to the formulation of new research hypotheses), to date there are no contributions capable of verbalizing the biomedical events expressed in the literature by learning the link between interactions expressed as graphs and their textual counterparts. The thesis proposes the first highly comprehensive dataset of event-text pairs, covering several biomedical subfields such as infectious diseases, oncology research and molecular biology. The dataset is used as the basis for training state-of-the-art generative models on the verbalization task, adopting a text-to-text approach and illustrating a formal technique for encoding event graphs as augmented text. Finally, the thesis demonstrates the value of events for improving the comprehension capabilities of neural models on other NLP tasks, focusing on single-document summarization and multi-task learning.
15

Liu, Xiao. "Fast recursive biomedical event extraction." Thesis, Compiègne, 2014. http://www.theses.fr/2014COMP1963/document.

Abstract:
The Internet, together with all the modern media of communication, information and entertainment, has led to a massive increase in the quantity of digital data. Automatically processing and understanding these massive data enables the creation of large knowledge bases, more efficient search, social media research, etc. Natural language processing research concerns the design and development of algorithms that allow computers to automatically process natural language in texts, audio, images or videos for specific tasks. Due to the complexity of human language, natural language processing of text can be divided into four levels: morphology, syntax, semantics and pragmatics. Current natural language processing technologies have achieved great success in tasks at the first two levels, leading to many successful commercial applications such as search. However, advanced structured search engines would require computers to understand language more deeply than at the morphological and syntactic levels. Information extraction is designed to extract meaningful structured information from unannotated or semi-annotated resources, to enable advanced search and to automatically create knowledge bases for further use. This thesis studies the problem of information extraction in the specific domain of biomedical event extraction. We propose an efficient solution that is a trade-off between the two main trends of methods proposed in previous work. This solution reaches a good balance between performance and speed, making it suitable for processing large-scale data. It achieves performance competitive with the best models at a much lower computational complexity. While designing this model, we also studied the effects of different classifiers that are commonly proposed to solve the multi-class classification problem. We also tested two simple methods to integrate word vector representations learned by deep learning into our model.
Even if different classifiers and the integration of word vectors do not greatly improve performance, we believe that these research directions carry promising potential for improving information extraction.
16

Bernard, Guillaume. "Détection et suivi d’événements dans des documents historiques." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS032.

Abstract:
Current campaigns to digitise historical documents from all over the world are opening up new avenues for historians and social science researchers. The understanding of past events is renewed by the analysis of these large volumes of historical data: unravelling the thread of events and tracing false information are, among other things, possibilities offered by the digital sciences. This thesis focuses on historical press articles and proposes, through two opposing strategies, two analysis processes that address the problem of tracking events in the press. A simple use case is a digital humanities researcher or an amateur historian who is interested in a past event and seeks to discover all the press documents related to it. Manual analysis of the articles is not feasible in a limited time. By publishing algorithms, datasets and analyses, this thesis is a first step towards the publication of more sophisticated tools that allow any individual to search old press collections for events and, why not, renew some of our historical knowledge.
17

Kent, Stuart John Harding. "Modelling events from natural language." Thesis, Imperial College London, 1993. http://kar.kent.ac.uk/21146/.

18

Khan, Sifat Shahriar. "Power Outage Management using Social Sensing." University of Akron / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=akron1556833736835808.

19

Arnulphy, Béatrice. "Désignations nominales des événements : étude et extraction automatique dans les textes." PhD thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00758062.

Abstract:
My thesis studies nominal designations of events with a view to their automatic extraction. The work belongs to natural language processing, a multidisciplinary approach that involves both linguistics and computer science. Information extraction aims to analyse natural-language documents and extract the information useful to a particular application. Within this general aim, many information extraction campaigns have been run: for each event considered, the task is to extract certain related information (participants, dates, numbers, etc.). From the outset, these challenges have been closely related to named entities ("notable" elements of texts, such as the names of people or places). All this information forms a set around the event. Yet these studies pay little attention to the words used to describe the event (especially when it is a noun). The event is viewed as an all-encompassing whole, as the quantity and quality of the information that composes it. Unlike work in general information extraction, our main interest lies solely in how occurring events are named, and in particular in the nominal designation used. For us, an event is what happens, what is worth talking about. The most important events are the subject of press articles or appear in history textbooks. An event can be referred to by a verbal or a nominal description. In this thesis, we have reflected on the notion of event. We observed and compared the different aspects presented in the state of the art, and from them built a definition of the event and a typology of events in general that suit the framework of our work and the nominal designations of events.
From our corpus studies, we also identified different ways in which these event nouns are formed, and we show that each of them can be ambiguous in various respects. For all these studies, building an annotated corpus is an indispensable step, so we also developed an annotation guide dedicated to nominal designations of events. We studied the size and quality of existing lexicons for use in our automatic extraction task. Using extraction rules, we also examined the co-text in which nouns appear in order to determine their eventivity. Following these studies, we extracted a lexicon weighted by eventivity (whose particularity is that it is dedicated to the extraction of nominal events), which reflects the fact that some nouns are more likely than others to denote events. Used as a cue for extracting event nouns, this weighting makes it possible to extract nouns that are absent from existing standard lexicons. Finally, using machine learning, we worked on contextual learning features, partly based on syntax, to extract event nouns.
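The weighted eventivity lexicon described above can be pictured as a per-noun score estimated from an annotated corpus. The sketch below is a minimal illustration under stated assumptions, not the thesis's actual weighting. It assumes the corpus is available as hypothetical `(noun, is_event)` pairs, defines a noun's weight as the share of its occurrences annotated as events, and uses an arbitrary 0.5 threshold to propose candidate event nouns.

```python
from collections import Counter


def eventivity_lexicon(annotations):
    """Map each noun to the share of its annotated occurrences that denote an event.

    `annotations` is an iterable of (noun, is_event) pairs, a hypothetical
    stand-in for an annotated corpus.
    """
    total = Counter()
    events = Counter()
    for noun, is_event in annotations:
        total[noun] += 1
        if is_event:
            events[noun] += 1
    return {noun: events[noun] / total[noun] for noun in total}


def candidate_event_nouns(nouns, lexicon, threshold=0.5):
    """Keep nouns whose eventivity weight reaches the (assumed) threshold."""
    return [n for n in nouns if lexicon.get(n, 0.0) >= threshold]


# Toy annotations: "manifestation" is usually eventive, "table" never is.
toy = [("manifestation", True), ("manifestation", True), ("manifestation", False),
       ("table", False), ("table", False), ("ouragan", True)]
lex = eventivity_lexicon(toy)
print(candidate_event_nouns(["table", "ouragan", "manifestation"], lex))
# ['ouragan', 'manifestation']
```

The weight is learned from annotated occurrences rather than copied from a standard lexicon. A noun such as the toy entry *ouragan* (hurricane) can therefore be proposed even if no existing lexicon lists it, which is the property the abstract highlights.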
20

Marcińczuk, Michał. "Pattern Acquisition Methods for Information Extraction Systems." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4291.

Abstract:
This master's thesis addresses event recognition in the reports of Polish stockholders. Event recognition is one of the information extraction tasks. The thesis compares two approaches to event recognition: manual and automatic. The manual approach uses regular expressions, which also serve as a baseline for the automatic approach. In the automatic approach, three machine learning methods were applied. In the initial experiment, Decision Trees, naive Bayes and Memory Based Learning are compared. A modification of the standard Memory Based Learning method is presented whose goal is to create a classifier that uses only positive examples in the classification task. The performance of the modified Memory Based Learning method is presented and compared to the baseline and to the other machine learning methods. The initial experiment uses one type of annotation, the meeting date. The final experiment uses three types of annotation: meeting time, meeting date and meeting place. The experiments show that classification can be performed using only one class of instances with the same level of performance.
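The abstract does not detail how the standard Memory Based Learning method was modified to use positive examples only. One generic way to build a one-class memory-based classifier is sketched below as an assumption: store the positive training instances, and accept a candidate when its nearest stored instance lies within a distance threshold. The Jaccard distance over feature sets, the threshold, and the Polish context-word features are all hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class OneClassMemoryLearner:
    """Memory-based classifier trained on positive examples only (a sketch).

    A candidate is labelled positive when its nearest stored example lies
    within `max_distance`; the threshold is an assumption to be tuned.
    """

    def __init__(self, max_distance=0.5):
        self.max_distance = max_distance
        self.memory = []

    def fit(self, positives):
        self.memory = [frozenset(p) for p in positives]
        return self

    def predict(self, features):
        if not self.memory:
            raise ValueError("fit() must be called with at least one example")
        features = frozenset(features)
        nearest = min(jaccard_distance(features, m) for m in self.memory)
        return nearest <= self.max_distance


# Hypothetical context features around a candidate meeting-date expression.
positives = [{"walne", "zgromadzenie", "dnia", "<DATE>"},
             {"zgromadzenie", "odbędzie", "się", "<DATE>"}]
clf = OneClassMemoryLearner(max_distance=0.5).fit(positives)
print(clf.predict({"walne", "zgromadzenie", "dnia", "<DATE>", "r."}))  # True
print(clf.predict({"raport", "kwartalny", "<DATE>"}))                 # False
```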
21

Sarafraz, Farzaneh. "Finding conflicting statements in the biomedical literature." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/finding-conflicting-statements-in-the-biomedical-literature(963e490a-eeea-4f4c-864d-fb318899beed).html.

Abstract:
The main archive of life sciences literature currently contains more than 18,000,000 references, and it is virtually impossible for any human to stay up to date with this number of papers, even in a specific sub-domain. Not every fact reported in the literature is novel and distinct: scientists report repeat experiments or refer to previous findings. Given the large number of publications, it is not surprising that information on certain topics is repeated across many publications. From consensus to contradiction, there are all shades of agreement between the claimed facts in the literature, and considering the volume of the corpus, conflicting findings are not unlikely. Finding such claims is particularly interesting for scientists, as they can present opportunities for knowledge consolidation and future investigations. In this thesis we present a method to extract and contextualise statements about molecular events as expressed in the biomedical literature, and to find those that potentially conflict with each other. The approach uses a system that detects event negations and speculation, and combines those with contextual features (e.g. type of event, species, and anatomical location) to build a representational model for establishing relations between different biological events, including relations concerning conflicts. In the detection of negations and speculations, rich lexical, syntactic, and semantic features have been exploited, including the syntactic command relation. Different parts of the proposed method have been evaluated in the context of the BioNLP 09 challenge. The average F-measures for event negation and speculation detection were 63% (with precision of 88%) and 48% (with precision of 64%) respectively.
An analysis of a set of 50 extracted event pairs identified as potentially conflicting revealed that 32 of them showed some degree of conflict (64%); 10 event pairs (20%) needed a more complex biological interpretation to decide whether there was a conflict. We also provide an open source integrated text mining framework for extracting events and their context on a large-scale basis using a pipeline of tools that are available or have been developed as part of this research, along with 72,314 potentially conflicting molecular event pairs that have been generated by mining the entire body of accessible biomedical literature. We conclude that, whilst automated conflict mining would need more comprehensive context extraction, it is feasible to provide a support environment for biologists to browse potential conflicting statements and facilitate data and knowledge consolidation.
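The core comparison step can be illustrated with a deliberately simplified model. Two extracted events are flagged as potentially conflicting when they share event type, theme and context (species, anatomical location) but differ in negation. The `Event` fields below are assumptions modelled on the contextual features listed in the abstract, and the event type name follows the BioNLP'09 shared task. The real system also handles speculation and richer context, which this sketch reduces to simply skipping speculative events.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """A molecular event with its context; the fields are illustrative assumptions."""
    event_type: str          # e.g. "Gene_expression"
    theme: str               # e.g. a protein name
    species: str
    anatomy: str
    negated: bool = False
    speculative: bool = False


def context_key(e):
    return (e.event_type, e.theme, e.species, e.anatomy)


def potential_conflicts(events):
    """Pairs that share type, theme and context but differ in polarity.

    Speculative statements are skipped, a simplifying assumption.
    """
    certain = [e for e in events if not e.speculative]
    return [(a, b) for a, b in combinations(certain, 2)
            if context_key(a) == context_key(b) and a.negated != b.negated]


events = [
    Event("Gene_expression", "IL-2", "human", "T cell"),
    Event("Gene_expression", "IL-2", "human", "T cell", negated=True),
    Event("Gene_expression", "IL-2", "mouse", "T cell", negated=True),
]
pairs = potential_conflicts(events)
print(len(pairs))  # 1: only the two human T-cell statements disagree
```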
22

Matsubara, Shigeki. "Corpus-based Natural Language Processing." Intelligent Media Integration, Nagoya University / COE, 2004. http://hdl.handle.net/2237/10355.

23

Smith, Sydney. "Approaches to Natural Language Processing." Scholarship @ Claremont, 2018. http://scholarship.claremont.edu/cmc_theses/1817.

Abstract:
This paper explores topic modeling, using the text of Alice in Wonderland as its example. It examines both singular value decomposition (SVD) and non-negative matrix factorization (NMF) as methods for feature extraction. The paper goes on to explore methods for a partially supervised implementation of topic modeling by introducing themes. A large portion of the paper also focuses on implementing these techniques in Python and on visualizing the results with a combination of Python, HTML and JavaScript, along with the D3 framework. The paper concludes by presenting a mixture of SVD, NMF and partially supervised NMF as a possible way to improve topic modeling.
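Non-negative matrix factorization approximates a document-term matrix V as the product of two non-negative matrices: W, the document-topic weights, and H, the topic-term weights. The sketch below uses the classic Lee-Seung multiplicative update rules on a made-up four-document, six-term count matrix. It is a generic NMF illustration, not the paper's code, and it omits the partially supervised variant.

```python
import numpy as np


def nmf(V, k, iterations=500, seed=0, eps=1e-10):
    """Factor non-negative V (docs x terms) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_terms)) + eps
    for _ in range(iterations):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def top_terms(H, vocabulary, n=3):
    """The n highest-weighted terms of each topic (row of H)."""
    return [[vocabulary[i] for i in np.argsort(row)[::-1][:n]] for row in H]


# Toy counts: two "rabbit hole" chapters and two "croquet" chapters.
vocabulary = ["alice", "rabbit", "hole", "queen", "croquet", "court"]
V = np.array([[2, 3, 2, 0, 0, 0],
              [0, 2, 3, 0, 0, 0],
              [0, 0, 0, 3, 2, 1],
              [1, 0, 0, 2, 0, 3]], dtype=float)
W, H = nmf(V, k=2)
print(top_terms(H, vocabulary))
```

A truncated SVD can produce negative entries in its factors. Here every weight stays non-negative, which is what lets NMF topics read as additive bundles of terms. For real corpora, scikit-learn's `sklearn.decomposition.NMF` and `TruncatedSVD` are tuned alternatives.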
24

Strandberg, Aron, and Patrik Karlström. "Processing Natural Language for the Spotify API : Are sophisticated natural language processing algorithms necessary when processing language in a limited scope?" Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186867.

Abstract:
Knowing whether you can implement something complex in a simple way in your application is always of interest. A natural language interface could theoretically be implemented in many applications, but the complexity of most natural language processing algorithms is a limiting factor. The problem explored in this paper is whether a simpler algorithm that does not use convoluted statistical models or machine learning can be good enough. We implemented two algorithms, one using Spotify's own search and one using a more accurate offline search. With our best precision at 81% and an average of 2.28 seconds per query, this is not a viable solution for a complete and satisfactory user experience. Further work could push the performance into an acceptable range.
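The kind of simple, non-statistical interface the abstract describes can be approximated with a handful of ordered regular-expression patterns. Each pattern maps an utterance to an intent and slots, which would then be passed to a search. The patterns and intent names below are hypothetical, since the thesis does not publish its grammar, and this sketch does not call the Spotify API.

```python
import re

# Hypothetical ordered patterns: more specific ones must come first.
PATTERNS = [
    ("play_artist", re.compile(r"^play (?:something|anything|music) by (?P<artist>.+)$")),
    ("play_track", re.compile(r"^play (?P<track>.+?) by (?P<artist>.+)$")),
    ("pause", re.compile(r"^(?:pause|stop)(?: the music)?$")),
]


def parse_command(utterance):
    """Return (intent, slots) for the first matching pattern, else (None, {})."""
    # Normalise case and whitespace, and drop trailing punctuation.
    text = " ".join(utterance.lower().split()).rstrip(".!?")
    for intent, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return None, {}


print(parse_command("Play Yesterday by The Beatles"))
# ('play_track', {'track': 'yesterday', 'artist': 'the beatles'})
```

Pattern order matters. The more specific `play_artist` pattern is tried before `play_track`; otherwise "play something by Queen" would be read as a track called "something".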
25

Chen, Joseph C. H. "Quantum computation and natural language processing." [S.l.] : [s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=965581020.

26

Knight, Sylvia Frances. "Natural language processing for aerospace documentation." Thesis, University of Cambridge, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621395.

27

Naphtal, Rachael M. "Natural language processing based nutritional application." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100640.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
Includes bibliographical references (pages 67-68).
The ability to accurately and efficiently track nutritional intake is a powerful tool in combating obesity and other food-related diseases. Currently, many methods used for this task are time-consuming or easily abandoned; however, a natural language based application that converts spoken text to nutritional information could be a convenient and effective solution. This thesis describes the creation of an application that translates spoken food diaries into nutritional database entries. It explores different methods for converting brands, descriptions and food item names into entries in nutritional databases. Specifically, we constructed a cache of over 4,000 food items and created a variety of methods to allow refinement of database mappings. We also explored methods of dealing with ambiguous quantity descriptions and of mapping spoken quantity values to numerical units. When assessed by 500 users entering their daily meals on Amazon Mechanical Turk, the system was able to map 83.8% of the correctly interpreted spoken food items to relevant nutritional database entries. It was also able to find a logical quantity for 92.2% of the correct food entries. Overall, this system is a significant step towards the intelligent conversion of spoken food diaries into actual nutritional feedback.
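One of the sub-problems mentioned above is mapping spoken quantity descriptions to numerical units. It can be sketched with small word lists. The lexicons, the unit abbreviations and the supported phrase shape ("N [and a half] unit [of] food") are assumptions for illustration only. The thesis's own methods and its 4,000-item cache are not reproduced here.

```python
import re
from fractions import Fraction

# Hypothetical lexicons; a real system would need far larger word lists.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4}
UNITS = {"cup": "cup", "cups": "cup", "tablespoon": "tbsp", "tablespoons": "tbsp",
         "slice": "slice", "slices": "slice"}


def parse_quantity(phrase):
    """Turn e.g. 'two and a half cups of rice' into (2.5, 'cup', 'rice')."""
    tokens = phrase.lower().split()
    i, amount = 0, None
    if tokens and tokens[0] in NUMBER_WORDS:
        amount, i = Fraction(NUMBER_WORDS[tokens[0]]), 1
    elif tokens and re.fullmatch(r"\d+(?:\.\d+)?", tokens[0]):
        amount, i = Fraction(tokens[0]), 1   # Fraction accepts decimal strings
    if amount is not None and tokens[i:i + 3] == ["and", "a", "half"]:
        amount += Fraction(1, 2)
        i += 3
    unit = None
    if i < len(tokens) and tokens[i] in UNITS:
        unit, i = UNITS[tokens[i]], i + 1
    if i < len(tokens) and tokens[i] == "of":
        i += 1
    food = " ".join(tokens[i:])
    return (float(amount) if amount is not None else None, unit, food)


print(parse_quantity("two and a half cups of rice"))  # (2.5, 'cup', 'rice')
```

The food string left over after parsing would then be looked up in a nutritional database. That lookup step is not sketched here.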
28

Eriksson, Simon. "COMPARING NATURAL LANGUAGE PROCESSING TO STRUCTURED QUERY LANGUAGE ALGORITHMS." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163310.

Abstract:
Using natural language processing to create Structured Query Language (SQL) queries has many benefits in theory. Even though SQL is an expressive and powerful language, it requires certain technical knowledge to use. An interface that effectively uses natural language processing would instead allow users to communicate with the SQL database as if they were communicating with another human being. In this paper I compare how well two of the currently most advanced open-source algorithms in this field (TypeSQL and SyntaxSQL) can understand advanced SQL. I show that SyntaxSQL is significantly more accurate but makes some sacrifices in execution time compared to TypeSQL.
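The abstract does not state which accuracy metric was used. One common way to compare text-to-SQL systems, shown here as an assumption, is execution accuracy: run each predicted query and its gold query against the same database, and check whether they return the same rows. The sketch uses Python's built-in `sqlite3` module with a hypothetical `singer` table.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to run counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn, pairs):
    """Share of (predicted, gold) pairs whose results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES ('Ann', 31), ('Bo', 25), ('Cy', 40);"
)
pairs = [
    ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
    ("SELECT name FROM singer WHERE age > 20", "SELECT name FROM singer WHERE age > 30"),
    ("SELECT nme FROM singer", "SELECT name FROM singer"),  # invalid column
]
print(execution_accuracy(conn, pairs))  # 1 of 3 pairs matches
```

Comparing results as multisets ignores row order, which is wrong for queries whose gold form contains `ORDER BY`. A stricter check would compare ordered lists in that case. Timing each `conn.execute` call with `time.perf_counter` would cover the execution-time side of the comparison.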
29

Kesarwani, Vaibhav. "Automatic Poetry Classification Using Natural Language Processing." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37309.

Abstract:
Poetry, as a special form of literature, is crucial for computational linguistics. It has a high density of emotions, figures of speech, vividness, creativity, and ambiguity, and it poses a much greater challenge for the application of Natural Language Processing algorithms than any other literary genre. Our system establishes a computational model that classifies poems based on similarity features such as rhyme, diction, and metaphor. For rhyme analysis, we investigate methods for classifying poems based on rhyme patterns. First, we give an overview of different types of rhyme, along with a detailed description of how rhyme types and sub-types are detected by applying a pronunciation dictionary to our poetry dataset. We achieve an accuracy of 96.51% in identifying rhymes in poetry by applying a phonetic similarity model. We then derive a rhyme quantification metric, RhymeScore, based on the matching phonetic transcription of each poem. We also develop an application that visualizes this quantified RhymeScore as a scatter plot in two or three dimensions. For diction analysis, we investigate methods for classifying poems based on diction. First, we enumerate the quantitative linguistic and semantic features that constitute diction. We then investigate the methodology used to compute these features from our poetry dataset. We also build a word embedding model on our poetry dataset, with 1.5 million words in 100 dimensions, and compare it with GloVe embeddings. Metaphor is part of diction, but as it is a very complex topic in its own right, we address it as a stand-alone issue and develop several methods for it. Previous work on metaphor detection relies on either rule-based or statistical models, none of them applied to poetry. Our methods focus on metaphor detection in a poetry corpus, but we test on non-poetry data as well. We combine rule-based and statistical models (word embeddings) to develop a new classification system.
Our first metaphor detection method achieves a precision of 0.759 and a recall of 0.804 in identifying one type of metaphor in poetry, by using a Support Vector Machine classifier with various types of features. Furthermore, our deep learning model based on a Convolutional Neural Network achieves a precision of 0.831 and a recall of 0.836 for the same task. We also develop an application for generic metaphor detection in any type of natural text.
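Rhyme detection with a pronunciation dictionary usually compares the phonemes from the last stressed vowel to the end of each word. The sketch below assumes a tiny hand-written dictionary in CMUdict-style ARPAbet, where vowels carry stress digits 0, 1 or 2. It also uses a hypothetical couplet score. It is neither the thesis's phonetic similarity model nor its RhymeScore definition.

```python
# Toy pronunciation dictionary in CMUdict-style ARPAbet (illustrative entries).
PRONUNCIATIONS = {
    "cat": ["K", "AE1", "T"],
    "hat": ["HH", "AE1", "T"],
    "dog": ["D", "AO1", "G"],
    "cot": ["K", "AA1", "T"],
}


def rhyme_part(phones):
    """Phones from the last stressed vowel (stress digit 1 or 2) to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return phones[i:]
    return phones


def is_perfect_rhyme(w1, w2, dictionary=PRONUNCIATIONS):
    """Same rhyme part but not the same word (a simplified perfect-rhyme test)."""
    p1, p2 = dictionary.get(w1), dictionary.get(w2)
    if p1 is None or p2 is None or w1 == w2:
        return False
    return rhyme_part(p1) == rhyme_part(p2) and p1 != p2


def couplet_rhyme_score(line_endings):
    """Hypothetical score: share of consecutive line pairs (AABB) whose endings rhyme."""
    pairs = list(zip(line_endings[::2], line_endings[1::2]))
    if not pairs:
        return 0.0
    return sum(is_perfect_rhyme(a, b) for a, b in pairs) / len(pairs)


print(couplet_rhyme_score(["cat", "hat", "dog", "cot"]))  # 0.5
```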
30

Pham, Son Bao. "Incremental knowledge acquisition for natural language processing." Thesis, University of New South Wales, School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/26299.

Abstract:
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather some statistics based on a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem that occurs with manual approaches is that the combination of multiple patterns, possibly being used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. 
KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty in choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort in creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE, therefore, makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
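The abstract does not describe KAFTIE's internal rule representation. The sketch assumes an exception-based structure in the style of Ripple-Down Rules. Under that assumption, a new pattern is attached as an exception to the rule whose conclusion it corrects, so it can only change the output for cases that rule already covers. This locality illustrates one way points (1) and (4) above could be achieved. The regular-expression conditions and labels are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    pattern: re.Pattern           # condition: a regex over the sentence (assumption)
    conclusion: Optional[str]     # label returned when the rule fires
    exceptions: list = field(default_factory=list)

    def classify(self, text):
        """Return this rule's conclusion unless one of its exceptions fires first."""
        if not self.pattern.search(text):
            return None
        for exc in self.exceptions:
            result = exc.classify(text)
            if result is not None:
                return result
        return self.conclusion


def add_exception(parent, pattern, conclusion):
    """Refine `parent` for the cases it misclassifies; other rules are untouched."""
    rule = Rule(re.compile(pattern), conclusion)
    parent.exceptions.append(rule)
    return rule


root = Rule(re.compile(r""), "other")  # default rule matches every sentence
improves = add_exception(root, r"\boutperform", "positive_comparison")
# Later, an expert corrects a wrong conclusion by refining only `improves`:
add_exception(improves, r"\bnot outperform|\bfails? to outperform", "negative_comparison")

print(root.classify("It does not outperform CRFs."))  # negative_comparison
```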
31

張少能 and Siu-nang Bruce Cheung. "A concise framework of natural language processing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1989. http://hub.hku.hk/bib/B31208563.

32

Cahill, Lynne Julie. "Syllable-based morphology for natural language processing." Thesis, University of Sussex, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386529.

Abstract:
This thesis addresses the problem of accounting for morphological alternation within Natural Language Processing. It proposes an approach to morphology which is based on phonological concepts, in particular the syllable, in contrast to the morpheme-based approaches which have standardly been used by both NLP and linguistics. It is argued that morpheme-based approaches, within both linguistics and NLP, grew out of the apparently purely affixational morphology of European languages, and especially English, but are less appropriate for non-affixational languages such as Arabic. Indeed, it is claimed that even accounts of those European languages miss important linguistic generalizations by ignoring more phonologically based alternations, such as umlaut in German and ablaut in English. To justify this approach, we present a wide range of data from languages as diverse as German and Rotuman. A formal language, MOLUSC, is described, which allows for the definition of declarative mappings between syllable sequences, and accounts of non-trivial fragments of the inflectional morphology of English, Arabic and Sanskrit are presented to demonstrate the capabilities of the language. A semantics for the language is defined, and the implementation of an interpreter is described. The thesis discusses theoretical (linguistic) issues, as well as implementational issues involved in the incorporation of MOLUSC into a larger lexicon system. The approach is contrasted with previous work in computational morphology, in particular finite-state morphology, and its relation to other work in the fields of morphology and phonology is also discussed.
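The contrast between syllable-based and morpheme-based accounts can be illustrated with English ablaut. Instead of segmenting *sang* into morphemes, the past tense is stated as a mapping that rewrites the nucleus of the final syllable. The sketch below is not MOLUSC syntax. It uses spelling as a stand-in for phonology, and the one-entry ablaut table and the regular `-ed` fallback are simplifying assumptions.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    onset: str
    nucleus: str
    coda: str

    def spell(self):
        return self.onset + self.nucleus + self.coda


# Hypothetical ablaut class: the past tense rewrites the nucleus i -> a,
# leaving onset and coda untouched (sing/sang, drink/drank, begin/began).
PAST_ABLAUT = {"i": "a"}


def past_tense(syllables, ablauting):
    """Apply ablaut to the final syllable of an ablauting verb, else add -ed."""
    if ablauting:
        *init, last = syllables
        last = last._replace(nucleus=PAST_ABLAUT.get(last.nucleus, last.nucleus))
        return "".join(s.spell() for s in [*init, last])
    return "".join(s.spell() for s in syllables) + "ed"


print(past_tense([Syllable("b", "e", ""), Syllable("g", "i", "n")], True))  # began
```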
33

Lei, Tao. "Interpretable neural models for natural language processing." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108990.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
Includes bibliographical references (pages 109-119).
The success of neural network models often comes at a cost of interpretability. This thesis addresses the problem by providing justifications behind the model's structure and predictions. In the first part of this thesis, we present a class of sequence operations for text processing. The proposed component generalizes from convolution operations and gated aggregations. As justification, we relate this component to string kernels, i.e. functions measuring the similarity between sequences, and demonstrate how it encodes the efficient kernel computing algorithm into its structure. The proposed model achieves state-of-the-art or competitive results compared to alternative architectures (such as LSTMs and CNNs) across several NLP applications. In the second part, we learn rationales behind the model's prediction by extracting input pieces as supporting evidence. Rationales are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, a generator and an encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction. Rationales are never given during training; instead, the model is regularized by the desiderata for rationales. We demonstrate the effectiveness of this learning framework in applications such as multi-aspect sentiment analysis. Our method achieves a performance of over 90% when evaluated against manually annotated rationales.
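The desiderata for rationales (short and coherent) can be written as a penalty on the generator's binary token-selection mask. One term counts the selected tokens. The other counts switches between selected and unselected positions, so scattered selections cost more than contiguous ones. The sketch assumes this common formulation with illustrative weights. In training, the penalty would be added to the encoder's prediction loss.

```python
def rationale_penalty(mask, lambda_length=1.0, lambda_coherence=2.0):
    """Penalty favouring short (few selected tokens) and coherent (few switches) rationales.

    `mask` is a 0/1 list marking which tokens the generator selected; the
    lambda weights are illustrative hyperparameters, not values from the thesis.
    """
    length = sum(mask)
    transitions = sum(abs(a - b) for a, b in zip(mask, mask[1:]))
    return lambda_length * length + lambda_coherence * transitions


contiguous = [0, 0, 1, 1, 1, 0]   # one selected phrase
scattered = [1, 0, 1, 0, 1, 0]    # same length, fragmented
print(rationale_penalty(contiguous), rationale_penalty(scattered))  # 7.0 13.0
```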
34

Grinman, Alex J. "Natural language processing on encrypted patient data." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/113438.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
Includes bibliographical references (pages 85-86).
While many industries can benefit from machine learning techniques for data analysis, they often have neither the technical expertise nor the computational power to apply them. Therefore, many organizations would benefit from outsourcing their data analysis. Yet stringent data privacy policies prevent outsourcing sensitive data and may stop the delegation of data analysis in its tracks. In this thesis, we put forth a two-party system in which one party, capable of powerful computation, can run certain machine learning algorithms from the natural language processing domain on the second party's data, while learning only specific functions of that data and nothing else. Our system provides simple cryptographic schemes for locating keywords, matching approximate regular expressions, and computing frequency analysis on encrypted data. We present a full implementation of this system in the form of an extensible software library and a command line interface. Finally, we discuss a medical case study in which we used our system to run a suite of unmodified machine learning algorithms on encrypted free-text patient notes.
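The abstract does not specify the thesis's cryptographic schemes. A much simpler illustration of the keyword-location idea is to have the data owner replace each word with a keyed HMAC token. The server can then match a query token without learning the keyword. This is an assumption-laden sketch using only Python's standard `hmac`, `hashlib` and `secrets` modules. Deterministic tokens reveal which documents share words, and encryption of the note text itself is not shown.

```python
import hashlib
import hmac
import secrets


def keyword_token(key, word):
    """Deterministic keyed token (HMAC-SHA256) for a lower-cased word."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key, documents):
    """Data owner: replace each document's words with tokens before outsourcing."""
    return {doc_id: {keyword_token(key, w) for w in text.split()}
            for doc_id, text in documents.items()}


def search(index, token):
    """Server: find documents containing the token without seeing the keyword."""
    return sorted(doc_id for doc_id, tokens in index.items() if token in tokens)


key = secrets.token_bytes(32)  # held only by the data owner
docs = {"note1": "patient reports chest pain",
        "note2": "no chest pain today",
        "note3": "follow-up for diabetes"}
index = build_index(key, docs)
print(search(index, keyword_token(key, "pain")))  # ['note1', 'note2']
```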
35

Alharthi, Haifa. "Natural Language Processing for Book Recommender Systems." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39134.

Abstract:
The act of reading has benefits for individuals and societies, yet studies show that reading is declining, especially among young people. Recommender systems (RSs) can help counter this decline. There is a lot of research on literary books using natural language processing (NLP) methods, but analyzing the textual content of books to improve recommendations is relatively rare. We propose content-based recommender systems that extract elements learned from book texts to predict readers' future interests. One factor that influences reading preferences is writing style; we propose a system that recommends books after learning their authors' writing style. To our knowledge, this is the first work that transfers the information learned by an author-identification model to a book RS. Another approach that we propose uses over a hundred lexical, syntactic, stylometric, and fiction-based features that might play a role in generating high-quality book recommendations. Previous book RSs include very few stylometric features; hence, our study is the first to include and analyze a wide variety of textual elements for book recommendations. We evaluated both approaches in a top-k recommendation scenario. They give better accuracy than state-of-the-art content-based and collaborative filtering methods. We highlight the significant factors that contributed to the accuracy of the recommendations using a forest of randomized regression trees. We also conducted a qualitative analysis by checking whether similar books/authors were annotated similarly by experts. Our content-based systems suffer from the new user problem, well known in the field of RSs, which hinders their ability to make accurate recommendations. Therefore, we propose a Topic Model-Based book recommendation component (TMB) that addresses the issue by using the topics learned from a user's shared text on social media to recognize their interests and map them to related books.
To our knowledge, no literature on book RSs exploits public social networks other than book-cataloging websites. Using topic modeling techniques, user interests can be extracted automatically and dynamically, without the need to search for predefined concepts. Though TMB is designed to complement other systems, we evaluated it against a traditional content-based (CB) book recommender. We assessed the top-k recommendations made by TMB and CB and found that both retrieved a comparable number of books, even though CB relied on users' rating history while TMB only required their social profiles.
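A content-based recommender of the kind described above can be reduced to a vector per book and a similarity ranking. The sketch assumes each book is already described by a small, hypothetical feature vector, for example normalised stylometric measures. It uses the reader's previously liked book as the profile and returns the top-k unread books by cosine similarity. It does not reproduce the author-identification or topic-model components.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(profile, books, read, k=2):
    """Top-k unread books ranked by cosine similarity to the reader's profile."""
    scored = [(cosine(profile, vec), title)
              for title, vec in books.items() if title not in read]
    return [title for _, title in sorted(scored, reverse=True)[:k]]


# Hypothetical features: [sentence length, dialogue ratio, adjective ratio].
books = {"A": [0.9, 0.1, 0.3], "B": [0.8, 0.2, 0.4],
         "C": [0.1, 0.9, 0.2], "D": [0.2, 0.8, 0.1]}
print(recommend(books["A"], books, read={"A"}, k=2))  # ['B', 'D']
```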
36

Medlock, Benjamin William. "Investigating classification for natural language processing tasks." Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611949.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
37

Woldemariam, Yonas Demeke. "Natural language processing in cross-media analysis." Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-147640.

Abstract:
A cross-media analysis framework is an integrated multi-modal platform where a media resource containing different types of data, such as text, images, audio and video, is analyzed with metadata extractors that work jointly to contextualize the media resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, search, and recommendation services. For online content providers, such services make it possible to semantically enhance a media resource with extracted metadata representing its hidden meanings and to make it more efficiently searchable. Within the architecture of such frameworks, Natural Language Processing (NLP) infrastructures cover a substantial part. The NLP infrastructures include text analysis components such as a parser, named entity extraction and linking, sentiment analysis and automatic speech recognition. Since NLP tools and techniques were originally designed to operate in isolation, integrating them into cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, text extracted from audio-visual content lacks linguistic features that potentially provide important clues for text analysis components. Thus, there is a need to develop various techniques to meet the requirements and design principles of the frameworks. In this thesis, we explore the development of various methods and models that satisfy the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and to predict information such as sentiment and competence. We also attempt to enhance the multilingualism of the frameworks by designing an analysis pipeline that includes speech recognition, transliteration and named entity recognition for Amharic, which also makes Amharic content on the web more accessible.
The method can potentially be extended to support other under-resourced languages.
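The analysis pipeline described above can be thought of as a chain of stages that each read and enrich a shared document record. The sketch below is an assumption, not the thesis's code. Speech recognition is skipped, with its transcript given directly. The toy transliteration table and gazetteer cover only the characters and the single place name (Addis Ababa) used in the example.

```python
from functools import reduce

# Toy Ethiopic-to-Latin table covering only the characters used below.
TRANSLITERATION = {"አ": "a", "ዲ": "di", "ስ": "s", "በ": "be", "ባ": "ba"}

# Hypothetical gazetteer keyed by transliterated names.
GAZETTEER = {"adis abeba": "LOCATION"}


def transliterate(doc):
    """Add a Latin-script version of the (ASR) transcript to the document."""
    doc["latin"] = "".join(TRANSLITERATION.get(ch, ch) for ch in doc["text"])
    return doc


def gazetteer_ner(doc):
    """Crude NER: report gazetteer names found as substrings of the Latin text."""
    doc["entities"] = [(name, label) for name, label in GAZETTEER.items()
                       if name in doc["latin"]]
    return doc


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    return reduce(lambda d, stage: stage(d), stages, doc)


result = run_pipeline({"text": "አዲስ አበባ"}, [transliterate, gazetteer_ner])
print(result["latin"], result["entities"])  # adis abeba [('adis abeba', 'LOCATION')]
```

Because every stage takes and returns the same record, a speech recognition stage could later be prepended without changing the others. This is one reason a shared document representation suits such frameworks.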
38

Cheung, Siu-nang Bruce. "A concise framework of natural language processing." [Hong Kong : University of Hong Kong], 1989. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12432544.

39

Dawborn, Timothy James. "DOCREP: Document Representation for Natural Language Processing." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14767.

Abstract:
The field of natural language processing (NLP) revolves around the computational interpretation and generation of natural language. The language typically processed in NLP occurs in paragraphs or documents rather than in single isolated sentences. Despite this, most NLP tools operate over one sentence at a time, using neither the context outside the sentence nor any of the metadata associated with the underlying document. One pragmatic reason for this disparity is that representing documents and their annotations through an NLP pipeline is difficult with existing infrastructure. Representing linguistic annotations for a text document using a plain-text, markup-based format is not sufficient to capture arbitrarily nested and overlapping annotations. Despite this, most linguistic text corpora and NLP tools still operate in this fashion. A document representation framework (DRF) supports the creation of linguistic annotations stored separately from the original document, overcoming this problem of nested and overlapping annotations. Despite the prevalence of pipelines in NLP, there is little published work on, or implementations of, DRFs. The main DRFs, GATE and UIMA, exhibit usability issues which have limited their uptake by the NLP community. This thesis aims to solve this problem through a novel, modern DRF, DOCREP, a portmanteau of document representation. DOCREP is designed to be efficient, programming-language and environment agnostic, and most importantly, easy to use. We want DOCREP to be powerful and simple enough to use that NLP researchers and language technology application developers would even use it in their own small projects instead of developing their own ad hoc solution. This thesis begins by presenting the design criteria for our new DRF, extending existing requirements from the literature with additional usability and efficiency requirements that should lead to greater use of DRFs.
We outline how our new DRF, DOCREP, differs from existing DRFs in terms of the data model, serialisation strategy, developer interactions, support for rapid prototyping, and the expected runtime and environment requirements. We then describe our implementations of DOCREP in Python, C++, and Java, the most common languages in NLP, outlining their efficiency, idiomaticity, and the ways in which these implementations satisfy our design requirements. We then present two different evaluations of DOCREP. First, we evaluate its ability to model complex linguistic corpora by converting the OntoNotes 5 corpus to DOCREP and UIMA, outlining the differences in the modelling approaches required and the efficiency of the two DRFs. Second, we evaluate DOCREP against our usability requirements from the perspective of a computational linguist who is new to DOCREP. We walk through a number of common use cases for working with text corpora and contrast traditional approaches against their DOCREP counterparts. These two evaluations conclude that DOCREP satisfies our outlined design requirements and outperforms existing DRFs in terms of efficiency and, most importantly, usability. With DOCREP designed and evaluated, we then show how NLP applications can harness document structure. We present a novel document structure-aware tokenization framework for the first stage of full-stack NLP systems. We then present a new structure-aware NER system which achieves state-of-the-art results on multiple standard NER evaluations. The tokenization framework produces its tokenization, sentence boundary, and document structure annotations as native DOCREP annotations. The NER system consumes DOCREP annotations and utilises many components of the DOCREP runtime. 
We believe that the adoption of DOCREP throughout the NLP community will assist in the reproducibility of results, substitutability of components, and overall quality assurance of NLP systems and corpora, all of which are problematic areas within NLP research and applications. This adoption will make developing and combining NLP components into applications faster, more efficient, and more reliable.
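The abstract's central argument is that inline markup cannot represent arbitrarily nested and overlapping annotations, whereas a DRF stores annotations separately from the original text. What follows is a minimal Python sketch of that stand-off idea. It assumes that annotations are labelled character-offset ranges over an unchanged text. The `Document` and `Span` classes and the `overlaps` helper are hypothetical illustrations, not the DOCREP API. In the example, a quotation crosses a sentence boundary. Well-formed inline tags cannot express that, but independent offset-based layers can. A span nested inside another is not counted as crossing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A stand-off annotation: a labelled [start, end) character range over the text."""
    layer: str
    label: str
    start: int
    end: int


@dataclass
class Document:
    """The original text is never modified; annotations only point into it."""
    text: str
    annotations: list[Span] = field(default_factory=list)

    def annotate(self, layer: str, label: str, start: int, end: int) -> Span:
        if not 0 <= start < end <= len(self.text):
            raise ValueError(f"invalid span [{start}, {end})")
        span = Span(layer, label, start, end)
        self.annotations.append(span)
        return span

    def covered_text(self, span: Span) -> str:
        return self.text[span.start:span.end]

    def layer(self, name: str) -> list[Span]:
        # Spans of one layer in document order; different layers never need to nest
        return sorted((a for a in self.annotations if a.layer == name),
                      key=lambda a: (a.start, a.end))


def overlaps(a: Span, b: Span) -> bool:
    """True if two spans share characters but neither contains the other (crossing spans)."""
    share = a.start < b.end and b.start < a.end
    nested = (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end)
    return share and not nested


doc = Document('She wrote "it works. Mostly" today.')
first = doc.annotate("sentence", "S", 0, 20)
second = doc.annotate("sentence", "S", 21, 35)
quote = doc.annotate("quote", "Q", 10, 28)
print(doc.covered_text(quote))  # -> "it works. Mostly"
print(overlaps(first, quote))   # -> True: the quote crosses a sentence boundary
```

Because each layer only stores offsets, adding a new layer, such as named entities, never requires re-parsing or re-tagging the text or the existing layers.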
40

Miao, Yishu. "Deep generative models for natural language processing." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:e4e1f1f9-e507-4754-a0ab-0246f1e1e258.

Abstract:
Deep generative models are essential to Natural Language Processing (NLP) due to their outstanding ability to use unlabelled data, to incorporate abundant linguistic features, and to learn interpretable dependencies among data. As the structure becomes deeper and more complex, having an effective and efficient inference method becomes increasingly important. In this thesis, neural variational inference is applied to carry out inference for deep generative models. While traditional variational methods derive an analytic approximation for the intractable distributions over latent variables, here we construct an inference network conditioned on the discrete text input to provide the variational distribution. Neural networks can approximate complicated non-linear distributions, which makes richer and more complex generative models possible. We therefore develop the potential of neural variational inference and apply it to a variety of NLP models with continuous or discrete latent variables. This thesis is divided into three parts. Part I introduces a generic variational inference framework for generative and conditional models of text. For continuous and discrete latent variables, we apply a continuous reparameterisation trick and the REINFORCE algorithm, respectively, to build low-variance gradient estimators. To further explore Bayesian non-parametrics in deep neural networks, we propose a family of neural networks that parameterise categorical distributions with continuous latent variables. Using the stick-breaking construction, an unbounded categorical distribution is incorporated into our deep generative models, which can be optimised by stochastic gradient back-propagation with a continuous reparameterisation. Part II explores continuous latent variable models for NLP. 
Chapter 3 discusses the Neural Variational Document Model (NVDM): an unsupervised generative model of text which aims to extract a continuous semantic latent variable for each document. In Chapter 4, the neural topic models modify the neural document models by parameterising categorical distributions with continuous latent variables, where the topics are explicitly modelled by discrete latent variables. The models are further extended to neural unbounded topic models with the help of the stick-breaking construction, and a truncation-free variational inference method is proposed based on a Recurrent Stick-breaking construction (RSB). Chapter 5 describes the Neural Answer Selection Model (NASM), which learns a latent stochastic attention mechanism to model the semantics of question-answer pairs and predict their relatedness. Part III discusses discrete latent variable models. Chapter 6 introduces latent sentence compression models. The Auto-encoding Sentence Compression Model (ASC), a discrete variational auto-encoder, generates a sentence by a sequence of discrete latent variables representing explicit words. The Forced Attention Sentence Compression Model (FSC) incorporates a combined pointer network biased towards using words from the source sentence, which significantly improves performance when jointly trained with the ASC model in a semi-supervised fashion. Chapter 7 describes the Latent Intention Dialogue Models (LIDM), which employ a discrete latent variable to learn underlying dialogue intentions. Additionally, the latent intentions can be interpreted as actions guiding the generation of machine responses, which could be further refined autonomously by reinforcement learning. Finally, Chapter 8 summarizes our findings and directions for future work.
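Part I of this abstract combines two ingredients. The first is a reparameterised continuous sample. The second is a stick-breaking construction that turns that sample into a categorical distribution. The pure-Python sketch below shows only the forward computation and makes several simplifying assumptions:

- The Gaussian parameters are given directly rather than produced by an inference network.
- Each break fraction is the sigmoid of one Gaussian sample.
- The stick is truncated at a fixed number of components.

The thesis's unbounded and recurrent (RSB) variants are not shown. The function names are illustrative and are not taken from the thesis.

```python
import math
import random


def reparameterised_gaussian(mu, log_sigma, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    All randomness is in eps, so z is a deterministic, differentiable
    function of mu and log_sigma (the reparameterisation trick).
    """
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def sigmoid(x):
    # Numerically stable logistic function mapping a real number into (0, 1)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def stick_breaking(fractions):
    """Map K-1 break fractions v_k in (0, 1) to K probabilities summing to 1.

    pi_k = v_k * prod_{j<k} (1 - v_j); the final component takes what is left
    of the stick, i.e. the construction is truncated at K components.
    """
    probs, remaining = [], 1.0
    for v in fractions:
        probs.append(v * remaining)
        remaining *= 1.0 - v
    probs.append(remaining)
    return probs


rng = random.Random(0)
z = reparameterised_gaussian([0.0] * 4, [-1.0] * 4, rng)
theta = stick_breaking([sigmoid(x) for x in z])
print(len(theta), round(sum(theta), 6))  # -> 5 1.0
```

The noise enters only through `eps`, so an automatic-differentiation framework could backpropagate through `mu` and `log_sigma`. Plain Python floats, as used here, do not track gradients.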
41

Hu, Jin. "Explainable Deep Learning for Natural Language Processing." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254886.

Abstract:
Deep learning methods achieve impressive performance in many Natural Language Processing (NLP) tasks, but it is still difficult to know what happens inside a deep neural network. This thesis gives a general overview of Explainable AI and of how explainable deep learning methods are applied to NLP tasks. It then introduces the Bi-directional LSTM and CRF (BiLSTM-CRF) model for the Named Entity Recognition (NER) task, together with an approach to making this model explainable. It proposes visualizing the importance of neurons in the Bi-LSTM layer of the NER model with Layer-wise Relevance Propagation (LRP), which measures how much each neuron contributes to the prediction for each word in a sequence. Ideas about how to measure the influence of the CRF layer of the Bi-LSTM-CRF model are also described.
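Layer-wise Relevance Propagation redistributes a prediction's relevance backwards, layer by layer, in proportion to each input's contribution. The Python sketch below shows the commonly used epsilon rule for a single linear layer, purely to illustrate the principle. Applying LRP to the Bi-LSTM layer, as the thesis does, also requires handling the multiplicative gates, which is not shown here. The function name and the weight layout (`weights[i][j]` connecting input `i` to output `j`) are assumptions made for this example.

```python
def lrp_epsilon(activations, weights, relevance_out, bias=None, eps=1e-6):
    """Epsilon-rule LRP for one linear layer z_j = sum_i a_i * w_ij + b_j.

    Input i receives R_i = sum_j a_i * w_ij / (z_j + eps * sign(z_j)) * R_j,
    i.e. each output's relevance is split in proportion to the input contributions.
    weights[i][j] connects input i to output j.
    """
    n_in, n_out = len(activations), len(relevance_out)
    if bias is None:
        bias = [0.0] * n_out
    z = [sum(activations[i] * weights[i][j] for i in range(n_in)) + bias[j]
         for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # The stabiliser pushes the denominator away from zero without flipping its sign
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        if denom == 0.0:
            continue  # only possible with eps=0 and z_j=0: nothing can be attributed
        for i in range(n_in):
            relevance_in[i] += activations[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in


relevance = lrp_epsilon([1.0, 2.0], [[1.0], [1.0]], [3.0], eps=0.0)
print([round(r, 6) for r in relevance])  # -> [1.0, 2.0]
```

With zero bias and a small epsilon, the input relevances sum approximately to the output relevance. This conservation property is what makes LRP scores readable as per-neuron or per-word contributions.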
42

Guy, Alison. "Logical expressions in natural language conditionals." Thesis, University of Sunderland, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.278644.

43

Walker, Alden. "Natural language interaction with robots." Diss., Connect to the thesis, 2007. http://hdl.handle.net/10066/1275.

44

Fuchs, Gil Emanuel. "Practical natural language processing question answering using graphs." Diss., Digital Dissertations Database. Restricted to UC campuses, 2004. http://uclibs.org/PID/11984.

45

Kolak, Okan. "Rapid resource transfer for multilingual natural language processing." College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/3182.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2005.
Thesis research directed by: Dept. of Linguistics. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
46

Takeda, Koichi. "Building Natural Language Processing Applications Using Descriptive Models." 京都大学 (Kyoto University), 2010. http://hdl.handle.net/2433/120372.

47

Åkerud, Daniel, and Henrik Rendlo. "Natural Language Processing from a Software Engineering Perspective." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2056.

Abstract:
This thesis deals with questions related to the processing of naturally occurring texts, known as natural language processing (NLP). The subject is approached from a software engineering perspective, and the problem description is formulated accordingly. The thesis is roughly divided into two major parts. The first part contains a literature study covering fundamental concepts and algorithms. We discuss both serial and parallel architectures, and conclude that different scenarios call for different architectures. The second part is an empirical evaluation of an NLP framework or toolkit chosen from among several candidates, conducted in order to elucidate the theoretical part of the thesis. We argue that component-based development in a portable language could increase reusability in the NLP community, where reuse is currently low. The recent emergence of the initiatives we identified, and the great potential of many applications in this area, point to a bright future for NLP.
48

Byström, Adam. "From Intent to Code: Using Natural Language Processing." Thesis, Uppsala universitet, Avdelningen för datalogi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-325238.

Abstract:
Programming, the ability to express one's intent to a machine, is becoming a very important skill in our increasingly digital society. Today, instructing a machine such as a computer to perform actions is done through programming. What if this could be done with human language? This thesis examines how new technologies and methods in Natural Language Processing (NLP) can make programming more accessible by translating intent expressed in natural language into code that a computer can execute. Related research has studied using natural language as a programming language and using natural language to instruct robots. These studies have shown promising results but are hindered by strict syntax, limited domains, and an inability to handle ambiguity. Other studies have used NLP to analyse source code, turning code into natural language. This thesis takes the reverse approach: using NLP techniques, an intent can be translated into code containing concepts such as sequential execution, loops, and conditional statements. In this study, a system for converting intent, expressed in English sentences, into code is developed. To analyse this approach to programming, an evaluation framework is developed that evaluates the system both during the development process and in use of the final system. The results show that this way of programming might have potential, but also that the accuracy of the NLP models is still too low. Further research is required to increase this accuracy and to further assess the potential of this way of programming.
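The abstract does not describe the system's internals, so the example below is only a toy illustration of the target mapping: English instructions in, code with sequential execution, loops, and conditionals out. It deliberately substitutes hand-written regular-expression patterns for the trained NLP models that the thesis evaluates. Its tiny instruction grammar (`say ...`, `repeat ... N times`, `if X is Y, then ...`) and its function names are hypothetical. It only generates Python source text and does not execute it.

```python
import re


def intent_to_code(sentence: str, indent: str = "") -> list[str]:
    """Translate one instruction from a tiny, fixed English grammar into Python lines."""
    s = sentence.strip().rstrip(".").strip()
    # Loop: "repeat <instruction> <N> times"
    m = re.fullmatch(r"repeat (.+) (\d+) times", s, flags=re.IGNORECASE)
    if m:
        body = intent_to_code(m.group(1), indent + "    ")
        return [f"{indent}for _ in range({m.group(2)}):"] + body
    # Conditional: "if <name> is <value>, then <instruction>"
    m = re.fullmatch(r"if (\w+) is (\w+),? then (.+)", s, flags=re.IGNORECASE)
    if m:
        body = intent_to_code(m.group(3), indent + "    ")
        return [f"{indent}if {m.group(1)} == {m.group(2)!r}:"] + body
    # Primitive action: "say <text>"
    m = re.fullmatch(r"say (.+)", s, flags=re.IGNORECASE)
    if m:
        return [f"{indent}print({m.group(1)!r})"]
    raise ValueError(f"cannot interpret: {sentence!r}")


def program_from(text: str) -> str:
    """Sequential execution: each sentence becomes the next top-level statement."""
    sentences = [p for p in re.split(r"(?<=\.)\s+", text.strip()) if p]
    return "\n".join(line for s in sentences for line in intent_to_code(s))


print(program_from("Say hello. Repeat say bye 2 times. If weather is rain, then say take an umbrella."))
```

Any sentence outside this fixed grammar raises an error. That mirrors the strict-syntax limitation the abstract attributes to earlier approaches. Handling free-form phrasing and ambiguity is exactly where statistical NLP models are needed.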
49

Bigert, Johnny. "Automatic and unsupervised methods in natural language processing." Doctoral thesis, Stockholm, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-156.

50

Cohn, Trevor A. "Scaling conditional random fields for natural language processing." Connect to thesis, 2007. http://eprints.unimelb.edu.au/archive/00002874.

