Dissertations / Theses on the topic 'Natural Language Processing (NLP)'
Hellmann, Sebastian. "Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-157932.
NOZZA, DEBORA. "Deep Learning for Feature Representation in Natural Language Processing." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2018. http://hdl.handle.net/10281/241185.
The huge amount of textual user-generated content on the Web has grown incredibly in the last decade, creating relevant new opportunities for different real-world applications and domains. To overcome the difficulties of dealing with this large volume of unstructured data, the research field of Natural Language Processing has provided efficient solutions, developing computational models able to understand and interpret human natural language without any (or almost any) human intervention. The field has gained further computational efficiency and performance from the advent of recent machine learning research lines concerned with Deep Learning. In particular, this thesis focuses on a specific class of Deep Learning models devoted to learning high-level and meaningful representations of input data in unsupervised settings, by computing multiple non-linear transformations of increasing complexity and abstraction. Indeed, learning expressive representations from data is a crucial step in Natural Language Processing, because it involves the transformation from discrete symbols (e.g. characters) to a machine-readable representation as real-valued vectors, which should encode the semantic and syntactic meanings of the language units. The first research direction of this thesis aims to give evidence that enhancing Natural Language Processing models with representations obtained by unsupervised Deep Learning models can significantly improve the computational abilities of making sense of large volumes of user-generated text. In particular, this thesis addresses tasks that are crucial for understanding what a text is talking about, by extracting and disambiguating named entities (Named Entity Recognition and Linking), and which opinion the user is expressing, dealing also with irony (Sentiment Analysis and Irony Detection).
For each task, this thesis proposes a novel Natural Language Processing model enhanced by the data representation obtained by Deep Learning. As a second research direction, this thesis investigates the development of a novel Deep Learning model for learning a meaningful textual representation that takes into account the relational structure underlying user-generated content. The inferred representation comprises both textual and relational information. Once the data representation is obtained, it can be exploited by off-the-shelf machine learning algorithms to perform different Natural Language Processing tasks. In conclusion, the experimental investigations reveal that models able to incorporate high-level features obtained by Deep Learning show significant performance gains and improved generalization abilities. Further improvements can also be achieved by models that take the relational information into account in addition to the textual content.
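The two-step pipeline this abstract describes, learning text representations from unlabeled data and then handing them to an off-the-shelf classifier, can be illustrated in miniature. The sketch below is a hedged stand-in: it uses shallow co-occurrence counts rather than the deep models of the thesis, and all sentences, labels, and function names are invented for illustration.

```python
# Toy sketch of the pipeline from the abstract: (1) learn word
# representations from unlabeled text via co-occurrence counts (a shallow
# stand-in for the thesis's deep models), (2) feed averaged sentence
# vectors to a simple nearest-centroid classifier. All data is invented.
from collections import Counter, defaultdict
import math

unlabeled = ["great fun movie", "loved this great film",
             "awful boring movie", "hated this awful film"]

# Step 1: represent each word by the words it co-occurs with.
vectors = defaultdict(Counter)
for sent in unlabeled:
    toks = sent.split()
    for w in toks:
        for c in toks:
            if c != w:
                vectors[w][c] += 1

def embed(sentence):
    """Sum the co-occurrence vectors of a sentence's known words."""
    v = Counter()
    for w in sentence.split():
        v.update(vectors.get(w, Counter()))
    return v

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 2: an off-the-shelf-style classifier on top of the representations.
centroids = {"pos": embed("great loved"), "neg": embed("awful hated")}

def classify(sentence):
    return max(centroids, key=lambda c: cosine(embed(sentence), centroids[c]))

print(classify("loved this movie"))  # → pos
```

The point of the sketch is only the division of labour: the representation is learned without labels, and the downstream learner never sees raw symbols.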
Panesar, Kulvinder. "Natural language processing (NLP) in Artificial Intelligence (AI): a functional linguistic perspective." Vernon Press, 2020. http://hdl.handle.net/10454/18140.
This chapter encapsulates the multi-disciplinary nature that facilitates NLP in AI and reports on a linguistically orientated conversational software agent (CSA) (Panesar 2017) framework sensitive to natural language processing (NLP) and language in the agent environment. We present a novel computational approach that uses the functional linguistic theory of Role and Reference Grammar (RRG) as the linguistic engine. Viewing language as action, utterances change the state of the world, and hence speakers' and hearers' mental states change as a result of these utterances. The plan-based method of discourse management (DM) using the BDI model architecture is deployed to support greater complexity of conversation. This CSA investigates the integration, intersection and interface of the language, knowledge, speech act constructions (SAC) as a grammatical object, and the sub-model of BDI and DM for NLP. We present an investigation into the intersection and interface between our linguistic and knowledge (belief base) models for both dialogue management and planning. The architecture has three phase models: (1) a linguistic model based on RRG; (2) an Agent Cognitive Model (ACM) with (a) a knowledge representation model employing conceptual graphs (CGs) serialised to Resource Description Framework (RDF), and (b) a planning model underpinned by BDI concepts, intentionality and rational interaction; and (3) a dialogue model employing common ground. Use of RRG as a linguistic engine for the CSA was successful. We identify the complexity of the semantic gap of internal representations and detail a conceptual bridging solution.
Välme, Emma, and Lea Renmarker. "Accelerating Sustainability Report Assessment with Natural Language Processing." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445912.
Djoweini, Camran, and Henrietta Hellberg. "Approaches to natural language processing in app development." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230167.
Natural language processing (NLP) is a field within computer science that is not yet fully established. High demand for natural language support in applications creates a need for approaches and tools suited to engineers. This project approaches the field from an engineer's point of view to investigate the approaches, tools, and techniques currently available for developing natural language support in applications. The sub-field of information retrieval was examined through a case study in which prototypes were developed to gain a deeper understanding of the tools and techniques used in the field. We found that tools and techniques can be categorized into two groups, depending on how far removed the developer is from the underlying processing of the language. A categorization of tools and techniques, together with source code, documentation, and an evaluation of the prototypes, is presented as the result. The choice of approach, techniques, and tools should be based on the requirements and specifications of the final product. The results of the study are largely generalizable, since solutions to many problems in the field are similar even when the end goals differ.
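The information-retrieval sub-field examined in the case study can be sketched in a few lines. The example below is a minimal, hypothetical TF-IDF ranker of the kind an app developer might prototype; the document collection and names are invented, and it is not one of the thesis prototypes.

```python
# Minimal TF-IDF retrieval sketch: score documents against a query and
# return the best match. A hypothetical stand-in for the kind of tooling
# the case study compares, with invented documents.
import math
from collections import Counter

docs = {
    "weather": "forecast of rain and sun for the week",
    "sports": "match results and league standings for the week",
    "travel": "flight and hotel booking for your trip",
}

def tfidf_score(query, doc_text, all_docs):
    tokens = doc_text.split()
    tf = Counter(tokens)
    score = 0.0
    for term in query.split():
        # document frequency: how many documents contain the term
        df = sum(1 for d in all_docs.values() if term in d.split())
        if df:
            idf = math.log(len(all_docs) / df)
            score += (tf[term] / len(tokens)) * idf
    return score

def search(query):
    return max(docs, key=lambda name: tfidf_score(query, docs[name], docs))

print(search("rain forecast"))  # → weather
```

The developer-facing tools the thesis surveys differ mainly in how much of this scoring machinery they hide behind an API.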
Sætre, Rune. "GeneTUC: Natural Language Understanding in Medical Text." Doctoral thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2006. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-545.
Natural Language Understanding (NLU) is a 50-year-old research field, but its application to molecular biology literature (BioNLU) is less than 10 years old. After the complete human genome sequence was published by the Human Genome Project and Celera in 2001, there has been an explosion of research, shifting the NLU focus from domains like news articles to molecular biology and medical literature. BioNLU is needed because almost 2000 new articles are published and indexed every day, and biologists need to know about existing knowledge regarding their own research. So far, BioNLU results are not as good as in other NLU domains, so more research is needed to solve the challenges of creating useful NLU applications for biologists.
The work in this PhD thesis is a “proof of concept”. It is the first to show that an existing Question Answering (QA) system can be successfully applied in the hard BioNLU domain, after the essential challenge of unknown entities is solved. The core contribution is a system that discovers and classifies unknown entities and relations between them automatically. The World Wide Web (through Google) is used as the main resource, and the performance is almost as good as other named entity extraction systems, but the advantage of this approach is that it is much simpler and requires less manual labor than any of the other comparable systems.
The first paper in this collection gives an overview of the field of NLU and shows how the Information Extraction (IE) problem can be formulated with Local Grammars. The second paper uses Machine Learning to automatically recognize protein names based on features from the GSearch Engine. In the third paper, GSearch is substituted with Google, and the task is to extract all unknown names belonging to one of 273 biomedical entity classes, such as genes, proteins and processes. After getting promising results with Google, the fourth paper shows that this approach can also be used to retrieve interactions or relationships between the named entities. The fifth paper describes an online implementation of the system, and shows that the method scales well to a larger set of entities.
The final paper concludes the “proof of concept” research, and shows that the performance of the original GeneTUC NLU system has increased from handling 10% of the sentences in a large collection of abstracts in 2001 to 50% in 2006. This is still not good enough to create a commercial system, but it is believed that another 40% performance gain can be achieved by importing more verb templates into GeneTUC, just as nouns were imported during this work. Work on this has already begun, in the form of a local Master's thesis.
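The core idea of the thesis, classifying an unknown name by counting how often class-indicating patterns like "X is a protein" occur in a large text collection (the Web via Google, in the original work), can be shown locally. The snippet below is a hedged toy that replaces the Web with a tiny in-memory corpus; all sentences, names, and the fixed pattern are invented for illustration.

```python
# Toy version of pattern-based entity classification: count pattern hits
# ("<name> is a <class>") in a corpus and pick the class with most hits.
# The thesis used Google as the corpus; a small local list stands in here.
corpus = [
    "p53 is a protein involved in cell cycle control",
    "studies show p53 is a protein and a tumor suppressor",
    "apoptosis is a process of programmed cell death",
    "BRCA1 is a gene linked to breast cancer",
]

classes = ["protein", "gene", "process"]

def classify_entity(name):
    hits = {c: sum(f"{name} is a {c}" in s for s in corpus) for c in classes}
    return max(hits, key=hits.get)

print(classify_entity("p53"))        # → protein
print(classify_entity("apoptosis"))  # → process
```

The attraction noted in the abstract is exactly this simplicity: the "training data" is just raw text, and no manual annotation is required.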
Andrén, Samuel, and William Bolin. "NLIs over APIs : Evaluating Pattern Matching as a way of processing natural language for a simple API." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186429.
This report explores the feasibility of using pattern matching to implement a robust natural language interface (NLI) over a limited application programming interface (API). Since APIs are widely used today, often in mobile applications, it has become increasingly important to find ways to make them even more accessible to end users. A very intuitive way to access information is through natural language via an API. This report first describes the possibility of building a corpus for a particular API and creating patterns for pattern matching on that corpus. It then evaluates an implementation of an NLI based on pattern matching using the corpus. The results of the corpus construction show that although the number of unique phrases used for our API grows fairly steadily, the number of patterns over those phrases converges relatively quickly towards a constant. This suggests that it is quite feasible to use these patterns to create an NLI robust enough for an API. The evaluation of the pattern matching implementation suggests that the technique can successfully extract information from phrases, provided the pattern the phrase follows exists in the system.
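A minimal sketch of the pattern-matching approach the report evaluates: regular-expression patterns capture the slots needed to call an API endpoint. The patterns and the stand-in API functions below are invented for illustration, not taken from the report's system.

```python
# Hedged sketch of a pattern-matching NLI: each regex captures the slots
# needed for one call on a hypothetical API. Not the report's system.
import re

def get_weather(city):          # stand-in API endpoint
    return f"weather in {city}"

def book_table(people, time):   # stand-in API endpoint
    return f"table for {people} at {time}"

PATTERNS = [
    (re.compile(r"what('s| is) the weather in (?P<city>\w+)", re.I),
     lambda m: get_weather(m.group("city"))),
    (re.compile(r"book a table for (?P<people>\d+) at (?P<time>\d+ ?[ap]m)", re.I),
     lambda m: book_table(m.group("people"), m.group("time"))),
]

def interpret(phrase):
    for pattern, action in PATTERNS:
        m = pattern.search(phrase)
        if m:
            return action(m)
    return "no matching pattern"

print(interpret("What is the weather in Stockholm?"))  # → weather in Stockholm
print(interpret("Please book a table for 4 at 7pm"))   # → table for 4 at 7pm
```

The report's convergence result suggests why this works in practice: the set of distinct patterns grows far more slowly than the set of distinct phrases.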
Wallner, Vanja. "Mapping medical expressions to MedDRA using Natural Language Processing." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-426916.
Woldemariam, Yonas Demeke. "Natural language processing in cross-media analysis." Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-147640.
Full textHuang, Fei. "Improving NLP Systems Using Unconventional, Freely-Available Data." Diss., Temple University Libraries, 2013. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/221031.
Ph.D.
Sentence labeling is a pattern recognition task that involves assigning a categorical label to each member of a sentence of observed words. Standard supervised sentence-labeling systems often generalize poorly: because they use only words as features in their prediction tasks, it is difficult to estimate parameters for words which appear in the test set but seldom (or never) appear in the training set. Representation learning is a promising technique for discovering features that allow a supervised classifier to generalize from a source-domain dataset to arbitrary new domains. We demonstrate that features learned from distributional representations of unlabeled data can be used to improve performance on out-of-vocabulary words and help the model to generalize. We also argue that it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. We investigate techniques for building open-domain sentence-labeling systems that approach the ideal of a system whose accuracy is high and consistent across domains. In particular, we investigate unsupervised techniques for language-model representation learning that provide new features which are stable across domains, in that they are predictive in both the training and out-of-domain test data. In experiments, our best system with the proposed techniques reduces error by as much as 11.4% relative to the previous system using traditional representations on the part-of-speech tagging task. Moreover, we leverage the Posterior Regularization framework and develop an architecture for incorporating biases from prior knowledge into representation learning. We investigate three types of biases: entropy bias, distance bias and predictive bias. Experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners.
This results in a relative reduction in error of more than 16% for both tasks with respect to existing state-of-the-art representation learning techniques. We also extend the idea of using additional unlabeled data to improve the system's performance on a different NLP task, word alignment. Traditional word alignment takes only a sentence-level aligned parallel corpus as input and generates word-level alignments. However, with the integration of different cultures, more and more people are competent in multiple languages, and they often use elements of multiple languages in conversation. Linguistic Code Switching (LCS) is such a situation, where two or more languages show up in the context of a single conversation. Traditional machine translation (MT) systems treat LCS data as noise, or just as regular sentences. However, if LCS data is processed intelligently, it can provide a useful signal for training word alignment and MT models. In this work, we first extract constraints from this code-switching data and then incorporate them into a word alignment model training procedure. We also show that by using the code-switching data we can jointly train a word alignment model and a language model using co-training. Our techniques for incorporating LCS data improve BLEU score by 2.64 points over a baseline MT system trained using only standard sentence-aligned corpora.
Temple University--Theses
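The first idea in the abstract above, that representations learned from unlabeled text supply features which generalize to words never seen in training, can be shown in a deliberately tiny form. The sketch below clusters words by their shared left-context words, a crude invented stand-in for the thesis's learned representations, with toy data throughout.

```python
# Toy illustration of representation features helping out-of-vocabulary
# words: cluster words by their left contexts in unlabeled text, then tag
# by cluster rather than by word. A crude stand-in for learned features.
from collections import defaultdict

unlabeled = ["a cat sat", "a dog sat", "the cat ran", "the dog ran"]

left_contexts = defaultdict(set)
for sentence in unlabeled:
    toks = sentence.split()
    for i in range(1, len(toks)):
        left_contexts[toks[i]].add(toks[i - 1])

# Words sharing the same left-context signature land in the same cluster.
cluster = {w: "|".join(sorted(ctx)) for w, ctx in left_contexts.items()}

# Supervised step: only "cat" is labeled; "dog" never appears in training.
tag_by_cluster = {cluster["cat"]: "NOUN"}

unseen = "dog"
print(tag_by_cluster[cluster[unseen]])  # the cluster feature generalizes → NOUN
```

A word-only tagger would have no parameters at all for "dog"; the cluster feature, estimated from unlabeled text, bridges that gap, which is the generalization effect the thesis quantifies.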
Karlin, Ievgen. "An Evaluation of NLP Toolkits for Information Quality Assessment." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-22606.
Boulanger, Hugo. "Data augmentation and generation for natural language processing." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG019.
More and more fields are looking to automate part of their processes. Natural language processing provides methods for extracting information from texts; these methods can use machine learning, which requires annotated data to perform information extraction, so applying them to new domains requires obtaining annotated data related to the task. In this thesis, our goal is to study generation methods for improving the performance of models learned from small amounts of data. Different generation methods, with and without machine learning, are explored to generate the data needed to train sequence-labeling models. The first method explored is pattern filling. This data-generation method produces annotated data by combining sentences with slots, or patterns, with mentions. We have shown that this method improves the performance of labeling models with tiny amounts of data, and we also study how much data the method requires. The second approach tested is the use of language models for text generation alongside a semi-supervised learning method for tagging. The semi-supervised method used is tri-training, which adds labels to the generated data. Tri-training is tested on several generation methods using different pre-trained language models. We propose a version of tri-training called generative tri-training, in which generation is not done in advance but during the tri-training process, taking advantage of it. We test both the models trained during the semi-supervision process and the models trained on the data it generates. In most cases, the data produced match the performance of the models trained with semi-supervision. This method improves performance at all tested data levels relative to models without augmentation. The third avenue of study combines aspects of the previous approaches, and several variants are tested. Using language models to do sentence replacement in the manner of the pattern-filling generation method is unsuccessful. Using a pool of data drawn from the different generation methods is tested, and does not outperform the best single method. Finally, applying the pattern-filling method to the data generated with tri-training is tested, and does not improve on the results obtained with tri-training alone. While much remains to be studied, we have highlighted simple methods, such as pattern filling, and more complex ones, such as semi-supervised learning with sentences generated by a language model, that improve the performance of labeling models through the generation of annotated data.
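The pattern-filling method described above has a very simple core: combine slotted sentences with mention lists to produce new annotated examples. The sketch below is a hedged illustration with invented patterns, mentions, and word-level BIO labels; it is not the thesis's implementation.

```python
# Hedged sketch of pattern filling for sequence labeling: fill each slot
# in a pattern with every known mention and emit word-level BIO labels.
# Patterns and mentions are invented; not the thesis's actual generator.
patterns = ["book a flight to {CITY}", "what is the weather in {CITY}"]
mentions = {"CITY": ["paris", "new york"]}

def fill(pattern):
    """Yield (tokens, labels) pairs for every way of filling the slot."""
    for city in mentions["CITY"]:
        tokens, labels = [], []
        for word in pattern.split():
            if word == "{CITY}":
                parts = city.split()
                tokens += parts
                labels += ["B-CITY"] + ["I-CITY"] * (len(parts) - 1)
            else:
                tokens.append(word)
                labels.append("O")
        yield tokens, labels

generated = [pair for p in patterns for pair in fill(p)]
print(len(generated))  # 2 patterns x 2 mentions → 4 annotated examples
print(generated[1])    # multi-word mentions get B-/I- labels
```

Because every generated sentence carries its labels by construction, a handful of patterns and mention lists can stand in for costly manual annotation, which is exactly the low-data setting the thesis targets.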
Lager, Adam. "Improving Solr search with Natural Language Processing : An NLP implementation for information retrieval in Solr." Thesis, Linköpings universitet, Programvara och system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177790.
Panesar, Kulvinder. "Conversational artificial intelligence - demystifying statistical vs linguistic NLP solutions." Universitat Politècnica de València, 2020. http://hdl.handle.net/10454/18121.
This paper aims to demystify the hype and attention on chatbots and their association with conversational artificial intelligence. Both are slowly emerging as a real presence in our lives through the impressive technological developments in machine learning, deep learning and natural language understanding solutions. However, what is under the hood, and how far and to what extent chatbots/conversational artificial intelligence solutions can work, is our question. Natural language is the most easily understood knowledge representation for people, but certainly not the best for computers, because of its inherently ambiguous, complex and dynamic nature. We critique the knowledge representation of heavy statistical chatbot solutions against linguistic alternatives. In order to react intelligently to the user, natural language solutions must critically consider other factors such as context, memory, intelligent understanding, previous experience, and personalized knowledge of the user. We delve into the spectrum of conversational interfaces and focus on a strong artificial intelligence concept. This is explored via a text-based conversational software agent with a deep strategic role: to hold a conversation, enable the mechanisms needed to plan and decide what to do next, and manage the dialogue to achieve a goal. To demonstrate this, a deeply linguistically aware and knowledge-aware text-based conversational agent (LING-CSA) presents a proof of concept of a non-statistical conversational AI solution.
Coppola, Gregory Francis. "Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/10451.
Riedel, Sebastian. "Efficient prediction of relational structure and its application to natural language processing." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/4167.
Full textFernquist, Johan. "Detection of deceptive reviews : using classification and natural language processing features." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-306956.
Alkathiri, Abdul Aziz. "Decentralized Large-Scale Natural Language Processing Using Gossip Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281277.
The field of natural language processing (NLP) in machine learning has seen growing popularity and use in recent years. The nature of NLP, which deals with natural human language and computers, has led to the research and development of many algorithms that produce word embeddings, one of the most widely used being Word2Vec. With the abundance of data generated by users and organizations, and the complexity of machine learning and deep learning models, it becomes impossible to perform training on a single machine. Advances in distributed machine learning offer a solution to this problem, but unfortunately, for privacy and data-regulation reasons, in some real-world scenarios the data may not leave its local machine. This constraint has driven the development of techniques and protocols that are massively parallel and data-private. The most popular of these protocols is federated learning, but due to its centralized nature it still poses some security and robustness risks. This in turn led to the development of massively parallel, data-private, decentralized approaches such as gossip learning. In the gossip learning protocol, each node in the network randomly selects a peer for information exchange, which eliminates the need for a central node. The purpose of this research is to test the viability of gossip learning for large-scale, real-world applications. In particular, the research focuses on the implementation and evaluation of an NLP application using gossip learning. The results show that applying Word2Vec in a gossip learning framework is viable and yields results comparable to its non-distributed, centralized counterpart across different scenarios, with an average loss in quality of 6.904%.
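The gossip learning protocol the thesis evaluates reduces to a simple skeleton: each node repeatedly picks a random peer, and the pair average their models, so all nodes drift toward a common model with no central server. The simulation below is a toy with scalar "models" standing in for the Word2Vec parameters the real system exchanges; the network size and values are invented.

```python
# Toy gossip-averaging simulation: each round, a random node pairs with a
# random peer and both replace their "model" (a scalar here; Word2Vec
# parameters in the thesis) with the pair's average. No central node.
import random

random.seed(0)
models = [1.0, 3.0, 5.0, 7.0]          # one model per node
true_mean = sum(models) / len(models)  # pairwise averaging preserves the mean

for _ in range(200):
    i = random.randrange(len(models))
    j = random.choice([k for k in range(len(models)) if k != i])
    avg = (models[i] + models[j]) / 2
    models[i] = models[j] = avg

spread = max(models) - min(models)
print(round(sum(models) / len(models), 6))  # mean is preserved → 4.0
print(spread < 1e-6)                        # nodes have converged → True
```

The design choice gossip learning makes is visible even here: convergence emerges from purely local, pairwise exchanges, which is what removes the central point of failure that federated learning retains.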
Ruberg, Nicolaas. "Bert goes sustainable: an NLP approach to ESG financing." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24787/.
Giménez Fayos, María Teresa. "Natural Language Processing using Deep Learning in Social Media." Doctoral thesis, Universitat Politècnica de València, 2021. http://hdl.handle.net/10251/172164.
[EN] In recent years, Deep Learning (DL) has revolutionised the potential of automatic systems that handle Natural Language Processing (NLP) tasks. We have witnessed a tremendous advance in the performance of these systems; nowadays we find NLP models embedded ubiquitously, determining the intent of the text we write, the sentiment of our tweets or our political views, to cite some examples. In this thesis, we propose several NLP models for addressing tasks that deal with social media text. Concretely, this work focuses mainly on Sentiment Analysis and Personality Recognition. Sentiment Analysis, one of the leading problems in NLP, consists of determining the polarity of a text; it is a well-studied task for which the number of resources and models proposed is vast. In contrast, Personality Recognition is a breakthrough task that aims to determine users' personality from their writing style; it is more of a niche task with fewer resources designed ad hoc, but with great potential. Although the principal focus of this work was the development of Deep Learning models, we have also proposed models based on linguistic resources and classical Machine Learning. Moreover, in this more straightforward setup, we have explored the nuances of different language devices, such as the impact of emotions on the correct classification of the sentiment expressed in a text. Afterwards, DL models were developed, particularly Convolutional Neural Networks (CNNs), to address the previously described tasks. In the case of Personality Recognition, we explored both approaches, which allowed us to compare the models under the same circumstances. Notably, NLP has evolved dramatically in recent years through the development of public evaluation campaigns, where multiple research teams compare the performance of their approaches under the same conditions.
Most of the models presented here were either assessed in an evaluation campaign or evaluated using the setup of a previous one. Recognising the importance of this effort, we curated and developed an evaluation campaign for classifying political tweets. In addition, as we advanced in this work, we decided to study in depth how CNNs are applied to NLP tasks, and two lines of work were explored in this regard. Firstly, we proposed a semantic-based padding method for CNNs, which addresses how to represent text more appropriately for solving NLP tasks. Secondly, a theoretical framework was introduced for tackling one of the most frequent criticisms of Deep Learning, its lack of interpretability; this framework seeks to visualise what lexical patterns, if any, the CNN is learning in order to classify a sentence. In summary, the main achievements presented in this thesis are: the organisation of an evaluation campaign for topic classification of texts gathered from social media; the proposal of several Machine Learning models tackling Sentiment Analysis on social media, together with a study of the impact of linguistic devices such as figurative language on the task; the development of a model for inferring the personality of a developer from the source code that they have written; the study of Personality Recognition from social media following two approaches, models based on machine learning algorithms with handcrafted features and models based on CNNs, and a comparison of both; the introduction of new semantic-based paddings for optimising how text is represented in CNNs; and the definition of a theoretical framework to provide interpretable information on what CNNs learn internally.
Giménez Fayos, MT. (2021). Natural Language Processing using Deep Learning in Social Media [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172164
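One of the contributions listed, visualising which lexical patterns a CNN has learned, rests on a simple observation: each convolutional filter scores every n-gram window of a sentence, so the max-scoring window shows what the filter responds to. The toy below implements that lookup in plain Python; the embedding table and filter weights are invented, and it is only a sketch of the idea, not the thesis's framework.

```python
# Toy sketch of CNN interpretability for text: slide a filter over trigram
# windows of word vectors and report the window with the highest
# activation. Embeddings and filter weights are invented for illustration.
embeddings = {          # 2-dimensional toy word vectors
    "the": [0.1, 0.0], "movie": [0.2, 0.1], "was": [0.0, 0.1],
    "really": [0.5, 0.2], "great": [0.9, 0.8], "fun": [0.8, 0.7],
}
filter_weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # one trigram filter

def max_activating_ngram(sentence, width=3):
    tokens = sentence.split()
    best, best_score = None, float("-inf")
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        # dot product of the filter with the stacked word vectors
        score = sum(w * x
                    for row, tok in zip(filter_weights, window)
                    for w, x in zip(row, embeddings[tok]))
        if score > best_score:
            best, best_score = window, score
    return " ".join(best)

print(max_activating_ngram("the movie was really great fun"))  # → really great fun
```

Listing the top-activating n-grams per filter over a whole corpus turns an opaque set of weights into a readable inventory of the lexical patterns the network keys on.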
TESIS
Hänig, Christian. "Unsupervised Natural Language Processing for Knowledge Extraction from Domain-specific Textual Resources." Doctoral thesis, Universitätsbibliothek Leipzig, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-112706.
Full text
Bhaduri, Sreyoshi. "NLP in Engineering Education - Demonstrating the use of Natural Language Processing Techniques for Use in Engineering Education Classrooms and Research." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/82202.
Full text
Ph. D.
Eriksson, Caroline, and Emilia Kallis. "NLP-Assisted Workflow Improving Bug Ticket Handling." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301248.
Full text
In software development, a lot of resources are spent on troubleshooting, a process in which previous solutions can help resolve current problems. Reading the bug reports that contain this information is often time-consuming. To minimise the time spent on troubleshooting and to ensure that knowledge from previous solutions is retained within the company, we evaluated whether summaries could make this process more efficient. Abstractive and extractive summarisation models were tested for the task, and the bert-extractive-summarizer was fine-tuned. The generated summaries were compared with respect to perceived quality, generation speed, similarity to one another, and summary length. The average summary contained parts of the most important information, and the proposed solution was either well documented or did not address the problem description at all. The fine-tuned BERT and the abstractive model BART showed good potential for generating summaries containing all of the most important information.
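The thesis above fine-tunes bert-extractive-summarizer; as a rough, self-contained illustration of what *extractive* summarisation does (score the source sentences, keep the best ones, preserve their order), a simple word-frequency heuristic can stand in for the learned model. Everything below is an illustrative toy, not the thesis's system:

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Score each sentence by the average document frequency of its words
    and return the top-scoring sentences in their original order -- a toy
    stand-in for learned extractive models such as BERT-based ones."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    scores = []
    for idx, sent in enumerate(sentences):
        toks = re.findall(r'\w+', sent.lower())
        scores.append((sum(freq[t] for t in toks) / max(len(toks), 1), idx))
    # pick the n best scores, then restore document order
    top = sorted(sorted(scores, reverse=True)[:n_sentences], key=lambda x: x[1])
    return ' '.join(sentences[i] for _, i in top)
```

Learned extractive models replace the frequency score with a sentence representation from a pretrained encoder, but the select-and-reorder skeleton is the same.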
Lindén, Johannes. "Huvudtitel: Understand and Utilise Unformatted Text Documents by Natural Language Processing algorithms." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-31043.
Full text
Demmelmaier, Gustav, and Carl Westerberg. "Data Segmentation Using NLP: Gender and Age." Thesis, Uppsala universitet, Avdelningen för datalogi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-434622.
Full text
Aljadri, Sinan. "Chatbot : A qualitative study of users' experience of Chatbots." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105434.
Full text
The purpose of the present study was to examine users' experience of chatbots from a business perspective and a consumer perspective. The study also focused on highlighting the limitations a chatbot can have and possible improvements for future development. The study is based on a qualitative research method with semi-structured interviews, analysed using thematic analysis. The interview material was analysed in relation to previous research and to theoretical perspectives such as Artificial Intelligence (AI) and Natural Language Processing (NLP). The results of the study showed that the experience of chatbots can differ between the businesses that offer them, which are more positive, and the consumers who use them for customer service. Limitations of chatbots and suggestions for their improvement are also a recurring result of the study.
Kärde, Wilhelm. "Tool for linguistic quality evaluation of student texts." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186434.
Full text
Grammar checkers are nowadays available in most word processors, so a student writing an essay usually has access to one. However, the feedback the student receives from a grammar checker differs considerably from that of a teacher, since the teacher draws on many more aspects when assessing a student text. Unlike the grammar checker, the teacher judges a text on aspects such as how well it fits a particular genre, its structure, and its word variation. This thesis explores how well these aspects can be adapted to NLP (Natural Language Processing) and implements those that fit well into a rule-based solution called Granska.
Hellmann, Sebastian [Verfasser], Klaus-Peter [Akademischer Betreuer] Fähnrich, Klaus-Peter [Gutachter] Fähnrich, Sören [Akademischer Betreuer] Auer, Jens [Akademischer Betreuer] Lehmann, and Hans [Gutachter] Uszkoreit. "Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data / Sebastian Hellmann ; Gutachter: Klaus-Peter Fähnrich, Hans Uszkoreit ; Klaus-Peter Fähnrich, Sören Auer, Jens Lehmann." Leipzig : Universitätsbibliothek Leipzig, 2015. http://d-nb.info/1239422202/34.
Full text
Khizra, Shufa. "Using Natural Language Processing and Machine Learning for Analyzing Clinical Notes in Sickle Cell Disease Patients." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright154759374321405.
Full text
Cao, Haoliang. "Automating Question Generation Given the Correct Answer." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-287460.
Full text
This thesis presents a deep neural network model for a question generation task. Given a Wikipedia article written in English and a text segment from the article, the model can generate a simple question whose answer is the given segment. The model is based on an encoder-decoder architecture. Our experiments show that a model with a fine-tuned BERT encoder and a self-attention decoder gives the best performance. We also propose an evaluation metric for the question generation task that assesses both the syntactic correctness and the relevance of the generated questions. Our analysis of sampled data shows that the new metric provides better evaluation than other popular evaluation metrics.
Storby, Johan. "Information extraction from text recipes in a web format." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189888.
Full text
Searching the Internet for recipes to find interesting meal ideas is increasingly popular, but it can be hard to find a recipe for a dish that can be cooked with the ingredients available at home. This thesis presents a solution to part of this problem. It investigates a method for extracting the different parts of a recipe from the Internet in order to store them and populate a searchable database of recipes, in which users can search for recipes based on the ingredients they have available. The system works for both English and Swedish and can identify both languages. This is a problem within natural language processing and its subfield information extraction. To solve the information extraction problem, we use rule-based methods based on named entity recognition, methods for body-text extraction, and general rule-based extraction methods. The results show generally good but not flawless functionality. For English, the rule-based algorithm achieved an F1 score of 83.8% for ingredient identification, 94.5% for identification of cooking instructions, and an accuracy of 88.0% and 96.4% for cooking time and number of servings, respectively. For Swedish, ingredient identification worked somewhat better than for English, while the other parts performed somewhat worse. The results are comparable to those of other similar methods and can thus be considered good, but they are not good enough for the system to be used independently without a supervising human.
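The rule-based extraction described above for cooking time and number of servings can be sketched with hand-written regular expressions over the recipe text. The patterns below are illustrative guesses at English-only rules, not the thesis's actual rule set:

```python
import re

def extract_recipe_fields(text):
    """Pull the cooking time (in minutes) and the number of servings out of
    free recipe text with hand-written patterns, illustrating the
    rule-based information-extraction style."""
    fields = {}
    # e.g. "bake for 25 minutes", "25 min"
    m = re.search(r'(\d+)\s*min(?:ute)?s?', text, re.IGNORECASE)
    if m:
        fields['cooking_time_min'] = int(m.group(1))
    # e.g. "Serves 4", "Servings: 6", "4 servings"
    m = re.search(r'(?:serves|servings?:?)\s*(\d+)|(\d+)\s*servings?',
                  text, re.IGNORECASE)
    if m:
        fields['servings'] = int(m.group(1) or m.group(2))
    return fields
```

A production system would need many more patterns (hours, ranges, spelled-out numbers) plus the Swedish equivalents, which is exactly where the reported accuracy gaps come from.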
Nahnsen, Thade. "Automation of summarization evaluation methods and their application to the summarization process." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5278.
Full text
Luff, Robert. "The use of systems engineering principles for the integration of existing models and simulations." Thesis, Loughborough University, 2017. https://dspace.lboro.ac.uk/2134/26739.
Full text
Alsehaimi, Afnan Abdulrahman A. "Sentiment Analysis for E-book Reviews on Amazon to Determine E-book Impact Rank." University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1619109972210567.
Full text
Palm Myllylä, Johannes. "Domain Adaptation for Hypernym Discovery via Automatic Collection of Domain-Specific Training Data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157693.
Full text
Dagerman, Björn. "Semantic Analysis of Natural Language and Definite Clause Grammar using Statistical Parsing and Thesauri." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26142.
Full text
Dai, Xiang. "Recognising Biomedical Names: Challenges and Solutions." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/25482.
Full text
Marano, Federica. "Exploring formal models of linguistic data structuring. Enhanced solutions for knowledge management systems based on NLP applications." Doctoral thesis, Università degli studi di Salerno, 2012. http://hdl.handle.net/10556/349.
Full text
The principal aim of this research is to describe the extent to which formal models for linguistic data structuring are crucial in Natural Language Processing (NLP) applications. In this sense, we will pay particular attention to those Knowledge Management Systems (KMS) which are designed for the Internet, and also to the enhanced solutions they may require. In order to deal appropriately with these topics, we will describe how to achieve computational linguistics applications helpful to humans in establishing and maintaining an advantageous relationship with technologies, especially those technologies which are based on, or produce, man-machine interactions in natural language. We will explore the positive relationship which may exist between well-structured Linguistic Resources (LR) and KMS, in order to argue that if the information architecture of a KMS is based on the formalisation of linguistic data, then the system works better and is more consistent. As for the topics we want to deal with, first of all it is indispensable to state that, in order to build efficient and effective Information Retrieval (IR) tools, understanding and formalising the combinatory mechanisms of natural language seems to be the first operation to achieve, not least because any piece of information produced by humans on the Internet is necessarily a linguistic act. Therefore, in this research work we will also discuss the NLP structuring of a hybrid model for linguistic formalisation, which we hope will prove to be a useful tool to support, improve and refine KMSs. More specifically, in section 1 we will describe how to structure language resources implementable inside KMSs, the extent to which they can improve the performance of these systems, and how the problem of linguistic data structuring is dealt with by natural language formalisation methods.
In section 2 we will proceed with a brief review of computational linguistics, paying particular attention to specific software packages such as Intex, Unitex, NooJ, and Cataloga, which are developed according to the Lexicon-Grammar (LG) method, a linguistic theory established during the 1960s by Maurice Gross. In section 3 we will describe some specific works useful for monitoring the state of the art in linguistic data structuring models, enhanced solutions for KMSs, and NLP applications for KMSs. In section 4 we will address problems related to natural language formalisation methods, describing mainly Transformational-Generative Grammar (TGG) and LG, plus other methods based on statistical approaches and ontologies. In section 5 we will propose a hybrid model usable in NLP applications in order to create effective enhanced solutions for KMSs. Specific features and elements of our hybrid model will be shown through some results of experimental research work. The case study we will present is a very complex NLP problem, yet one little explored in recent years: the treatment of Multi-Word Units (MWUs). In section 6 we will close our research by evaluating its results and presenting possible future work perspectives. [edited by author]
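As a concrete illustration of the MWU case study mentioned above: a standard statistical cue for multi-word units is pointwise mutual information (PMI) over adjacent word pairs, which flags pairs that co-occur far more often than chance. The sketch below is a generic baseline for intuition, not the hybrid model the thesis proposes:

```python
import math
from collections import Counter

def top_mwu_candidates(tokens, min_count=2, k=3):
    """Rank adjacent word pairs by PMI = log( p(w1,w2) / (p(w1) p(w2)) ).
    High-PMI frequent bigrams (e.g. 'new york') are candidate MWUs."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:          # ignore rare pairs: PMI is unstable there
            continue
        pmi = math.log((c / (n - 1)) /
                       ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored.append((pmi, (w1, w2)))
    return [bg for _, bg in sorted(scored, reverse=True)[:k]]
```

Lexicon-Grammar-style approaches instead list MWUs in hand-built electronic dictionaries; a statistical score like this is the usual data-driven complement.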
Acosta, Andrew D. "Laff-O-Tron: Laugh Prediction in TED Talks." DigitalCommons@CalPoly, 2016. https://digitalcommons.calpoly.edu/theses/1667.
Full text
Ramponi, Alan. "Knowledge Extraction from Biomedical Literature with Symbolic and Deep Transfer Learning Methods." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/310787.
Full text
Lauly, Stanislas. "Exploration des réseaux de neurones à base d'autoencodeur dans le cadre de la modélisation des données textuelles." Thèse, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/9461.
Full text
Piscaglia, Nicola. "Deep Learning for Natural Language Processing: Novel State-of-the-art Solutions in Summarisation of Legal Case Reports." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20342/.
Full text
Panesar, Kulvinder. "Functional linguistic based motivations for a conversational software agent." Cambridge Scholars Publishing, 2019. http://hdl.handle.net/10454/18134.
Full text
This chapter discusses a linguistically orientated model of a conversational software agent (CSA) (Panesar 2017) framework sensitive to natural language processing (NLP) concepts and the levels of adequacy of a functional linguistic theory (LT). We discuss the relationship between NLP and knowledge representation (KR), and connect this with the goals of a linguistic theory (Van Valin and LaPolla 1997), in particular Role and Reference Grammar (RRG) (Van Valin Jr 2005). We debate the advantages of RRG and consider its fitness and computational adequacy. We present a design of a computational model of the linking algorithm that utilises a speech act construction as a grammatical object (Nolan 2014a, Nolan 2014b) and the sub-model of belief, desire and intentions (BDI) (Rao and Georgeff 1995). This model has been successfully implemented in software, using the resource description framework (RDF), and we highlight some implementation issues that arose at the interface between language and knowledge representation (Panesar 2017).
The full-text of this article will be released for public view at the end of the publisher embargo on 27 Sep 2024.
Packer, Thomas L. "Surface Realization Using a Featurized Syntactic Statistical Language Model." Diss., 2006. http://contentdm.lib.byu.edu/ETD/image/etd1195.pdf.
Full text
Sidås, Albin, and Simon Sandberg. "Conversational Engine for Transportation Systems." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176810.
Full text
Mrkšić, Nikola. "Data-driven language understanding for spoken dialogue systems." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/276689.
Full text
Eriksson, Patrik, and Philip Wester. "Granskning av examensarbetesrapporter med IBM Watson molntjänster." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232057.
Full text
Cloud services are one of the fastest-growing areas today. Companies such as Amazon, Google, Microsoft, and IBM provide these services in several forms. As development accelerates, the natural question arises: "What can be done with this technology today?". The technology offers scalability with respect to the hardware used and the number of users, which is attractive to developers and companies. This thesis tries to answer how cloud services can be used by combining this with the question "Is it possible to create an automated thesis report reviewer?". By limiting the investigation to the IBM Watson cloud services, the work mainly tries to answer the main question "Is it possible to create an automated thesis report reviewer with Watson cloud services?". The goal of the work was thus to create an automated thesis report reviewer. The project followed a modified version of Bunge's technological research method, in which the first step was to create a definition of a software thesis report reviewer, followed by an investigation of the Watson cloud services deemed relevant from the literature study. These were then examined further in an empirical study. Through the empirical study, an understanding was gained of the services' applicability and limitations, in order to map how they could be used in an automated thesis report reviewer. Most services were treated thoroughly, except Machine Learning, which would have required further investigation had the time resources not run out. The project shows that the Watson cloud services are useful but not perfectly suited for reviewing thesis reports. Although the goal was not reached, the Watson cloud services were investigated, which can provide an understanding of their usefulness and of future implementations towards meeting the created definition.
Baglodi, Venkatesh. "A Feature Structure Approach for Disambiguating Preposition Senses." NSUWorks, 2009. http://nsuworks.nova.edu/gscis_etd/83.
Full text
Murray, Jonathan. "Finding Implicit Citations in Scientific Publications : Improvements to Citation Context Detection Methods." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-173913.
Full text
This thesis deals with the task of finding implicit citations between scientific publications. Apart from being interesting in their own right, these citations can be used in other problems, such as determining an author's attitude towards a reference or summarising a paper based on how it has been cited by others. We start from two recent methods, a machine-learning-based classifier and an iterative algorithm based on a graph model. These are implemented and evaluated on a common pre-annotated dataset. A number of changes to the algorithms are presented in the form of new sentence features, different semantic text-similarity measures, and a way of combining the two methods. The main result of this work is that the new sentence features lead to markedly improved F-scores for both methods.
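One family of semantic text-similarity measures of the kind mentioned in this abstract can be illustrated with a bag-of-words cosine similarity between a candidate sentence and the explicit citing sentence: a high score is weak evidence that the candidate continues the citation context. This toy feature is purely illustrative and is not claimed to be one of the thesis's exact features:

```python
import math
import re
from collections import Counter

def cosine_similarity_feature(sent_a, sent_b):
    """Bag-of-words cosine similarity between two sentences -- the sort of
    text-similarity signal a citation-context classifier can use as one
    feature among many (position, cue words, coreference, ...)."""
    a = Counter(re.findall(r'\w+', sent_a.lower()))
    b = Counter(re.findall(r'\w+', sent_b.lower()))
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Richer variants swap raw counts for TF-IDF weights or sentence embeddings, but keep the same [0, 1]-bounded score as the classifier input.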