
Theses on the topic "Natural language processing analysis"



Consult the 50 best theses for your research on the topic "Natural language processing analysis".


You can also download the full text of each academic publication in PDF format and read its abstract online whenever it is available in the metadata.

Explore theses on a wide variety of disciplines and organize your bibliography correctly.

1

Woldemariam, Yonas Demeke. "Natural language processing in cross-media analysis". Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-147640.

Full text
Abstract
A cross-media analysis framework is an integrated multi-modal platform where a media resource containing different types of data such as text, images, audio and video is analyzed with metadata extractors working jointly to contextualize the media resource. It generally provides cross-media analysis and automatic annotation, metadata publication and storage, and search and recommendation services. For on-line content providers, such services allow them to semantically enhance a media resource with the extracted metadata representing the hidden meanings and make it more efficiently searchable. Within the architecture of such frameworks, Natural Language Processing (NLP) infrastructures cover a substantial part. The NLP infrastructures include text analysis components such as a parser, named entity extraction and linking, sentiment analysis and automatic speech recognition. Since NLP tools and techniques are originally designed to operate in isolation, integrating them in cross-media frameworks and analyzing textual data extracted from multimedia sources is very challenging. In particular, the text extracted from audio-visual content lacks linguistic features that potentially provide important clues for text analysis components. Thus, there is a need to develop various techniques to meet the requirements and design principles of the frameworks. In this thesis, we explore developing methods and models that satisfy the text and speech analysis requirements posed by cross-media analysis frameworks. The developed methods allow the frameworks to extract linguistic knowledge of various types and predict information such as sentiment and competence. We also attempt to enhance the multilingualism of the frameworks by designing an analysis pipeline that includes speech recognition, transliteration and named entity recognition for Amharic, which also makes Amharic content on the web more efficiently accessible. The method can potentially be extended to support other under-resourced languages.
2

Shepherd, David. "Natural language program analysis combining natural language processing with program analysis to improve software maintenance tools /". Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 176 p, 2007. http://proquest.umi.com/pqdweb?did=1397920371&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.

Full text
3

Ramachandran, Venkateshwaran. "A temporal analysis of natural language narrative text". Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.

Full text
4

Li, Wenhui. "Sentiment analysis: Quantitative evaluation of subjective opinions using natural language processing". Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/28000.

Full text
Abstract
Sentiment Analysis consists of recognizing sentiment orientation towards specific subjects within natural language texts. Most research in this area focuses on classifying documents as positive or negative. The purpose of this thesis is to quantitatively evaluate subjective opinions in customer reviews using a five-star rating system, which is widely used on on-line review web sites, and to make the predicted score as accurate as possible. Firstly, this thesis presents two methods for rating reviews: classifying reviews with supervised learning methods as a multi-class classification problem, or rating reviews using association scores of sentiment terms with a set of seed words extracted from the corpus, i.e. an unsupervised learning method. We extend the feature selection approach used in Turney's PMI-IR estimation by introducing semantic relatedness measures based upon the content of WordNet. This thesis reports on experiments using the two methods mentioned above for rating reviews using the combined feature set enriched with WordNet-selected sentiment terms. The results of these experiments suggest ways in which incorporating WordNet relatedness measures into feature selection may yield improvement over classification and unsupervised learning methods which do not use it. Furthermore, via ordinal meta-classifiers, we utilize the ordering information contained in the scores of bank reviews to improve performance, explore the effectiveness of re-sampling for reducing the problem of skewed data, and check whether discretization benefits the ordinal meta-learning process. Finally, we combine the unsupervised and supervised meta-learning methods to optimize performance on our sentiment prediction task.
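As a rough, self-contained illustration of the PMI-IR-style association scoring mentioned in this abstract, the sketch below estimates a term's semantic orientation from co-occurrence counts with positive and negative seed words. The counts, seed words and smoothing constant are invented for the example and are not taken from the thesis.

    import math

    # Toy co-occurrence counts (document hits), standing in for search-engine
    # or corpus statistics; all numbers here are made up for illustration.
    hits = {
        "delightful": 120, "awful": 30,
        "excellent": 500, "poor": 400,
        ("delightful", "excellent"): 45, ("delightful", "poor"): 3,
        ("awful", "excellent"): 2, ("awful", "poor"): 18,
    }

    def pmi(term, seed, total=10_000):
        """Pointwise mutual information between a term and a seed word."""
        joint = hits.get((term, seed), 0) + 0.01          # simple smoothing
        return math.log2(joint * total / (hits[term] * hits[seed]))

    def semantic_orientation(term, pos_seeds=("excellent",), neg_seeds=("poor",)):
        """Turney-style SO: association with positive seeds minus negative seeds."""
        return (sum(pmi(term, s) for s in pos_seeds)
                - sum(pmi(term, s) for s in neg_seeds))

    for word in ("delightful", "awful"):
        print(word, round(semantic_orientation(word), 2))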
5

Keller, Thomas Anderson. "Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing". Thesis, University of California, San Diego, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599339.

Full text
Abstract

Most machine learning algorithms require a fixed-length input to be able to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed-length representations. Although today there is a general consensus on how to generate fixed-length representations of individual words which preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed-length representations of natural language sequences, and analyze their effectiveness across a variety of high- and low-level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.

6

Patil, Supritha Basavaraj. "Analysis of Moving Events Using Tweets". Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/90884.

Full text
Abstract
The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), the Integrated Digital Event Archiving and Library (IDEAL), and the Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attack, solar eclipse, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses. Such naturally occurring events are distributed across space and time. It would be beneficial to researchers if we can perform a spatial-temporal analysis to test some hypotheses, and to find any trends that tweets would reveal for such events. I apply an existing algorithm to detect locations from tweets by modifying it to work better with the type of datasets I work with. I use the time captured in tweets and also identify the tense of the sentences in tweets to perform the temporal analysis. I build a rule-based model for obtaining the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study if tweets can be a relevant source of information in understanding the movement of the event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results from the analysis, I noted that Twitter can be a reliable source to identify places affected by moving events almost immediately. The locations obtained are at a more detailed level than in news-wires. We can also identify the time that an event affected a particular region by date.
Master of Science
News now travels faster on social media than through news channels. Information from social media can help retrieve minute details that might not be emphasized in news. People tend to describe their actions or sentiments in tweets. I aim to study whether such collections of tweets are dependable sources for identifying the paths of moving events. In events like hurricanes, using Twitter can help in analyzing people's reactions to such moving events. These may include actions such as dislocation, or emotions during different phases of the event. The results obtained in the experiments concur with the actual path of the events with respect to the regions affected and time. The frequency of tweets increases during event peaks. The number of affected locations identified is significantly greater than in news wires.
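As a loose illustration of the rule-based tense labelling described in the abstract above, the sketch below tags toy tweets as past, present or future using a few keyword and morphology rules; the rules and example tweets are invented here and are much cruder than the model built in the thesis.

    import re

    # Very small keyword/morphology heuristics for tense; illustrative only.
    FUTURE = re.compile(r"\b(will|going to|gonna|expected to)\b", re.I)
    PAST = re.compile(r"\b(was|were|had|hit|struck|passed|landed|\w+ed)\b", re.I)
    PRESENT = re.compile(r"\b(is|are|am|\w+ing|now|currently)\b", re.I)

    def tense_of(tweet: str) -> str:
        """Label a tweet as future, past or present using ordered rules."""
        if FUTURE.search(tweet):
            return "future"
        if PAST.search(tweet):
            return "past"
        if PRESENT.search(tweet):
            return "present"
        return "unknown"

    tweets = [
        "Hurricane will make landfall near Tampa tomorrow morning",
        "The eclipse passed over Nashville an hour ago",
        "Flooding is getting worse in our neighborhood right now",
    ]
    for t in tweets:
        print(tense_of(t), "->", t)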
7

Giménez, Fayos María Teresa. "Natural Language Processing using Deep Learning in Social Media". Doctoral thesis, Universitat Politècnica de València, 2021. http://hdl.handle.net/10251/172164.

Full text
Abstract
In recent years, Deep Learning (DL) has revolutionised the potential of automatic systems that handle Natural Language Processing (NLP) tasks. We have witnessed a tremendous advance in the performance of these systems, and nowadays we find systems that embed NLP models ubiquitously, determining the intent of the text we write, the sentiment of our tweets or our political views, to cite some examples. In this thesis, we propose several NLP models for addressing tasks that deal with social media text. Concretely, this work focuses mainly on Sentiment Analysis and Personality Recognition tasks. Sentiment Analysis, one of the leading problems in NLP, consists of determining the polarity of a text; it is a well-known task for which the number of resources and models proposed is vast. In contrast, Personality Recognition is a breakthrough task that aims to determine users' personality from their writing style; it is more of a niche task with fewer resources designed ad hoc, but with great potential. Despite the fact that the principal focus of this work was on the development of Deep Learning models, we have also proposed models based on linguistic resources and classical Machine Learning models. Moreover, in this more straightforward setup, we have explored the nuances of different language devices, such as the impact of emotions on the correct classification of the sentiment expressed in a text. Afterwards, DL models were developed, particularly Convolutional Neural Networks (CNNs), to address the previously described tasks. In the case of Personality Recognition, we explored both approaches, which allowed us to compare the models under the same circumstances. Notably, NLP has evolved dramatically in recent years through the development of public evaluation campaigns, where multiple research teams compare the performance of their approaches under the same conditions. Most of the models presented here were either assessed in such an evaluation campaign or used the setup of a previous one. Recognising the importance of this effort, we curated and developed an evaluation campaign for classifying the topic of political tweets, for which we collected and annotated a new data set. In addition, as we advanced in the development of this work, we decided to study in depth how CNNs are applied to NLP tasks. Two lines of work were explored in this regard. Firstly, we proposed a semantic-based padding method for CNNs, which addresses how to represent text more appropriately for solving NLP tasks. Secondly, a theoretical framework was introduced for tackling one of the most frequent criticisms of Deep Learning: its lack of interpretability. This framework seeks to visualise which lexical patterns, if any, a CNN has learned in order to classify a sentence. In summary, the main achievements presented in this thesis are:
- The organisation of an evaluation campaign for Topic Classification of texts gathered from social media.
- The proposal of several Machine Learning models tackling the Sentiment Analysis task on social media, together with a study of the impact of linguistic devices such as figurative language on the task.
- The development of a model for inferring the personality of a developer given the source code that they have written.
- The study of Personality Recognition from social media following two different approaches: models based on machine learning algorithms with handcrafted features, and models based on CNNs, with both approaches compared under the same conditions.
- The introduction of new semantic-based paddings for optimising how text is represented in CNNs.
- The definition of a theoretical framework to provide interpretable information about what CNNs are learning internally.
Giménez Fayos, MT. (2021). Natural Language Processing using Deep Learning in Social Media [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172164
8

Gorrell, Genevieve. "Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing". Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2006. http://www.bibl.liu.se/liupubl/disp/disp2006/tek1045s.pdf.

Full text
9

Marzo i Grimalt, Núria. "Natural Language Processing Model for Log Analysis to Retrieve Solutions For Troubleshooting Processes". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300042.

Full text
Abstract
In the telecommunications industry, one of the most time-consuming tasks is troubleshooting and the resolution of Trouble Report (TR) tickets. This task involves the understanding of textual data, which can be challenging due to its domain- and company-specific features: the text contains many abbreviations, typos and tables as well as numerical information. This work tries to solve the issue of retrieving solutions for new troubleshooting reports in an automated way by using a Natural Language Processing (NLP) model, in particular Bidirectional Encoder Representations from Transformers (BERT)-based approaches. It proposes a text ranking model that, given a description of a fault, can rank the best possible solutions to that problem using answers from past TRs. The model tackles the trade-off between accuracy and latency by implementing a multi-stage BERT-based architecture with an initial retrieval stage and a re-ranker stage. Having a model that achieves the desired accuracy under a latency constraint makes it suitable for industry applications. The experiments to evaluate the latency and the accuracy of the model have been performed on Ericsson's troubleshooting dataset. The evaluation of the proposed model suggests that it is able to retrieve and re-rank solutions for TRs with a significant improvement compared to a non-BERT model.
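A compact sketch of the retrieve-then-re-rank pattern this abstract describes: a cheap lexical retriever narrows the candidate answers and a BERT cross-encoder re-scores the shortlist. The corpus, query and the particular cross-encoder checkpoint are assumptions for the example, not details from the thesis.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sentence_transformers import CrossEncoder  # pip install sentence-transformers

    past_solutions = [
        "Restart the baseband unit and reload the configuration",
        "Replace the faulty optical transceiver on port 3",
        "Increase the timeout threshold for the handover procedure",
    ]
    query = "Handover keeps failing with timeout errors on busy cells"

    # Stage 1: cheap lexical retrieval keeps only the top-k candidates.
    vec = TfidfVectorizer().fit(past_solutions + [query])
    scores = cosine_similarity(vec.transform([query]), vec.transform(past_solutions))[0]
    top_k = scores.argsort()[::-1][:2]

    # Stage 2: a BERT cross-encoder re-ranks the shortlisted candidates.
    # The checkpoint name is just a publicly available example model.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    pairs = [(query, past_solutions[i]) for i in top_k]
    rerank_scores = reranker.predict(pairs)

    for i, s in sorted(zip(top_k, rerank_scores), key=lambda x: -x[1]):
        print(round(float(s), 3), past_solutions[i])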
10

Mc Kevitt, Paul. "Analysing coherence of intention in natural language dialogue". Thesis, University of Exeter, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.303991.

Full text
11

Crocker, Matthew Walter. "A principle-based system for natural language analysis and translation". Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/27863.

Full text
Abstract
Traditional views of grammatical theory hold that languages are characterised by sets of constructions. This approach entails the enumeration of all possible constructions for each language being described. Current theories of transformational generative grammar have established an alternative position. Specifically, Chomsky's Government-Binding theory proposes a system of principles which are common to human language. Such a theory is referred to as a "Universal Grammar" (UG). Associated with the principles of grammar are parameters of variation which account for the diversity of human languages. The grammar for a particular language is known as a "Core Grammar", and is characterised by an appropriately parametrised instance of UG. Despite these advances in linguistic theory, construction-based approaches have remained the status quo within the field of natural language processing. This thesis investigates the possibility of developing a principle-based system which reflects the modular nature of the linguistic theory. That is, rather than stipulating the possible constructions of a language, a system is developed which uses the principles of grammar and language-specific parameters to parse language. Specifically, a system is presented which performs syntactic analysis and translation for a subset of English and German. The cross-linguistic nature of the theory is reflected by the system, which can be considered a procedural model of UG.
12

Tempfli, Peter. "Preprocessing method comparison and model tuning for natural language data". Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-34438.

Full text
Abstract
Twitter and other microblogging services are a valuable source for almost real-time marketing, public opinion and brand-related consumer information mining. As such, collection and analysis of user-generated natural language content is the focus of research on automated sentiment analysis. The most successful approach in the field is supervised machine learning, where the three key problems are data cleaning and transformation, feature generation, and model choice and training parameter selection. Papers in recent years have thoroughly examined the field, and there is agreement that relatively simple techniques such as a bag-of-words transformation of the text and a Naive Bayes model can generate acceptable results (F1-scores between 75% and 85% for an average dataset), and that fine-tuning can be really difficult and yields relatively small gains. However, a few percent in performance even on a middle-sized dataset can mean thousands of better classified documents, which can mean thousands of missed sales or angry customers in any business domain. Thus this work presents and demonstrates a framework for better tailored, fine-tuned models for analysing Twitter data. The experiments show that Naive Bayes classifiers with domain-specific stopword selection work best (up to 88% F1-score); however, performance decreases dramatically if the data is unbalanced or the classes are not binary. Filtering stopwords is crucial to increase prediction performance, and the experiments show that a stopword set should be domain-specific. The conclusion is that there is no single best way for model training and stopword selection in sentiment analysis. Thus the work suggests that there is space for using a comparison framework to fine-tune prediction models to a given problem: such a comparison framework should compare different training settings on the same dataset, so that the best-trained models can be found for a given real-life problem.
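A minimal sketch of the bag-of-words plus Naive Bayes setup with a domain-specific stopword list, in the spirit of the comparison framework described above; the tiny corpus and the stopword list are invented for illustration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy labelled tweets and a hand-picked, domain-specific stopword list.
    tweets = [
        "love the new update, works great",
        "great service, really happy",
        "app keeps crashing, terrible update",
        "worst support ever, really angry",
    ]
    labels = ["pos", "pos", "neg", "neg"]
    domain_stopwords = ["app", "update", "service", "support", "really"]

    model = make_pipeline(
        CountVectorizer(stop_words=domain_stopwords),  # bag-of-words minus domain noise
        MultinomialNB(),
    )
    model.fit(tweets, labels)
    print(model.predict(["the update is great", "crashing again, terrible"]))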
13

Mazidi, Karen. "Infusing Automatic Question Generation with Natural Language Understanding". Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc955021/.

Full text
Abstract
Automatically generating questions from text for educational purposes is an active research area in natural language processing. The automatic question generation system accompanying this dissertation is MARGE, a recursive acronym for "MARGE Automatically Reads, Generates and Evaluates". MARGE generates questions from both individual sentences and the passage as a whole, and is the first question generation system to successfully generate meaningful questions from textual units larger than a sentence. Prior work in automatic question generation from text treats a sentence as a string of constituents to be rearranged into as many questions as allowed by English grammar rules. Consequently, such systems overgenerate and create mainly trivial questions. Further, none of these systems to date has been able to automatically determine which questions are meaningful and which are trivial. This is because the research focus has been placed on natural language generation (NLG) at the expense of natural language understanding (NLU). In contrast, the work presented here infuses the question generation process with natural language understanding. From the input text, MARGE creates a meaning analysis representation for each sentence in a passage via the DeconStructure algorithm presented in this work. Questions are generated from sentence meaning analysis representations using templates. The generated questions are automatically evaluated for question quality and importance via a ranking algorithm.
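As a toy illustration of template-based question generation from a shallow predicate-argument view of a sentence, the sketch below fills two question templates from hand-written (subject, verb, object) triples; it is far simpler than the DeconStructure algorithm and its meaning analysis representations.

    # Each sentence is reduced to a (subject, verb, object) triple, standing in
    # for a real meaning-analysis representation; the triples are hand-written here.
    analyses = [
        ("The mitochondrion", "produces", "most of the cell's ATP"),
        ("Photosynthesis", "converts", "light energy into chemical energy"),
    ]

    def generate_questions(subject, verb, obj):
        """Fill a couple of simple templates from one predicate-argument triple."""
        verb_base = verb[:-1] if verb.endswith("s") else verb  # very crude lemmatizer
        return [
            f"What does {subject} {verb_base}?",   # asks about the object
            f"Who or what {verb} {obj}?",          # asks about the subject
        ]

    # Surface realization is naive (capitalization, agreement); illustration only.
    for triple in analyses:
        for question in generate_questions(*triple):
            print(question)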
14

Lindén, Johannes. "Understand and Utilise Unformatted Text Documents by Natural Language Processing algorithms". Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-31043.

Full text
Abstract
News companies have a need to automate and make the editors' process of writing about hot and new events more effective. Current technologies involve robotic programs that fill in values in templates, and website listeners that notify the editors when changes are made so that the editor can read up on the source change at the actual website. Editors can provide news faster and better if directly provided with abstracts of the external sources. This study applies deep learning algorithms to automatically formulate abstracts and tag sources with appropriate tags based on the context. The study is a full-stack solution, which manages both the editors' need for speed and the training, testing and validation of the algorithms. Decision Tree, Random Forest, Multi-Layer Perceptron and phrase document vectors are used to evaluate the categorisation, and Recurrent Neural Networks are used to paraphrase unformatted texts. In the evaluation, a comparison between different models trained by the algorithms with a variation of parameters is done based on the F-score. The results show that the F-scores increase the more documents the training has and decrease the more categories the algorithm needs to consider. The Multi-Layer Perceptron performs best, followed by Random Forest and finally Decision Tree. The document length matters: when larger documents are considered during training, the score increases considerably. A user survey about the paraphrase algorithms shows that the paraphrase result is insufficient to satisfy the editors' needs. It confirms a need for more memory to conduct longer experiments.
15

Banea, Carmen. "Extrapolating Subjectivity Research to Other Languages". Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc271777/.

Full text
Abstract
Socrates articulated it best, "Speak, so I may see you." Indeed, language represents an invisible probe into the mind. It is the medium through which we express our deepest thoughts, our aspirations, our views, our feelings, our inner reality. From the beginning of artificial intelligence, researchers have sought to impart human-like understanding to machines. As much of our language represents a form of self expression, capturing thoughts, beliefs, evaluations, opinions, and emotions which are not available for scrutiny by an outside observer, in the field of natural language, research involving these aspects has crystallized under the name of subjectivity and sentiment analysis. While subjectivity classification labels text as either subjective or objective, sentiment classification further divides subjective text into positive, negative or neutral. In this thesis, I investigate techniques for generating tools and resources for subjectivity analysis that do not rely on an existing natural language processing infrastructure in a given language. This constraint is motivated by the fact that the vast majority of human languages are resource-scarce from an electronic point of view: they lack basic tools such as part-of-speech taggers and parsers, or basic resources such as electronic text, annotated corpora or lexica. This severely limits the implementation of techniques on par with those developed for English, and by applying methods that are lighter in their usage of text processing infrastructure, we are able to conduct multilingual subjectivity research in these languages as well. Since my aim is also to minimize the amount of manual work required to develop lexica or corpora in these languages, the proposed techniques employ a lever approach, where English often acts as the donor language (the fulcrum in a lever) and allows, with a relatively minimal amount of effort, the establishment of preliminary subjectivity research in a target language.
16

Sunil, Kamalakar FNU. "Automatically Generating Tests from Natural Language Descriptions of Software Behavior". Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/23907.

Full text
Abstract
Behavior-Driven Development (BDD) is an emerging agile development approach where all stakeholders (including developers and customers) work together to write user stories in structured natural language to capture a software application's functionality in terms of required "behaviors". Developers then manually write "glue" code so that these scenarios can be executed as software tests. This glue code represents individual steps within unit and acceptance test cases, and tools exist that automate the mapping from scenario descriptions to manually written code steps (typically using regular expressions). Instead of requiring programmers to write manual glue code, this thesis investigates a practical approach to convert natural language scenario descriptions into executable software tests fully automatically. To show feasibility, we developed a tool called Kirby that uses natural language processing techniques, code information extraction and probabilistic matching to automatically generate executable software tests from structured English scenario descriptions. Kirby relieves the developer from the laborious work of writing code for the individual steps described in scenarios, so that developers and customers can both focus on the scenarios as pure behavior descriptions (understandable to all, not just programmers). Results from assessing the performance and accuracy of this technique are presented.
Master of Science
17

Henriksson, Jimmy and Carl Hultberg. "Public Sentiment on Twitter and Stock Performance : A Study in Natural Language Processing". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259984.

Full text
Abstract
In recent years, the use of non-traditional data sources by hedge funds to support investment decisions has increased. One of the data sources that has grown most is social media, and it has become popular to analyze public opinion with the help of sentiment analysis in order to predict the performance of a company. Evaluating public opinion requires large sets of Twitter data. The Twitter data was collected by streaming the Twitter feed, and the stock data was collected from a Bloomberg Terminal. The aim of this study was to examine whether there is a correlation between the public opinion on a stock and the stock price, and also what affects this relationship. While such a relationship cannot be established in general, we are able to show that if the data quality is good, there is a high correlation between public opinion and the stock price, and that significant events surrounding the company result in a higher correlation during that period.
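A small sketch of the core measurement behind such a study: correlating a daily aggregated tweet-sentiment series with daily stock returns, both same-day and with a one-day lag. All numbers below are synthetic and purely illustrative.

    import numpy as np

    # Synthetic daily series: mean tweet sentiment (-1..1) and closing prices.
    sentiment = np.array([0.10, 0.25, -0.05, 0.40, 0.35, -0.20, 0.15])
    prices    = np.array([100.0, 101.2, 100.4, 103.0, 103.8, 102.1, 102.9])

    returns = np.diff(prices) / prices[:-1]          # simple daily returns
    aligned_sentiment = sentiment[1:]                # sentiment on the return day

    corr = np.corrcoef(aligned_sentiment, returns)[0, 1]
    print(f"Pearson correlation (same-day): {corr:.2f}")

    # A one-day lag checks whether yesterday's sentiment leads today's return.
    lagged = np.corrcoef(sentiment[:-1], returns)[0, 1]
    print(f"Pearson correlation (lag 1): {lagged:.2f}")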
18

Riehl, Sean K. "Property Recommendation System with Geospatial Data Analytics and Natural Language Processing for Urban Land Use". Cleveland State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=csu1590513674513905.

Full text
19

Jochim, Charles [Verfasser] and Hinrich [Akademischer Betreuer] Schütze. "Natural language processing and information retrieval methods for intellectual property analysis / Charles Jochim. Betreuer: Hinrich Schütze". Stuttgart : Universitätsbibliothek der Universität Stuttgart, 2014. http://d-nb.info/1064308643/34.

Full text
20

Holmes, Wesley J. "Topological Analysis of Averaged Sentence Embeddings". Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1609351352688467.

Full text
21

Zhan, Tianjie. "Semantic analysis for extracting fine-grained opinion aspects". HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1213.

Full text
22

Johansson, David. "Applicability analysis of computation double entendre humor recognition with machine learning methods". Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-12413.

Full text
23

Andersson, Ludwig. "Natural Language Processing In A Distributed Environment : A comparative performance analysis of Apache Spark and Hadoop MapReduce". Thesis, Umeå universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-126865.

Full text
Abstract
A large majority of the data hosted on the internet today is natural language text, and therefore understanding natural language and how to effectively process and analyze text has become a big part of data mining. Natural Language Processing has many applications in fields such as business intelligence and security. The problem with natural language text processing and analysis is the computational power needed to perform the actual processing: the performance of personal computers has not kept up with the amounts of data that need to be processed, so another approach with good performance-scaling potential is needed. This study makes a preliminary comparative performance analysis of processing natural language text in a distributed environment using two popular open-source frameworks, Hadoop MapReduce and Apache Spark.
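For a flavour of the distributed text processing being compared, here is a minimal PySpark job that tokenizes documents and counts word frequencies; the input documents are invented, and the equivalent Hadoop MapReduce job would express the same map and reduce steps as separate mapper and reducer classes.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("nlp-wordcount").getOrCreate()

    docs = [
        "natural language processing at scale",
        "spark and hadoop process natural language text",
    ]

    counts = (
        spark.sparkContext.parallelize(docs)
        .flatMap(lambda line: line.lower().split())   # crude tokenization
        .map(lambda token: (token, 1))
        .reduceByKey(lambda a, b: a + b)              # shuffle + aggregate per token
        .sortBy(lambda pair: -pair[1])
    )

    for token, n in counts.collect():
        print(n, token)

    spark.stop()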
24

Lee, Wing Kuen. "Interpreting tables in text using probabilistic two-dimensional context-free grammars /". View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LEEW.

Full text
25

Cannon, Paul C. "Extending the information partition function : modeling interaction effects in highly multivariate, discrete data /". Diss., CLICK HERE for online access, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2263.pdf.

Full text
26

Björner, Amanda. "Natural Language Processing techniques for feedback on text improvement : A qualitative study on press releases". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301303.

Full text
Abstract
Press releases play a key role in today's news production by being public statements of newsworthy content that function as a pre-formulation of news. Press releases originate from a wide range of actors, and a common goal is for them to reach a high societal impact. This thesis examines how Natural Language Processing (NLP) techniques can be successful in giving feedback to press release authors that helps enhance the content and quality of their texts. This could, in turn, contribute to increased impact. To examine this, the research question is divided into two parts. The first part examines how content-perception feedback can contribute to improving press releases. This is examined through the development of a web tool where user-written press releases get analyzed. The analysis consists of a readability assessment using the LIX metric and linguistic bias detection of weasel words and peacock words through rule-based sentiment analysis. The user experiences and opinions are evaluated through an online questionnaire and semi-structured interviews. The second part of the research question examines how trending topic information can contribute to improving press releases. This part is examined theoretically, based on a literature review of state-of-the-art methods, and qualitatively, by gathering opinions from press release authors in the previously mentioned questionnaire and interviews. Based on the results, it is identified that for content-perception feedback, it is especially less experienced authors and scientific content aimed at the general public that would achieve improved text quality from objective readability assessment and detection of biased expressions. Nevertheless, most of the evaluation participants were more satisfied with their press releases after editing based on the readability feedback, and all participants with biased words in their texts reported that the detection led to positive changes resulting in improved text quality. As for the theoretical part, it is considered that both text quality and the number of publications increase when writing about trending topics. Giving authors trending-topic information at a detailed level is indicated to be the most helpful.
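A tiny sketch of the two checks mentioned above: the LIX readability score (average sentence length plus the percentage of long words) and a lexicon-based flagging of weasel words. The word list, thresholds and example text are illustrative assumptions, not the tool's actual configuration.

    import re

    WEASEL_WORDS = {"some", "many", "experts", "arguably", "reportedly"}  # toy lexicon

    def lix(text: str) -> float:
        """LIX = words/sentences + 100 * long_words/words (long = more than 6 letters)."""
        words = re.findall(r"[A-Za-zÅÄÖåäö']+", text)
        sentences = max(1, len(re.findall(r"[.!?:]+", text)))
        long_words = sum(1 for w in words if len(w) > 6)
        return len(words) / sentences + 100 * long_words / len(words)

    def weasel_hits(text: str) -> list:
        """Return the weasel words found in the text."""
        return [w for w in re.findall(r"[a-zåäö']+", text.lower()) if w in WEASEL_WORDS]

    release = ("Experts say the product is arguably the most innovative solution. "
               "Many customers have reportedly seen significant improvements.")
    print(f"LIX: {lix(release):.1f}")
    print("Weasel words:", weasel_hits(release))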
27

Gorinski, Philip John. "Automatic movie analysis and summarisation". Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31053.

Full text
Abstract
Automatic movie analysis is the task of applying Machine Learning methods to the field of screenplays, movie scripts, and motion pictures to facilitate or enable various tasks throughout the entirety of a movie's life-cycle. From helping with making informed decisions about a new movie script with respect to aspects such as its originality, similarity to other movies, or even commercial viability, all the way to offering consumers new and interesting ways of viewing the final movie, many stages in the life-cycle of a movie stand to benefit from Machine Learning techniques that promise to reduce human effort, time, or both. Within this field of automatic movie analysis, this thesis addresses the task of summarising the content of screenplays, enabling users at any stage to gain a broad understanding of a movie from greatly reduced data. The contributions of this thesis are four-fold: (i) We introduce ScriptBase, a new large-scale data set of original movie scripts, annotated with additional meta-information such as genre and plot tags, cast information, and log- and tag-lines. To our knowledge, ScriptBase is the largest data set of its kind, containing scripts and information for almost 1,000 Hollywood movies. (ii) We present a dynamic summarisation model for the screenplay domain, which allows for extraction of highly informative and important scenes from movie scripts. The extracted summaries allow for the content of the original script to stay largely intact and provide the user with its important parts, while greatly reducing the script-reading time. (iii) We extend our summarisation model to capture additional modalities beyond the screenplay text. The model is rendered multi-modal by introducing visual information obtained from the actual movie and by extracting scenes from the movie, allowing users to generate visual summaries of motion pictures. (iv) We devise a novel end-to-end neural network model for generating natural language screenplay overviews. This model enables the user to generate short descriptive and informative texts that capture certain aspects of a movie script, such as its genres, approximate content, or style, allowing them to gain a fast, high-level understanding of the screenplay. Multiple automatic and human evaluations were carried out to assess the performance of our models, demonstrating that they are well-suited for the tasks set out in this thesis, outperforming strong baselines. Furthermore, the ScriptBase data set has started to gain traction, and is currently used by a number of other researchers in the field to tackle various tasks relating to screenplays and their analysis.
28

Wong, Jimmy Pui Fung. "The use of prosodic features in Chinese speech recognition and spoken language processing /". View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20WONG.

Full text
Abstract
Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.
29

Eglowski, Skylar. "CREATE: Clinical Record Analysis Technology Ensemble". DigitalCommons@CalPoly, 2017. https://digitalcommons.calpoly.edu/theses/1771.

Full text
Abstract
In this thesis, we describe an approach that won a psychiatric symptom severity prediction challenge. The challenge was to correctly predict the severity of psychiatric symptoms on a 4-point scale. Our winning submission uses a novel stacked machine learning architecture in which (i) a base data ingestion/cleaning step was followed by the (ii) derivation of a base set of features defined using text analytics, after which (iii) association rule learning was used in a novel way to generate new features, followed by a (iv) feature selection step to eliminate irrelevant features, followed by a (v) classifier training algorithm in which a total of 22 classifiers including new classifier variants of AdaBoost and RandomForest were trained on seven different data views, and (vi) finally an ensemble learning step, in which ensembles of best learners were used to improve on the accuracy of individual learners. All of this was tested via standard 10-fold cross-validation on training data provided by the N-GRID challenge organizers, of which the three best ensembles were selected for submission to N-GRID's blind testing. The best of our submitted solutions garnered an overall final score of 0.863 according to the organizer's measure. All 3 of our submissions placed within the top 10 out of the 65 total submissions. The challenge constituted Track 2 of the 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDOC Individualized Domains (N-GRID) Shared Task in Clinical Natural Language Processing.
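A much-reduced sketch of the stacked-ensemble idea behind such a submission: several base classifiers are combined by a meta-learner and evaluated with 10-fold cross-validation. The synthetic features and the particular base and meta learners below are assumptions for illustration, not the actual 22-classifier system.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for engineered text features with a 4-level severity label.
    X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                               n_classes=4, random_state=0)

    ensemble = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("ada", AdaBoostClassifier(n_estimators=100, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
        cv=5,
    )

    scores = cross_val_score(ensemble, X, y, cv=10)   # 10-fold CV, as in the challenge
    print(f"mean accuracy: {scores.mean():.3f}")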
30

Keshtkar, Fazel. "A Computational Approach to the Analysis and Generation of Emotion in Text". Thèse, Université d'Ottawa / University of Ottawa, 2011. http://hdl.handle.net/10393/20137.

Full text
Abstract
Sentiment analysis is a field of computational linguistics involving the identification, extraction, and classification of opinions, sentiments, and emotions expressed in natural language. Sentiment classification algorithms aim to identify whether the author of a text has a positive or a negative opinion about a topic. One of the main indicators which help to detect the opinion are the words used in the texts. Needless to say, the sentiments expressed in the texts also depend on the syntactic structure and the discourse context. Supervised machine learning approaches to sentiment classification were shown to achieve good results. Classifying texts by emotions requires finer-grained analysis than sentiment classification. In this thesis, we explore the task of emotion and mood classification for blog postings. We propose a novel approach that uses the hierarchy of possible moods to achieve better results than a standard flat classification approach. We also show that using sentiment orientation features improves the performance of classification. We used the LiveJournal blog corpus as a dataset to train and evaluate our method. Another contribution of this work is extracting paraphrases for emotion terms based on the six basic emotions proposed by Ekman (happiness, anger, sadness, disgust, surprise, fear). Paraphrases are different ways to express the same information. Algorithms to extract and automatically identify paraphrases are of interest from both linguistic and practical points of view. Our paraphrase extraction method is based on a bootstrapping algorithm that starts with seed words. Unlike in previous work, our algorithm does not need a parallel corpus. In Natural Language Generation (NLG), paraphrasing is employed to create more varied and natural text. In our research, we extract paraphrases for emotions, with the goal of using them to automatically generate emotional texts (such as friendly or hostile texts) for conversations between intelligent agents and characters in educational games. Nowadays, online services are popular in many disciplines such as e-learning, interactive games, educational games, the stock market, chat rooms and so on. NLG methods can be used in order to generate more interesting and natural texts for such applications. Generating text with emotions is one of the contributions of our work. In the last part of this thesis, we give an overview of NLG from an applied system's point of view. We discuss when NLG techniques can be used; we explain the requirements analysis and specification of NLG systems. We also describe the main NLG tasks of content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation, and linguistic realisation. Moreover, we describe the Authoring Tool that we developed in order to allow writers without programming skills to automatically generate texts for educational games. We develop an NLG system that can generate text with different emotions. To do this, we introduce our pattern-based model for generation. We show that our model starts with initial patterns, then constructs extended patterns from which we choose "final" patterns that are suitable for generating emotion sentences. A user can generate sentences to express the desired emotions by using our patterns. Alternatively, the user can use our Authoring Tool to generate sentences with emotions. Our acquired paraphrases will be employed by the tool in order to generate more varied outputs.
31

Shen, Mo. "Exploiting Vocabulary, Morphological, and Subtree Knowledge to Improve Chinese Syntactic Analysis". 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/215675.

Full text
Abstract
In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Kyoto University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.
Kyoto University (京都大学)
0048
新制・課程博士
博士(情報学)
甲第19848号
情博第599号
新制||情||104(附属図書館)
32884
京都大学大学院情報学研究科知能情報学専攻
(主査)准教授 河原 大輔, 教授 黒橋 禎夫, 教授 鹿島 久嗣
学位規則第4条第1項該当
Los estilos APA, Harvard, Vancouver, ISO, etc.
32

Vlas, Radu. "A Requirements-Based Exploration of Open-Source Software Development Projects – Towards a Natural Language Processing Software Analysis Framework". Digital Archive @ GSU, 2012. http://digitalarchive.gsu.edu/cis_diss/48.

Texto completo
Resumen
Open source projects do have requirements; they are, however, mostly informal text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming, and for large projects, error-prone. Automated analysis of natural language requirements, even if partial, would be of great benefit. Towards that end, I describe the design and validation of an automated natural language requirements classifier for open source software development projects. I compare two strategies for recognizing requirements in open forums of software features. The results suggest that classifying text at the forum-post and sentence aggregation levels may be effective. Initial results suggest that it can reduce the effort required to analyze requirements of open source software development projects. Software development organizations and communities currently employ a large number of software development techniques and methodologies. This complexity is further increased by a wide range of software project types and development environments. The resulting lack of consistency in the software development domain leads to one important challenge that researchers encounter while exploring this area: specificity. This makes it difficult to maintain a consistent unit of measure or analysis approach while exploring a wide variety of software development projects and environments. The problem of specificity is more prominently exhibited in an area of software development characterized by a dynamic evolution, a unique development environment, and a relatively young history of research when compared to traditional software development: the open-source domain. While performing research on open source and the associated communities of developers, one can notice the same challenge of specificity being present in requirements engineering research as in the case of closed-source software development. Whether research is aimed at performing longitudinal or cross-sectional analyses, or attempts to link requirements to other aspects of software development projects and their management, specificity calls for a flexible analysis tool capable of adapting to the needs and specifics of the explored context. This dissertation covers the design, implementation, and evaluation of a model, a method, and a software tool comprising a flexible software development analysis framework. These design artifacts use a rule-based natural language processing approach and are built to meet the specifics of a requirements-based analysis of software development projects in the open-source domain. This research follows the principles of design science research as defined by Hevner et al. and includes stages of problem awareness, suggestion, development, evaluation, and results and conclusion (Hevner et al. 2004; Vaishnavi and Kuechler 2007). The long-term goal of the research stream stemming from this dissertation is to propose a flexible, customizable, requirements-based natural language processing software analysis framework that can be adapted to meet the research needs of multiple different types of domains or different categories of analyses.
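A hedged sketch of what a rule-based requirements detector for forum text might look like; the patterns below are illustrative assumptions, not the dissertation's actual rule set:

```python
# Hypothetical sketch of a rule-based requirements detector for forum posts.
import re

REQUIREMENT_PATTERNS = [
    r"\b(should|must|needs? to|has to|shall)\b",      # modal / necessity cues
    r"\bit would be (nice|great|useful) (if|to)\b",   # feature-request phrasing
    r"\b(add|support|allow|provide)\b.*\b(option|feature|ability)\b",
]

def is_requirement(sentence: str) -> bool:
    s = sentence.lower()
    return any(re.search(p, s) for p in REQUIREMENT_PATTERNS)

post = ("Great plugin! It would be nice if the tool supported UTF-8 logs. "
        "Also, the installer must not overwrite existing configs.")
for sent in re.split(r"(?<=[.!?])\s+", post):
    print(is_requirement(sent), "-", sent)
```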
Los estilos APA, Harvard, Vancouver, ISO, etc.
33

Dagerman, Björn. "Semantic Analysis of Natural Language and Definite Clause Grammar using Statistical Parsing and Thesauri". Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26142.

Texto completo
Resumen
Services that rely on the semantic computations of users’ natural linguistic inputs are becoming more frequent. Computing semantic relatedness between texts is problematic due to the inherent ambiguity of natural language. The purpose of this thesis was to show how a sentence could be compared to a predefined semantic Definite Clause Grammar (DCG). Furthermore, it should show how a DCG-based system could benefit from such capabilities. Our approach combines openly available specialized NLP frameworks for statistical parsing, part-of-speech tagging and word-sense disambiguation. We compute the semantic relatedness using a large lexical and conceptual-semantic thesaurus. Also, we extend an existing programming language for multimodal interfaces, which uses static predefined DCGs: COactive Language Definition (COLD). That is, every word that should be acceptable by COLD needs to be explicitly defined. By applying our solution, we show how our approach can remove dependencies on word definitions and improve grammar definitions in DCG-based systems.
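A minimal sketch of thesaurus-based relatedness of the kind described above, here using WordNet via NLTK; it is illustrative only, and the thesis additionally combines such scores with statistical parsing, POS tagging and word-sense disambiguation:

```python
# Decide whether an input word is close enough to a word expected by a grammar rule.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def max_relatedness(word_a: str, word_b: str) -> float:
    """Best path similarity over all sense pairs of the two words."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(word_a)
              for s2 in wn.synsets(word_b)]
    return max(scores, default=0.0)

# e.g. accept "automobile" where a grammar rule expects the literal word "car"
expected, observed = "car", "automobile"
print(max_relatedness(expected, observed))  # 1.0: the words share a WordNet synset
```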
Los estilos APA, Harvard, Vancouver, ISO, etc.
34

Jin, Gongye. "High-quality Knowledge Acquisition of Predicate-argument Structures for Syntactic and Semantic Analysis". 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/215677.

Texto completo
Resumen
If the author of the published paper digitizes such paper and releases it to third parties using digital media such as computer networks or CD-ROMs, the volume, number, and pages of the Journal of Natural Language Processing of the publication must be indicated in a clear manner for all viewers.
Kyoto University (京都大学)
0048
新制・課程博士
博士(情報学)
甲第19850号
情博第601号
新制||情||105(附属図書館)
32886
京都大学大学院情報学研究科知能情報学専攻
(主査)准教授 河原 大輔, 教授 黒橋 禎夫, 教授 河原 達也
学位規則第4条第1項該当
Los estilos APA, Harvard, Vancouver, ISO, etc.
35

Yang, Jianji. "Automatic summarization of mouse gene information for microarray analysis by functional gene clustering and ranking of sentences in MEDLINE abstracts : a dissertation". Oregon Health & Science University, 2007. http://content.ohsu.edu/u?/etd,643.

Texto completo
Resumen
Ph.D.
Medical Informatics and Clinical Epidemiology
Tools to automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. Even though several useful human-curated databases of information about genes already exist, these have significant limitations. First, their construction requires intensive human labor. Second, curation of genes lags behind the rapid publication rate of new research and discoveries. Finally, most of the curated knowledge is limited to information on single genes. As such, most original and up-to-date knowledge on genes can only be found in the immense amount of unstructured, free-text biomedical literature. Genomic researchers frequently encounter the task of finding information on sets of differentially expressed genes from the results of common high-throughput technologies like microarray experiments. However, finding information on a set of genes by manually searching and scanning the literature is a time-consuming and daunting task for scientists. For example, PubMed, the first choice for literature research among biologists, usually returns hundreds of references for a search on a single gene, in reverse chronological order. Therefore, a tool to summarize the available textual information on genes could be valuable for scientists. In this study, we adapted automatic summarization technologies to the biomedical domain to build a query-based, task-specific automatic summarizer of information on mouse genes studied in microarray experiments: the mouse Gene Information Clustering and Summarization System (GICSS). GICSS first clusters a set of differentially expressed genes by Medical Subject Heading (MeSH), Gene Ontology (GO), and free-text features into functionally similar groups; next, it presents summaries for each gene as ranked sentences extracted from MEDLINE abstracts, with the ranking emphasizing the relation between genes, similarity to the functional cluster the gene belongs to, and recency. GICSS is available as a web application with links to the PubMed (www.pubmed.gov) website for each extracted sentence. It integrates two related steps of the microarray data analysis process: functional gene clustering and gene information gathering. The information from the clustering step was used to construct the context for summarization. The evaluation of the system was conducted with scientists who were analyzing their real microarray datasets. The evaluation results showed that GICSS can provide meaningful clusters for real users in the genomic research area. In addition, the results also indicated that presenting sentences from the abstract can provide more important information to the user than just showing the title in the default PubMed format. Both domain-specific and non-domain-specific terminologies contributed to the selection of informative sentences. Summarization may serve as a useful tool to help scientists access information at the time of microarray data analysis. Further research includes setting up the automatic update of MEDLINE records; extending and fine-tuning the feature parameters for sentence scoring using the available evaluation data; and expanding GICSS to incorporate textual information from other species. Finally, dissemination and integration of GICSS into the current workflow of the microarray analysis process will help to make GICSS a truly useful tool for the targeted users, biomedical genomics researchers.
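A hypothetical sketch of the two steps this abstract describes: clustering genes by their annotation terms, then ranking candidate sentences by similarity to the cluster. The gene features, sentences and parameters are toy stand-ins, not the GICSS implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

genes = {"geneA": "apoptosis cell death caspase",
         "geneB": "cell cycle mitosis division",
         "geneC": "programmed cell death bcl2"}

vec = TfidfVectorizer()
X = vec.fit_transform(genes.values())
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Rank candidate abstract sentences for geneA against its cluster's centroid text.
sentences = ["GeneA promotes apoptosis via caspase activation.",
             "The weather station recorded unrelated data.",
             "Loss of geneA delays programmed cell death."]
cluster_texts = [t for t, c in zip(genes.values(), clusters) if c == clusters[0]]
centroid = vec.transform([" ".join(cluster_texts)])
scores = cosine_similarity(vec.transform(sentences), centroid).ravel()
for s, sc in sorted(zip(sentences, scores), key=lambda p: -p[1]):
    print(f"{sc:.2f}  {s}")
```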
Los estilos APA, Harvard, Vancouver, ISO, etc.
36

Norsten, Theodor. "Exploring the Potential of Twitter Data and Natural Language Processing Techniques to Understand the Usage of Parks in Stockholm". Thesis, KTH, Geoinformatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278532.

Texto completo
Resumen
Traditional methods used to investigate the usage of parks consist of questionnaires, which is a very time- and resource-consuming approach. Today, more than four billion people use some form of social media platform daily. This has led to huge amounts of data being generated every day through various social media platforms and has created a potential new source for retrieving large amounts of data. This report investigates a modern approach, using Natural Language Processing on Twitter data, to understand how parks in Stockholm are being used. Natural Language Processing (NLP) is an area within artificial intelligence that refers to the process of reading, analyzing, and understanding large amounts of text data, and it is considered to be the future of understanding unstructured text. Twitter data were obtained through Twitter's open API. Data from three parks in Stockholm were collected for the period 2015-2019. Three analyses were then performed: temporal, sentiment, and topic modeling. The results from these analyses show that it is possible to understand what attitudes and activities are associated with visiting parks using NLP on social media data. It is clear that sentiment analysis is a difficult task for computers to solve and it is still at an early stage of development. The results from the sentiment analysis indicate some uncertainties. To achieve more reliable results, the analysis would need to include much more data, use more thorough cleaning methods, and be based on English tweets. One significant conclusion given the results is that people's attitudes and activities linked to each park are clearly correlated with each park's attributes. Another clear pattern is that the usage of parks peaks significantly during holiday celebrations and that positive sentiment is the emotion most strongly linked with park visits. The findings suggest that future studies focus on combining the approach in this report with geospatial data from a social media platform where users share their geolocation to a greater extent.
Traditionella metoder använda för att förstå hur människor använder parker består av frågeformulär, en mycket tids -och- resurskrävande metod. Idag använder mer en fyra miljarder människor någon form av social medieplattform dagligen. Det har inneburit att enorma datamängder genereras dagligen via olika sociala media plattformar och har skapat potential för en ny källa att erhålla stora mängder data. Denna undersöker ett modernt tillvägagångssätt, genom användandet av Natural Language Processing av Twitter data för att förstå hur parker i Stockholm används. Natural Language Processing (NLP) är ett område inom artificiell intelligens och syftar till processen att läsa, analysera och förstå stora mängder textdata och anses vara framtiden för att förstå ostrukturerad text. Data från Twitter inhämtades via Twitters öppna API. Data från tre parker i Stockholm erhölls mellan perioden 2015–2019. Tre analyser genomfördes därefter, temporal, sentiment och topic modeling. Resultaten från ovanstående analyser visar att det är möjligt att förstå vilka attityder och aktiviteter som är associerade med att besöka parker genom användandet av NLP baserat på data från sociala medier. Det är tydligt att sentiment analys är ett svårt problem för datorer att lösa och är fortfarande i ett tidigt skede i utvecklingen. Resultaten från sentiment analysen indikerar några osäkerheter. För att uppnå mer tillförlitliga resultat skulle analysen bestått av mycket mer data, mer exakta metoder för data rensning samt baserats på tweets skrivna på engelska. En tydlig slutsats från resultaten är att människors attityder och aktiviteter kopplade till varje park är tydligt korrelerat med de olika attributen respektive park består av. Ytterligare ett tydligt mönster är att användandet av parker är som högst under högtider och att positiva känslor är starkast kopplat till park-besök. Resultaten föreslår att framtida studier fokuserar på att kombinera metoden i denna rapport med geospatial data baserat på en social medieplattform där användare delar sin platsinfo i större utsträckning.
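An illustrative sketch of the sentiment and topic-modeling steps applied to tweets, in the spirit of the abstract above; the tweets, the VADER lexicon choice and the parameters are placeholder assumptions, not the thesis's setup:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # nltk.download("vader_lexicon")
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = ["Lovely picnic in the park today, so relaxing!",
          "Too crowded at the concert in the park, awful queues.",
          "Morning run through the park, beautiful autumn colours."]

# Sentiment per tweet (VADER is tuned for English social-media text).
sia = SentimentIntensityAnalyzer()
for t in tweets:
    print(sia.polarity_scores(t)["compound"], t)

# Simple topic model over the same tweets.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"topic {k}:", ", ".join(top))
```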
Los estilos APA, Harvard, Vancouver, ISO, etc.
37

Silva, João. "Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization". Master's thesis, Department of Informatics, University of Lisbon, 2007. http://hdl.handle.net/10451/14016.

Texto completo
Resumen
This dissertation proposes a set of procedures for the computational processing of Portuguese. Five tasks are covered: Sentence Segmentation, Tokenization, Part-of-Speech Tagging, Nominal Featurization and Nominal Lemmatization. These are some of the initial steps producing linguistic information (such as POS categories or lemmas) that is important to most subsequent processing (e.g. syntactic and semantic analysis). I follow a shallow processing approach, where linguistic information is associated with text based on local information (i.e. using the word itself or perhaps a limited window of context containing just a few words). I begin by identifying and describing the key problems raised by each task, with special focus on the problems that are specific to Portuguese. After an overview of existing approaches and tools, I describe the solutions I adopted for the issues raised previously. I then report on my implementation of these solutions, which are found either to yield state-of-the-art performance or, in some cases, to advance the state of the art. The major result of this dissertation is thus threefold: a description of the problems found in NLP of Portuguese, a set of algorithms to tackle those problems, and the corresponding tools together with their evaluation results.
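A generic sketch of the shallow pipeline stages discussed above (segmentation, tokenization, POS tagging, lemmatization), shown here with off-the-shelf NLTK models for English purely to illustrate the stages; the dissertation builds comparable components specifically for Portuguese:

```python
import nltk  # requires the punkt, averaged_perceptron_tagger and wordnet data packages
from nltk.stem import WordNetLemmatizer

text = "The cats were sleeping. A dog barked loudly."
lemmatizer = WordNetLemmatizer()

for sentence in nltk.sent_tokenize(text):        # sentence segmentation
    tokens = nltk.word_tokenize(sentence)        # tokenization
    tagged = nltk.pos_tag(tokens)                # part-of-speech tagging
    for word, tag in tagged:
        pos = "v" if tag.startswith("V") else "n"
        print(word, tag, lemmatizer.lemmatize(word.lower(), pos=pos))  # lemmatization
```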
Los estilos APA, Harvard, Vancouver, ISO, etc.
38

Nilsson, Ludvig y Olle Djerf. "How to improve Swedish sentiment polarityclassification using context analysis". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446382.

Texto completo
Resumen
This thesis considers sentiment polarity analysis in Swedish. Despite being the most widely spoken of the Nordic languages, Swedish has seen less sentiment research than neighboring languages. As such, this is a largely exploratory project using techniques that have shown positive results for other languages. We compare techniques applied to a CNN with existing Swedish and multilingual variations of the state-of-the-art BERT model. We find that the preprocessing techniques do in fact benefit our CNN model, but still do not match the results of fine-tuned BERT models. We conclude that a Swedish-specific BERT model can outperform the generic multilingual ones, but only under certain conditions.
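A hedged sketch of fine-tuning a Swedish BERT checkpoint for binary sentiment polarity, in the spirit of the comparison above; the checkpoint name "KB/bert-base-swedish-cased", the toy data and the training settings are assumptions for illustration, not the thesis setup:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import torch

checkpoint = "KB/bert-base-swedish-cased"  # assumed Swedish BERT checkpoint
tok = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["Filmen var fantastisk!", "Det här var riktigt dåligt."]
labels = [1, 0]  # 1 = positive, 0 = negative
enc = tok(texts, truncation=True, padding=True, return_tensors="pt")

class TinyDataset(torch.utils.data.Dataset):
    def __len__(self): return len(labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

args = TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=TinyDataset()).train()
```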
Los estilos APA, Harvard, Vancouver, ISO, etc.
39

Janevski, Angel. "UniversityIE: Information Extraction From University Web Pages". UKnowledge, 2000. http://uknowledge.uky.edu/gradschool_theses/217.

Texto completo
Resumen
The amount of information available on the web is growing constantly. As a result, the problem of retrieving any desired information is getting more difficult by the day. To alleviate this problem, several techniques are currently being used, both for locating pages of interest and for extracting meaningful information from the retrieved pages. Information extraction (IE) is one such technology, used for summarizing unrestricted natural language text into a structured set of facts. IE is already being applied within several domains such as news transcripts, insurance information, and weather reports. Various approaches to IE have been taken and a number of significant results have been reported. In this thesis, we describe the application of IE techniques to the domain of university web pages. This domain is broader than previously evaluated domains and has a variety of idiosyncratic problems to address. We present an analysis of the domain of university web pages and the consequences of having them input to IE systems. We then present UniversityIE, a system that can search a web site, extract relevant pages, and process them for information such as admission requirements or general information. The UniversityIE system, developed as part of this research, contributes three IE methods and a web-crawling heuristic that worked relatively well and predictably over a test set of university web sites. We designed UniversityIE as a generic framework for plugging in and executing IE methods over pages acquired from the web. We also integrated into the system a generic web crawler (built at the University of Kentucky) and ported to Java and integrated an external word lexicon (WordNet) and a syntax parser (Link Grammar Parser).
Los estilos APA, Harvard, Vancouver, ISO, etc.
40

Passos, Alexandre Tachard 1986. "Combinatorial algorithms and linear programming for inference in natural language processing = Algoritmos combinatórios e de programação linear para inferência em processamento de linguagem natural". [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275609.

Texto completo
Resumen
Orientador: Jacques Wainer
Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação
Made available in DSpace on 2018-08-24T00:42:33Z (GMT). No. of bitstreams: 1 Passos_AlexandreTachard_D.pdf: 2615030 bytes, checksum: 93841a46120b968f6da6c9aea28953b7 (MD5) Previous issue date: 2013
Resumo: Em processamento de linguagem natural, e em aprendizado de máquina em geral, é comum o uso de modelos gráficos probabilísticos (probabilistic graphical models). Embora estes modelos sejam muito convenientes, possibilitando a expressão de relações complexas entre várias variáveis que se deseja prever dado uma sentença ou um documento, algoritmos comuns de aprendizado e de previsão utilizando estes modelos são frequentemente ineficientes. Por isso têm-se explorado recentemente o uso de relaxações usando programação linear deste problema de inferência. Esta tese apresenta duas contribuições para a teoria e prática de relaxações de programação linear para inferência em modelos probabilísticos gráficos. Primeiro, apresentamos um novo algoritmo, baseado na técnica de geração de colunas (dual à técnica dos planos de corte) que acelera a execução do algoritmo de Viterbi, a técnica mais utilizada para inferência em modelos lineares. O algoritmo apresentado também se aplica em modelos que são árvores e em hipergrafos. Em segundo mostramos uma nova relaxação linear para o problema de inferência conjunta, quando se quer acoplar vários modelos, em cada qual inferência é eficiente, mas em cuja junção inferência é NP-completa. Esta tese propõe uma extensão à técnica de decomposição dual (dual decomposition) que permite além de juntar vários modelos a adição de fatores que tocam mais de um submodelo eficientemente
Abstract: In natural language processing, and in general machine learning, probabilistic graphical models (and more generally structured linear models) are commonly used. Although these models are convenient, allowing the expression of complex relationships between many random variables one wants to predict given a document or sentence, most learning and prediction algorithms for general models are inefficient. Hence there has recently been interest in using linear programming relaxations for the inference tasks necessary when learning or applying these models. This thesis presents two contributions to the theory and practice of linear programming relaxations for inference in structured linear models. First, we present a new algorithm, based on column generation (a technique which is dual to the cutting planes method), to accelerate the Viterbi algorithm, the most popular exact inference technique for linear-chain graphical models. The method is also applicable to tree graphical models and hypergraph models. Then we present a new linear programming relaxation for the problem of joint inference, when one has many submodels and wants to predict using all of them at once. In general, joint inference is NP-complete, but algorithms based on dual decomposition have proven to be efficiently applicable for the case when the joint model can be expressed as many separate models plus linear equality constraints. This thesis proposes an extension to dual decomposition which also allows the presence of factors that score parts belonging to different submodels, improving the expressivity of dual decomposition at no extra computational cost.
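For reference, a compact implementation of the standard Viterbi algorithm for a linear-chain model, the exact-inference routine that the thesis accelerates with column generation; the scores below are random stand-ins, and the column-generation speedup itself is not shown:

```python
import numpy as np

def viterbi(emission, transition):
    """emission: (T, K) local scores; transition: (K, K) scores from tag i to tag j."""
    T, K = emission.shape
    score = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        # cand[i, j] = best score ending in tag i at t-1, then moving to tag j at t
        cand = score[t - 1][:, None] + transition + emission[t][None, :]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):     # follow back-pointers to recover the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score[-1].max())

rng = np.random.default_rng(0)
tags, best = viterbi(rng.normal(size=(5, 3)), rng.normal(size=(3, 3)))
print(tags, best)
```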
Doutorado
Ciência da Computação
Doutor em Ciência da Computação
Los estilos APA, Harvard, Vancouver, ISO, etc.
41

Currin, Aubrey Jason. "Text data analysis for a smart city project in a developing nation". Thesis, University of Fort Hare, 2015. http://hdl.handle.net/10353/2227.

Texto completo
Resumen
Increased urbanisation against the backdrop of limited resources is complicating city planning and the management of functions including public safety. The smart city concept can help, but most previous smart city systems have focused on utilising automated sensors and analysing quantitative data. In developing nations, using the ubiquitous mobile phone as an enabler for crowdsourcing qualitative public safety reports from the public is a more viable option due to resource and infrastructure constraints. However, there is no specific best method for the analysis of qualitative text reports for a smart city in a developing nation. The aim of this study, therefore, is the development of a model for enabling the analysis of unstructured natural language text for use in a public safety smart city project. Following the guidelines of the design science paradigm, the resulting model was developed through an inductive review of related literature, and assessed and refined through observations of a crowdsourcing prototype and conversational analysis with industry experts and academics. The content analysis technique was applied to the public safety reports obtained from the prototype via computer-assisted qualitative data analysis software (CAQDAS). This resulted in the development of a hierarchical ontology, which forms an additional output of this research project. Thus, this study has shown how municipalities or local government can use CAQDAS and content analysis techniques to prepare large quantities of text data for use in a smart city.
Los estilos APA, Harvard, Vancouver, ISO, etc.
42

Erogul, Umut. "Sentiment Analysis In Turkish". Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/12610616/index.pdf.

Texto completo
Resumen
Sentiment analysis is the automatic classification of a text, trying to determine the attitude of the writer with respect to a specific topic. The attitude may be their judgment or evaluation, their feelings, or the intended emotional communication. The recent increase in the use of review sites and blogs has made a great amount of subjective data available. Nowadays, it is nearly impossible to manually process all the relevant data available and, as a consequence, the importance given to the automatic classification of unformatted data has increased. To date, research on sentiment analysis has focused on the English language. In this thesis, two Turkish datasets tagged with sentiment information are introduced, and existing methods for English are applied to these datasets. This thesis also suggests new methods for Turkish sentiment analysis.
Los estilos APA, Harvard, Vancouver, ISO, etc.
43

Sanagavarapu, Krishna Chaitanya. "Determining Whether and When People Participate in the Events They Tweet About". Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984235/.

Texto completo
Resumen
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past and present events as well as events that may occur in the future. We define event participants as people directly involved in an event, regardless of whether they are the agent, the recipient, or play another role. We present an annotation effort, guidelines and quality analysis with 1,096 event mentions. We discuss the label distributions and event behavior in the annotated corpus. We also explain several features used and a standard supervised machine learning approach to automatically determine if and when the author is a participant in the event mentioned in the tweet. We discuss trends in the results obtained and draw important conclusions.
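A hypothetical sketch of the kind of feature-based classifier described above: hand-crafted cues about the event verb and the author, fed to a standard classifier. The features and training examples are invented for illustration and are not the thesis's feature set:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(tweet: str, event_verb: str) -> dict:
    words = tweet.lower().split()
    return {
        "verb": event_verb,
        "first_person": any(w in ("i", "we", "my", "our") for w in words),
        "future_marker": any(w in ("will", "tomorrow", "tonight") for w in words),
        "past_marker": event_verb.endswith("ed"),
    }

train = [("I will attend the concert tonight", "attend", 1),
         ("The mayor attended the opening yesterday", "attended", 0),
         ("We just finished the marathon!", "finished", 1),
         ("Police reported a crash on I-35", "reported", 0)]

vec = DictVectorizer()
X = vec.fit_transform([features(t, v) for t, v, _ in train])
y = [label for _, _, label in train]
clf = LogisticRegression().fit(X, y)

print(clf.predict(vec.transform([features("I am running the race tomorrow", "running")])))
```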
Los estilos APA, Harvard, Vancouver, ISO, etc.
44

Sobhani, Parinaz. "Stance Detection and Analysis in Social Media". Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36180.

Texto completo
Resumen
Computational approaches to opinion mining have mostly focused on polarity detection of product reviews by classifying the given text as positive, negative or neutral. There has been less effort in the direction of socio-political opinion mining, which aims to determine favorability towards given targets of interest, particularly for social media data like news comments and tweets. In this research, we explore the task of automatically determining from the text whether the author of the text is in favor of, against, or neutral towards a proposition or target. The target may be a person, an organization, a government policy, a movement, a product, etc. Moreover, we are interested in detecting the reasons behind authors’ positions. This thesis is organized into three main parts: the first part on Twitter stance detection and the interaction of stance and sentiment labels, the second part on detecting stance and the reasons behind it in online news comments, and the third part on multi-target stance classification. One may express favor (or disfavor) towards a target by using positive or negative language. Here, for the first time, we present a dataset of tweets annotated for whether the tweeter is in favor of or against pre-chosen targets, as well as for sentiment. These targets may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. We develop a simple stance detection system that outperforms all 19 teams that participated in a recent shared task competition on the same dataset (SemEval-2016 Task #6). Additionally, access to both stance and sentiment annotations allows us to conduct several experiments to tease out their interactions. Next, we propose a novel framework for joint learning of stance and the reasons behind it. This framework relies on topic modeling. Unlike other machine learning approaches for argument tagging, which often require a large set of labeled data, our approach is minimally supervised. The extracted arguments are subsequently employed for stance classification. Furthermore, we create and make available the first dataset of online news comments manually annotated for stance and arguments. Experiments on this dataset demonstrate the benefits of using topic modeling, particularly Non-Negative Matrix Factorization, for argument detection. Previous models for stance classification often treat each target independently, ignoring the potential (sometimes very strong) dependency that could exist among targets. However, in many applications, there exist natural dependencies among targets. In this research, we relax such independence assumptions in order to jointly model the stance expressed towards multiple targets. We present a new dataset that we built for this task and make it publicly available. Next, we show that an attention-based encoder-decoder framework is very effective for this problem, outperforming several alternatives that jointly learn dependent subjectivity through cascading classification or multi-task learning.
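An illustrative sketch of using Non-Negative Matrix Factorization to surface recurring argument-like topics in comments, the technique the second part of the thesis relies on; the comments and parameters are toy placeholders, not the thesis's data or model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

comments = ["The new pipeline will create jobs and boost the local economy.",
            "Jobs are temporary, but the environmental damage is permanent.",
            "Spills would threaten drinking water and wildlife habitats.",
            "Economic growth matters more than hypothetical risks."]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(comments)
nmf = NMF(n_components=2, init="nndsvda", random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, comp in enumerate(nmf.components_):
    top = [terms[i] for i in comp.argsort()[-4:][::-1]]
    print(f"argument topic {k}:", ", ".join(top))

# Document-topic weights could then serve as features for stance classification.
print(nmf.transform(X).round(2))
```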
Los estilos APA, Harvard, Vancouver, ISO, etc.
45

Pérez-Rosas, Verónica. "Exploration of Visual, Acoustic, and Physiological Modalities to Complement Linguistic Representations for Sentiment Analysis". Thesis, University of North Texas, 2014. https://digital.library.unt.edu/ark:/67531/metadc699996/.

Texto completo
Resumen
This research is concerned with the identification of sentiment in multimodal content. This is of particular interest given the increasing presence of subjective multimodal content on the web and other sources, which constitutes a rich and vast source of people's opinions, feelings, and experiences. Despite the need for tools that can identify opinions in the presence of diverse modalities, most current methods for sentiment analysis are designed for textual data only, and few attempts have been made to address this problem. The dissertation investigates techniques for augmenting linguistic representations with acoustic, visual, and physiological features. The potential benefits of using these modalities include linguistic disambiguation, visual grounding, and the integration of information about people's internal states. The main goal of this work is to build computational resources and tools that allow sentiment analysis to be applied to multimodal data. This thesis makes three important contributions. First, it shows that modalities such as audio, video, and physiological data can be successfully used to improve existing linguistic representations for sentiment analysis. We present a method that integrates linguistic features with features extracted from these modalities. Features are derived from verbal statements, audiovisual recordings, thermal recordings, and physiological sensor signals. The resulting multimodal sentiment analysis system is shown to significantly outperform the use of language alone. Using this system, we were able to predict the sentiment expressed in video reviews and also the sentiment experienced by viewers while exposed to emotionally loaded content. Second, the thesis provides evidence of the portability of the developed strategies to other affect recognition problems; we provide support for this by studying the deception detection problem. Third, this thesis contributes several multimodal datasets that will enable further research in sentiment and deception detection.
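A minimal sketch of early fusion of the kind described above: concatenating linguistic features with acoustic and physiological feature vectors before classification. All feature values are fabricated for illustration and do not reflect the dissertation's actual features:

```python
import numpy as np
from sklearn.svm import SVC

# Per-utterance features: bag-of-words counts, acoustic stats (pitch mean, energy),
# and a physiological reading (skin temperature), for four labeled utterances.
linguistic = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 0], [0, 2, 2]])
acoustic   = np.array([[210.0, 0.7], [180.0, 0.4], [220.0, 0.8], [170.0, 0.3]])
physiology = np.array([[34.1], [33.2], [34.5], [33.0]])
labels     = np.array([1, 0, 1, 0])   # 1 = positive sentiment, 0 = negative

X = np.hstack([linguistic, acoustic, physiology])   # early fusion by concatenation
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X))
```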
Los estilos APA, Harvard, Vancouver, ISO, etc.
46

Eckart, de Castilho Richard [Verfasser], Iryna [Akademischer Betreuer] Gurevych, Andreas [Akademischer Betreuer] Henrich y Christopher D. [Akademischer Betreuer] Manning. "Natural Language Processing: Integration of Automatic and Manual Analysis / Richard Eckart de Castilho. Betreuer: Iryna Gurevych ; Andreas Henrich ; Christopher D. Manning". Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2014. http://d-nb.info/1110979118/34.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
47

Yeates, Stuart Andrew. "Text Augmentation: Inserting markup into natural language text with PPM Models". The University of Waikato, 2006. http://hdl.handle.net/10289/2600.

Texto completo
Resumen
This thesis describes a new optimisation and new heuristics for automatically marking up XML documents, together with CEM, a Java implementation, using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14,682 bibliographies laid out in seven standard styles using the BibTeX system and marked up in XML with every field from the original BibTeX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists' Communique corpus and the Reuters' corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation are examined using the four corpora.
Los estilos APA, Harvard, Vancouver, ISO, etc.
48

Rahgozar, Arya. "Automatic Poetry Classification and Chronological Semantic Analysis". Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40516.

Texto completo
Resumen
The correction, authentication, validation and identification of the original texts in Hafez’s poetry among the 16 or so old versions of his Divan have been a challenge for scholars. The semantic analysis of poetry with modern Digital Humanities techniques is also challenging. Analyzing latent semantics is more challenging in poetry than in prose for evident reasons, such as conciseness, imagery and metaphorical constructions. Hafez’s poetry is, on the one hand, cryptic and complex because of his era’s restrictive social conditions and censorship impediments, and on the other hand, sophisticated because of his encapsulation of high-calibre world-views and mystical and philosophical attributes, artistically knitted within majestic decorations. Our research is strongly influenced by, and is a continuation of, Mahmoud Houman’s instrumental and essential chronological classification of ghazals by Hafez. Houman’s chronological classification method (Houman, 1938), which we have adopted here, provides guidance to choose the correct version of Hafez’s poems among multiple manuscripts. Houman’s semantic analysis of Hafez’s poetry is unique in that the central concept of his classification is based on intelligent scrutiny of meanings and careful observation of the evolutionary psychology of Hafez through his remarkable body of work. Houman’s analysis has provided the annotated data for the classification algorithms we develop to classify the poems. We seek to understand Hafez through Houman’s perspective. In addition, we asked a contemporary expert to annotate Hafez’s ghazals (Raad, 2019). The rationale behind our research is also to satisfy the need for more efficient means of scholarly research and to bring literature and computer science together as much as possible. Our research will support semantic analysis and help with the design and development of tools for poetry research. We have developed a digital corpus of Hafez’s ghazals and applied proper word forms and punctuation. We digitized and extended chronological criteria to guide the correction and validation of Hafez’s poetry. To our knowledge, no automatic chronological classification has been conducted for Hafez’s poetry. Other than the meticulous preparation of our bilingual Hafez corpus for computational use, the innovative aspect of our classification research is twofold. The first objective of our work is to develop semantic features to better train automatic classifiers for annotated poems and to apply the classifiers to unannotated poems, that is, to classify the rest of the poems by applying machine learning (ML) methodology. The second task is to extract semantic information and properties to help design a visualization scheme that links the predictions’ rationale to Houman’s perception of the chronological properties of Hafez’s poetry. We identified and used effective Natural Language Processing (NLP) techniques such as classification, word-embedding features, and visualization to facilitate and automate the semantic analysis of Hafez’s poetry. We defined and applied rigorous and repeatable procedures that can potentially be applied to other kinds of poetry. We showed that the chronological segments identified automatically were coherent.
We presented and compared two independent chronological labellings of Hafez’s ghazals in digital form, produced their ontologies, and explained the inter-annotator agreement and distributional semantic properties using relevant NLP techniques, to help guide future corrections, authentication, and interpretation of Hafez’s poetry. Chronological labelling of the whole corpus not only helps us better understand Hafez’s poetry, but is also a rigorous guide to better recognition of the correct versions of Hafez’s poems among multiple manuscripts. Such a small volume of complex poetic text required careful selection when choosing and developing appropriate ML techniques for the task. Through many classification and clustering experiments, we have achieved state-of-the-art prediction of chronological poems, trained and evaluated against our hand-made Hafez corpus. Our selected classification algorithm was a Support Vector Machine (SVM), trained with Latent Dirichlet Allocation (LDA)-based similarity features. We used clustering to produce an alternative perspective to classification. For our visualization methodology, we used the LDA features but also passed the results to a Principal Component Analysis (PCA) module to reduce the number of dimensions to two, thereby enabling graphical presentations. We believe that applying this method to poetry classification, and showing the topic relations between poems in the same classes, will help us better understand the interrelated topics within the poems. Many of our methods can potentially be used in similar cases in which the intention is to semantically classify poetry.
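An illustrative sketch of the pipeline named in this abstract: LDA-derived topic features, an SVM classifier over those features, and PCA to project poems into two dimensions for visualization. The poems and labels below are toy placeholders, not the Hafez corpus or Houman's labels:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA
from sklearn.svm import SVC

poems = ["wine and the beloved in the tavern of ruin",
         "the nightingale laments to the rose at dawn",
         "reason bows before the mystery of love",
         "dust of the path and the cup of unity"]
periods = ["youth", "youth", "elder", "elder"]   # chronological class labels

counts = CountVectorizer().fit_transform(poems)
topics = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(counts)

svm = SVC(kernel="linear").fit(topics, periods)       # classify poems by topic features
print(svm.predict(topics))

coords = PCA(n_components=2).fit_transform(topics)    # 2D layout for visualization
for poem, (x, y) in zip(poems, coords):
    print(f"({x:+.2f}, {y:+.2f})  {poem[:30]}...")
```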
Los estilos APA, Harvard, Vancouver, ISO, etc.
49

Karlin, Ievgen. "An Evaluation of NLP Toolkits for Information Quality Assessment". Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-22606.

Texto completo
Resumen
Documentation is often the first source that can help a user solve problems or that states the conditions of use of a product. That is why it should be clear and understandable. But what does “understandable” mean? And how can one detect whether a text is unclear? This thesis answers those questions. The main idea of the current work is to measure the clarity of textual information using natural language processing capabilities. There are three global steps to achieve this goal: defining criteria of poor clarity of textual information, evaluating different natural language toolkits and finding the one most suitable for us, and implementing a prototype system that, given a text, measures its clarity. The thesis project is planned to be integrated into VizzAnalyzer (a quality analysis tool that processes information at the structural level), and its main task is to perform a clarity analysis of the textual information that VizzAnalyzer extracts from XML files.
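A hypothetical sketch of the kind of text-clarity check such a prototype might compute; the criteria and thresholds below are invented for illustration and are not the thesis's actual clarity rules:

```python
import re

def clarity_report(text: str) -> dict:
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    long_words = [w for w in words if len(w) > 12]
    avg_len = len(words) / max(len(sentences), 1)
    return {
        "avg_sentence_length": round(avg_len, 1),
        "long_word_ratio": round(len(long_words) / max(len(words), 1), 2),
        "flag_too_long": avg_len > 25,          # assumed threshold, for illustration only
    }

print(clarity_report("Configure the adapter. Then restart the service to apply changes."))
```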
Los estilos APA, Harvard, Vancouver, ISO, etc.
50

Alsehaimi, Afnan Abdulrahman A. "Sentiment Analysis for E-book Reviews on Amazon to Determine E-book Impact Rank". University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1619109972210567.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.