Theses on the topic "Multivariate analysis. Natural language processing (Computer science)"

Create a correct reference in APA, MLA, Chicago, Harvard, and several other citation styles


Consult the 47 best theses for your research on the topic "Multivariate analysis. Natural language processing (Computer science)".

Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication as a PDF and read its abstract online whenever this information is included in the metadata.

Browse theses on a wide variety of disciplines and organize your bibliography correctly.

1

Cannon, Paul C. « Extending the information partition function : modeling interaction effects in highly multivariate, discrete data / ». Diss., 2008. http://contentdm.lib.byu.edu/ETD/image/etd2263.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Shepherd, David. « Natural language program analysis combining natural language processing with program analysis to improve software maintenance tools / ». Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 176 p, 2007. http://proquest.umi.com/pqdweb?did=1397920371&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.

3

Li, Wenhui. « Sentiment analysis : Quantitative evaluation of subjective opinions using natural language processing ». Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/28000.

Abstract:
Sentiment analysis consists of recognizing sentiment orientation towards specific subjects within natural language texts. Most research in this area focuses on classifying documents as positive or negative. The purpose of this thesis is to quantitatively evaluate subjective opinions in customer reviews using a five-star rating system, which is widely used on online review websites, and to make the predicted score as accurate as possible. First, this thesis presents two methods for rating reviews: classifying reviews with supervised learning methods, as in multi-class classification, or rating reviews using association scores between sentiment terms and a set of seed words extracted from the corpus, i.e. an unsupervised learning method. We extend the feature selection approach used in Turney's PMI-IR estimation by introducing semantic relatedness measures based upon the content of WordNet. This thesis reports on experiments that use both methods to rate reviews with a combined feature set enriched with WordNet-selected sentiment terms. The results of these experiments suggest ways in which incorporating WordNet relatedness measures into feature selection may yield improvements over classification and unsupervised learning methods that do not use them. Furthermore, via ordinal meta-classifiers, we utilize the ordering information contained in the scores of bank reviews to improve performance; we explore the effectiveness of re-sampling for reducing the problem of skewed data; and we check whether discretization benefits the ordinal meta-learning process. Finally, we combine the unsupervised and supervised meta-learning methods to optimize performance on our sentiment prediction task.
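Turney's PMI-IR scheme mentioned above can be illustrated with a minimal sketch: a term's semantic orientation is its association with a positive seed word minus its association with a negative one. All counts, seed words, and window statistics below are toy values invented for illustration, not data from the thesis.

```python
import math

def pmi(cooc, term_count, seed_count, total):
    """Pointwise mutual information between a term and a seed word,
    estimated from co-occurrence window counts."""
    return math.log2((cooc / total) / ((term_count / total) * (seed_count / total)))

# Toy corpus statistics (hypothetical counts, for illustration only).
total_windows = 10_000
counts = {"excellent": 120, "poor": 80, "battery": 200}
cooc = {("battery", "excellent"): 30, ("battery", "poor"): 4}

# Turney-style semantic orientation: association with a positive seed
# minus association with a negative seed.
so = (pmi(cooc[("battery", "excellent")], counts["battery"],
          counts["excellent"], total_windows)
      - pmi(cooc[("battery", "poor")], counts["battery"],
            counts["poor"], total_windows))
# so > 0 suggests a positive orientation for "battery" in this toy data.
```

The thesis replaces or augments these corpus co-occurrence scores with WordNet-based relatedness measures; the arithmetic of the orientation score stays the same.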
4

Keller, Thomas Anderson. « Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing ». Thesis, University of California, San Diego, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599339.

Abstract:

Most machine learning algorithms require a fixed-length input to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed-length representations. Although today there is a general consensus on how to generate fixed-length representations of individual words that preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed-length representations of natural language sequences, and analyze their effectiveness across a variety of high- and low-level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.
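One of the simplest encoders of the kind studied above is mean pooling: averaging word vectors yields a representation whose dimensionality is independent of sequence length. A minimal sketch; the 3-dimensional embeddings are made up for illustration:

```python
def mean_pool(word_vectors):
    """Collapse a variable-length sequence of word vectors into one
    fixed-length vector by element-wise averaging."""
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / n for i in range(dim)]

# Hypothetical 3-dimensional embeddings for a 2-word and a 4-word sentence.
short_sent = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
long_sent = [[1.0] * 3, [2.0] * 3, [3.0] * 3, [4.0] * 3]

# Both sentences map to vectors of the same dimensionality.
assert len(mean_pool(short_sent)) == len(mean_pool(long_sent)) == 3
```

Skip-Thought and memory-network encoders replace this averaging with learned recurrent or attention-based aggregation, but the contract is the same: variable-length text in, fixed-length vector out.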

5

Ramachandran, Venkateshwaran. « A temporal analysis of natural language narrative text ». Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.

6

Crocker, Matthew Walter. « A principle-based system for natural language analysis and translation ». Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/27863.

Abstract:
Traditional views of grammatical theory hold that languages are characterised by sets of constructions. This approach entails enumerating all possible constructions for each language being described. Current theories of transformational generative grammar have established an alternative position. Specifically, Chomsky's Government-Binding theory proposes a system of principles which are common to human language. Such a theory is referred to as a "Universal Grammar" (UG). Associated with the principles of grammar are parameters of variation which account for the diversity of human languages. The grammar for a particular language is known as a "Core Grammar", and is characterised by an appropriately parametrised instance of UG. Despite these advances in linguistic theory, construction-based approaches have remained the status quo within the field of natural language processing. This thesis investigates the possibility of developing a principle-based system which reflects the modular nature of the linguistic theory. That is, rather than stipulating the possible constructions of a language, a system is developed which uses the principles of grammar and language-specific parameters to parse language. Specifically, a system is presented which performs syntactic analysis and translation for a subset of English and German. The cross-linguistic nature of the theory is reflected by the system, which can be considered a procedural model of UG.
7

Holmes, Wesley J. « Topological Analysis of Averaged Sentence Embeddings ». Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1609351352688467.

8

Lee, Wing Kuen. « Interpreting tables in text using probabilistic two-dimensional context-free grammars / ». View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LEEW.

9

Zhan, Tianjie. « Semantic analysis for extracting fine-grained opinion aspects ». HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1213.

10

Currin, Aubrey Jason. « Text data analysis for a smart city project in a developing nation ». Thesis, University of Fort Hare, 2015. http://hdl.handle.net/10353/2227.

Abstract:
Increased urbanisation against the backdrop of limited resources is complicating city planning and the management of functions including public safety. The smart city concept can help, but most previous smart city systems have focused on automated sensors and the analysis of quantitative data. In developing nations, using the ubiquitous mobile phone as an enabler for crowdsourcing qualitative public safety reports from the public is a more viable option because of resource and infrastructure limitations. However, there is no specific best method for analysing qualitative text reports for a smart city in a developing nation. The aim of this study, therefore, is the development of a model for enabling the analysis of unstructured natural language text for use in a public safety smart city project. Following the guidelines of the design science paradigm, the resulting model was developed through an inductive review of related literature, then assessed and refined through observations of a crowdsourcing prototype and conversational analysis with industry experts and academics. The content analysis technique was applied to the public safety reports obtained from the prototype via computer-assisted qualitative data analysis software (CAQDAS). This resulted in the development of a hierarchical ontology, which forms an additional output of this research project. Thus, this study has shown how municipalities or local governments can use CAQDAS and content analysis techniques to prepare large quantities of text data for use in a smart city.
11

Li, Jie. « Intention-driven textual semantic analysis ». School of Computer Science and Software Engineering, 2008. http://ro.uow.edu.au/theses/104.

Abstract:
The explosion of the World Wide Web has brought an endless amount of information within our reach. In order to take advantage of this phenomenon, text search has become a major contemporary research challenge. Due to the nature of the Web, assisting users to find desired information is still a challenging task. In this thesis, we investigate semantic analysis techniques which can facilitate the search process at the semantic level. We also study the problem that short queries are less informative and convey the user's intention to the search service system only poorly. We propose a generalized framework to address these issues. We conduct a case study of movie plot search in which a semantic analyzer works seamlessly with a detector of the user's intention. Our experimental results show the importance and effectiveness of intention detection and semantic analysis techniques.
12

Riehl, Sean K. « Property Recommendation System with Geospatial Data Analytics and Natural Language Processing for Urban Land Use ». Cleveland State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=csu1590513674513905.

13

Smith, Andrew Edward. « Development of a practical system for text content analysis and mining / ». [St. Lucia, Qld.], 2002. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe17847.pdf.

14

Wong, Jimmy Pui Fung. « The use of prosodic features in Chinese speech recognition and spoken language processing / ». View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20WONG.

Abstract:
Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.
15

Pérez-Rosas, Verónica. « Exploration of Visual, Acoustic, and Physiological Modalities to Complement Linguistic Representations for Sentiment Analysis ». Thesis, University of North Texas, 2014. https://digital.library.unt.edu/ark:/67531/metadc699996/.

Abstract:
This research is concerned with the identification of sentiment in multimodal content. This is of particular interest given the increasing presence of subjective multimodal content on the web and in other sources, which contains a rich and vast store of people's opinions, feelings, and experiences. Despite the need for tools that can identify opinions in the presence of diverse modalities, most current methods for sentiment analysis are designed for textual data only, and few attempts have been made to address this problem. The dissertation investigates techniques for augmenting linguistic representations with acoustic, visual, and physiological features. The potential benefits of using these modalities include linguistic disambiguation, visual grounding, and the integration of information about people's internal states. The main goal of this work is to build computational resources and tools that allow sentiment analysis to be applied to multimodal data. This thesis makes three important contributions. First, it shows that modalities such as audio, video, and physiological data can be successfully used to improve existing linguistic representations for sentiment analysis. We present a method that integrates linguistic features with features extracted from these modalities. Features are derived from verbal statements, audiovisual recordings, thermal recordings, and physiological sensor signals. The resulting multimodal sentiment analysis system is shown to significantly outperform the use of language alone. Using this system, we were able to predict the sentiment expressed in video reviews and also the sentiment experienced by viewers while exposed to emotionally loaded content. Second, the thesis provides evidence of the portability of the developed strategies to other affect recognition problems. We provide support for this by studying the deception detection problem.
Third, this thesis contributes several multimodal datasets that will enable further research in sentiment and deception detection.
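The integration of linguistic features with features from other modalities described above is often realized, in its simplest form, as early (feature-level) fusion: concatenating per-modality vectors before classification. A minimal sketch; the feature values are placeholders, and the thesis itself may use a more sophisticated fusion scheme:

```python
def early_fusion(linguistic, acoustic, visual, physiological):
    """Feature-level fusion: concatenate per-modality feature vectors
    into a single representation for a downstream classifier."""
    return linguistic + acoustic + visual + physiological

# Placeholder feature vectors for one video review (values are made up).
fused = early_fusion([0.2, 0.9], [0.5], [0.1, 0.3], [0.7])
assert len(fused) == 6  # one joint vector, ready for a classifier
```

Late fusion (combining per-modality classifier decisions instead of features) is the usual alternative; which one a multimodal system uses changes the code only at this joining point.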
16

Sanagavarapu, Krishna Chaitanya. « Determining Whether and When People Participate in the Events They Tweet About ». Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984235/.

Abstract:
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past, present, and possible future events. We define event participants as people directly involved in an event, regardless of whether they are the agent, the recipient, or play another role. We present an annotation effort, guidelines, and a quality analysis covering 1,096 event mentions. We discuss the label distributions and event behavior in the annotated corpus. We also describe several features and a standard supervised machine learning approach to automatically determine if and when the author is a participant in the event in the tweet. We discuss trends in the results obtained and draw conclusions from them.
17

Paterson, Kimberly Laurel Ms. « TSPOONS : Tracking Salience Profiles Of Online News Stories ». DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1222.

Abstract:
"News space" is a relatively nebulous term describing the general discourse concerning events that affect the populace. Past research has focused on qualitatively analyzing news space in an attempt to answer big questions about how the populace relates to the news and how it responds. We want to ask: when do stories begin? Which stories stand out among the noise? In order to answer the big questions about news space, we need to track the course of individual stories in the news. By analyzing the specific articles that comprise stories, we can synthesize the information gained from several stories to see a more complete picture of the discourse. The individual articles, the groups of articles that become stories, and the overall themes that connect stories together all complete the narrative about what is happening in society. TSPOONS provides a framework for analyzing news stories and answering two main questions: what were the important stories during some time frame, and what were the important stories involving some topic. Drawing technical news stories from Techmeme.com, TSPOONS generates a profile of each news story, quantitatively measuring the importance, or salience, of news stories as well as quantifying the impact of these stories over time.
18

Alsehaimi, Afnan Abdulrahman A. « Sentiment Analysis for E-book Reviews on Amazon to Determine E-book Impact Rank ». University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1619109972210567.

19

Cotra, Aditya Kousik. « Trend Analysis on Artificial Intelligence Patents ». University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1617104823936441.

20

Greer, Jeremiah. « Unsupervised Interpretable Feature Extraction for Binary Executables using LIBCAISE ». University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1560866693877849.

21

Bihi, Ahmed. « Analysis of similarity and differences between articles using semantics ». Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-34843.

Abstract:
Adding semantic analysis to the process of comparing news articles enables a deeper level of analysis than traditional keyword matching. In this bachelor's thesis, we have implemented, compared, and evaluated three commonly used approaches to document-level similarity: keyword matching, TF-IDF vector distance, and Latent Semantic Indexing. Each method was evaluated on a coherent set of news articles in which the majority of articles concerned Donald Trump and the American election of 9 November 2016; several control articles about random topics were also included in the set. TF-IDF vector distance combined with cosine similarity, and Latent Semantic Indexing, gave the best results on the set of articles by separating the control articles from the Trump articles. Keyword matching and TF-IDF distance using Euclidean distance did not separate the Trump articles from the control articles. We also implemented and performed sentiment analysis on the set of news articles with the classes positive, negative, and neutral, and validated the results against human readers classifying the same articles. With this sentiment analysis implementation, we obtained full (100%) agreement with the human readers.
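The best-performing combination reported above, TF-IDF vectors compared with cosine similarity, can be sketched in a few lines of plain Python. The three-document corpus is a toy stand-in; a real system would tokenize and normalize properly:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Raw-tf, smoothed-idf vectors over a shared vocabulary."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    idf = {w: math.log((1 + n) / (1 + sum(w in toks for toks in tokenized))) + 1
           for w in vocab}
    return [[Counter(toks)[w] * idf[w] for w in vocab] for toks in tokenized]

def cosine(a, b):
    """Cosine similarity: angle-based, insensitive to document length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Two "Trump" articles and one control article (toy stand-ins).
docs = ["trump wins election", "trump election result", "cake recipe flour"]
v = tfidf_vectors(docs)
# Cosine similarity separates the related pair from the control article.
assert cosine(v[0], v[1]) > cosine(v[0], v[2])
```

Euclidean distance on the same vectors is sensitive to vector magnitude, and hence to document length, which is one plausible reason it separated the articles poorly in the evaluation above.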
22

Passos, Alexandre Tachard 1986. « Combinatorial algorithms and linear programming for inference in natural language processing = Algoritmos combinatórios e de programação linear para inferência em processamento de linguagem natural ». [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275609.

Abstract:
Advisor: Jacques Wainer
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
In natural language processing, and in machine learning in general, probabilistic graphical models (and more generally structured linear models) are commonly used. Although these models are convenient, allowing the expression of complex relationships between the many random variables one wants to predict given a document or sentence, most learning and prediction algorithms for general models are inefficient. Hence there has recently been interest in using linear programming relaxations for the inference tasks necessary when learning or applying these models. This thesis presents two contributions to the theory and practice of linear programming relaxations for inference in structured linear models. First, we present a new algorithm, based on column generation (a technique dual to the cutting-planes method), to accelerate the Viterbi algorithm, the most popular exact inference technique for linear-chain graphical models. The method is also applicable to tree graphical models and hypergraph models. Then we present a new linear programming relaxation for the problem of joint inference, when one has many submodels and wants to predict using all of them at once. In general, joint inference is NP-complete, but algorithms based on dual decomposition have proven to be efficiently applicable in the case when the joint model can be expressed as many separate models plus linear equality constraints. This thesis proposes an extension to dual decomposition which also allows the presence of factors scoring parts that belong to different submodels, improving the expressivity of dual decomposition at no extra computational cost.
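For reference, the Viterbi algorithm that the thesis accelerates is, in its textbook form, a dynamic program over a linear chain. A minimal sketch with a toy two-state model; all probabilities are invented for illustration, and the thesis targets general structured linear models rather than this HMM-style special case:

```python
def viterbi(obs, states, start, trans, emit):
    """Textbook Viterbi decoding: the highest-probability state
    sequence for a sequence of observations."""
    scores = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        col, ptrs = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[-1][p] * trans[p][s])
            col[s] = scores[-1][prev] * trans[prev][s] * emit[s][o]
            ptrs[s] = prev
        scores.append(col)
        back.append(ptrs)
    best = max(states, key=lambda s: scores[-1][s])
    path = [best]
    for ptrs in reversed(back):  # follow backpointers to recover the path
        path.append(ptrs[path[-1]])
    return list(reversed(path))

# Toy two-state tagging model (all probabilities invented).
states = ["N", "V"]
start = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}
path = viterbi(["dogs", "bark"], states, start, trans, emit)
```

The inner `max` over previous states is the step the column-generation idea attacks: instead of scoring every transition, columns (transitions) are generated lazily, pricing in only those that could change the optimum.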
Doctorate in Computer Science
23

Yeates, Stuart Andrew. « Text Augmentation : Inserting markup into natural language text with PPM Models ». The University of Waikato, 2006. http://hdl.handle.net/10289/2600.

Abstract:
This thesis describes a new optimisation and new heuristics for automatically marking up XML documents, together with CEM, a Java implementation using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n, and supporting a variety of escape methods. Four corpora are discussed, including a bibliography corpus of 14,682 bibliographies laid out in seven standard styles using the BibTeX system and marked up in XML with every field from the original BibTeX. The other corpora are the ROCLING Chinese text segmentation corpus, the Computists' Communique corpus, and the Reuters corpus. A detailed examination is presented of methods for evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning, and information theory. A new taxonomy of markup complexities is established, and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and of the optimisation is examined using the four corpora.
24

Tabassum, Binte Jafar Jeniya. « Information Extraction From User Generated Noisy Texts ». The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1606315356821532.

25

Roman, Norton Trevisan. « Emoção e a sumarização automatica de dialogos ». [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276233.

Abstract:
Advisors: Ariadne Maria Brito Rizzoni Carvalho, Paul Piwek
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
This thesis presents a number of contributions to the field of automatic dialogue summarisation. It provides evidence for the hypothesis that whenever a dialogue features very impolite behaviour by one or more of its interlocutors, this behaviour will tend to be described in the dialogue's summary. Moreover, further experimental results showed that this behaviour is reported with a strong bias determined by the point of view of the summariser. This result was not affected by constraints on the summary length. The experiments provided useful information on when and how assessments of emotion and behaviour should be added to a dialogue summary. To conduct the experiments, a categorical multi-dimensional annotation scheme was developed which may also be helpful to other researchers who need to annotate data in a similar way. The results from the empirical studies were used to build an automatic dialogue summarisation system, in order to test their computational applicability. The system's output consists of summaries in which technical and emotional information, such as assessments of the dialogue participants' behaviour, are combined in a way that reflects the bias of the summariser, with the point of view defined by the user.
Doctorate in Computer Science
26

Mysore, Gopinath Abhijith Athreya. « Automatic Detection of Section Title and Prose Text in HTML Documents Using Unsupervised and Supervised Learning ». University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535371714338677.

27

Shankar, Arunprasath. « ONTOLOGY-DRIVEN SEMI-SUPERVISED MODEL FOR CONCEPTUAL ANALYSIS OF DESIGN SPECIFICATIONS ». Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1401706747.

28

Bulgarov, Florin Adrian. « Toward Supporting Fine-Grained, Structured, Meaningful and Engaging Feedback in Educational Applications ». Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1404562/.

Abstract:
Recent advancements in machine learning have begun to make their mark on educational technology. Technology is evolving fast and, as people adopt it, schools and universities must keep up (nearly 70% of primary and secondary schools in the UK now use tablets for various purposes). As these numbers are likely to follow the same increasing trend, it is imperative for schools to adapt and benefit from the advantages offered by technology: real-time processing of data, availability of different resources through connectivity, efficiency, and many others. To this end, this work contributes to the growth of educational technology by developing several algorithms and models that are meant to ease several tasks for instructors, engage students in deep discussions and, ultimately, increase their learning gains. First, a novel, fine-grained knowledge representation is introduced that splits phrases into their constituent propositions, which are both meaningful and minimal, together with an automated algorithm for extracting them. Compared with other fine-grained representations, the extraction model requires no human labor after it is trained, while the results show considerable improvement over two meaningful baselines. Second, a proposition alignment model is created that relies on even finer-grained units of text while also outperforming several alternative systems. Third, a detailed machine learning based analysis of students' unrestricted natural language responses to questions asked in classrooms is made by leveraging the proposition extraction algorithm to make computational predictions of textual assessment. Two computational approaches are introduced that compare manually engineered machine learning features with word embeddings fed into a two-hidden-layer neural network.
Both methods achieve notable improvements over two alternative approaches: a recent short-answer grading system and DiSAN, a recent, pre-trained, lightweight neural network that obtained state-of-the-art performance on multiple NLP tasks and corpora. Fourth, a clustering algorithm is introduced to bring structure to the feedback offered to instructors. The algorithm organizes student responses based on three important aspects: propositional importance classifications, computational assessment of student understanding, and similarity metrics between student responses. Moreover, a dynamic cluster selection algorithm is designed to decide which groups of responses resulting from the cluster hierarchy are best. The algorithm achieves 86.3% of the performance achieved by humans on the same task and dataset. Fifth, a deep neural network is built to predict, for each cluster, an engagement response that is meant to help generate insightful classroom discussion. This is the first computational model to predict how engaging student responses will be in classroom discussion. Its performance reaches 86.8% of the performance obtained by humans on the same task and dataset. I also demonstrate the effectiveness of a dynamic algorithm that can self-improve with minimal help from teachers, reducing its relative error by up to 32%.
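The grouping of student responses by pairwise similarity described above can be caricatured with a greedy single-link scheme: any two responses whose similarity meets a threshold end up in the same group. This is a drastic simplification of the hierarchy-plus-selection approach in the thesis, and the similarity scores below are invented:

```python
def group_responses(n, sims, threshold=0.5):
    """Greedy single-link grouping via union-find: responses whose
    pairwise similarity meets the threshold share a group."""
    parent = list(range(n))

    def find(x):  # find the group representative, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Four responses; responses 0-1 and 2-3 are highly similar (toy scores).
clusters = group_responses(4, {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8})
```

The dynamic cluster-selection step in the thesis then chooses which level of the hierarchy to surface; here the threshold plays that role in a fixed, hand-picked way.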
29

Barakat, Arian. « What makes an (audio)book popular ? » Thesis, Linköpings universitet, Statistik och maskininlärning, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152871.

Résumé :
Audiobook reading has traditionally been used for educational purposes but has in recent times grown into a popular alternative to more traditional means of consuming literature. In order to differentiate themselves from other players in the market, but also to provide their users with enjoyable literature, several audiobook companies have lately directed their efforts toward producing their own content. Creating highly rated content is, however, no easy task, and one recurring challenge is how to make a bestselling story. In an attempt to identify latent features shared by successful audiobooks and to evaluate proposed methods for literary quantification, this thesis employs an array of frameworks from the fields of Statistics, Machine Learning and Natural Language Processing on data and literature provided by Storytel - Sweden’s largest audiobook company. We analyze and identify important features from a collection of 3077 Swedish books concerning their promotional and literary success. By considering features from the aspects Metadata, Theme, Plot, Style and Readability, we found that popular books are typically published as a book series, cover 1-3 central topics, write about, e.g., daughter-mother relationships and human closeness, but also hold, on average, a higher proportion of verbs and a lower degree of short words. Despite successfully identifying these and other factors, we recognized that none of our models predicted “bestseller” adequately and that future work may wish to study additional factors, employ other models or even use different metrics to define and measure popularity. From our evaluation of the literary quantification methods, namely topic modeling and narrative approximation, we found that these methods are, in general, suitable for Swedish texts but that they require further improvement and experimentation before they can be successfully deployed for Swedish literature.
For topic modeling, we found that using nouns alone produced more interpretable topics and that including character names tended to pollute them. We also identified and discussed the possible problem of word inflections when modeling topics for morphologically complex languages, noting that additional preprocessing treatments such as word lemmatization or post-training text normalization may improve the quality and interpretability of topics. For the narrative approximation, we discovered that the method currently suffers from three shortcomings: (1) unreliable sentence segmentation, (2) unsatisfactory dictionary-based sentiment analysis and (3) the possible loss of sentiment information induced by translations. Despite examining only a handful of literary works, we further found that books originally written in Swedish had narratives that were more cross-language consistent than books written in English and then translated to Swedish.
30

Tadisetty, Srikanth. « Prediction of Psychosis Using Big Web Data in the United States ». Kent State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=kent1532962079970169.

31

Rosmorduc, Serge. « Analyse morpho-syntaxique de textes non ponctués : application aux textes hieroglyphiques ». Cachan, Ecole normale supérieure, 1996. http://www.theses.fr/1996DENS0028.

Résumé :
We propose a software framework for the linguistic study of text corpora. In particular, we develop a robust syntactic tagger that makes it possible to query a corpus for grammatical information. Because the emphasis is on processing corrupted and/or unpunctuated texts, the analysis relies on two complementary mechanisms: the first is a context-free parser that structures the text according to a very loose grammar; the second is an automaton-based disambiguation system whose purpose is to guide the analysis by providing a representation of usage.
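The disambiguation stage of such a tagger is often realized as a best-path search over candidate tags. As a sketch of that general technique (a standard bigram Viterbi decoder, not Rosmorduc's actual automaton system), with an invented toy lexicon and transition scores:

```python
def viterbi(words, lexicon, trans, start="BOS"):
    """Pick the most probable tag sequence for an unpunctuated word
    sequence, given per-word candidate tags with emission scores
    (`lexicon`) and bigram tag-transition scores (`trans`)."""
    paths = {start: (1.0, [])}  # tag -> (score, best sequence ending in tag)
    for w in words:
        new = {}
        for tag, emit in lexicon[w].items():
            score, seq = max(
                (s * trans.get((prev, tag), 1e-6) * emit, seq)
                for prev, (s, seq) in paths.items())
            new[tag] = (score, seq + [tag])
        paths = new
    return max(paths.values())[1]
```

With `lexicon = {"time": {"N": 0.7, "V": 0.3}, "flies": {"N": 0.4, "V": 0.6}}` and suitable transition scores, the decoder resolves the noun/verb ambiguity of each word jointly rather than word by word.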
32

Soderland, Stephen Glenn. « Learning text analysis rules for domain-specific natural language processing ». 1997. https://scholarworks.umass.edu/dissertations/AAI9721493.

Résumé :
An enormous amount of knowledge is needed to infer the meaning of unrestricted natural language. The problem can be reduced to a manageable size by restricting attention to a specific domain, which is a corpus of texts together with a predefined set of concepts that are of interest to that domain. Two widely different domains are used to illustrate this domain-specific approach. One domain is a collection of Wall Street Journal articles in which the target concept is management succession events: identifying persons moving into corporate management positions or moving out. A second domain is a collection of hospital discharge summaries in which the target concepts are various classes of diagnosis or symptom. The goal of an information extraction system is to identify references to the concept of interest for a particular domain. A key knowledge source for this purpose is a set of text analysis rules based on the vocabulary, semantic classes, and writing style peculiar to the domain. This thesis presents CRYSTAL, an implemented system that automatically induces domain-specific text analysis rules from training examples. CRYSTAL learns rules that approach the performance of hand-coded rules, are robust in the face of noise and inadequate features, and require only a modest amount of training data. CRYSTAL belongs to a class of machine learning algorithms called covering algorithms, and presents a novel control strategy with time and space complexities that are independent of the number of features. CRYSTAL navigates efficiently through an extremely large space of possible rules. CRYSTAL also demonstrates that expressive rule representation is essential for high performance, robust text analysis rules. While simple rules are adequate to capture the most salient regularities in the training data, high performance can only be achieved when rules are expressive enough to reflect the subtlety and variability of unrestricted natural language.
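CRYSTAL's control strategy belongs to the covering-algorithm family: seed a maximally specific rule from an uncovered positive example, generalize it while accuracy stays acceptable, then remove the examples it covers. A simplified sketch of that family (not CRYSTAL itself; the feature dictionaries and `min_acc` value are invented):

```python
def covers(rule, feats):
    """A rule covers an example if every constraint matches."""
    return all(feats.get(k) == v for k, v in rule.items())

def accuracy(rule, examples):
    """Fraction of covered examples that are positive (label 1)."""
    covered = [lbl for feats, lbl in examples if covers(rule, feats)]
    return sum(covered) / len(covered) if covered else 0.0

def learn_rules(examples, min_acc=0.9):
    """Covering loop: seed a maximally specific rule from an uncovered
    positive example, greedily drop constraints while accuracy holds,
    then remove everything the rule covers."""
    rules, remaining = [], list(examples)
    while any(lbl for _, lbl in remaining):
        feats, _ = next(e for e in remaining if e[1])
        rule = dict(feats)
        improved = True
        while improved:
            improved = False
            for k in list(rule):
                cand = {kk: vv for kk, vv in rule.items() if kk != k}
                if cand and accuracy(cand, examples) >= min_acc:
                    rule, improved = cand, True
                    break
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e[0])]
    return rules
```

CRYSTAL's key contribution over this naive loop is a control strategy whose time and space complexity is independent of the number of features.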
33

« Advancing Biomedical Named Entity Recognition with Multivariate Feature Selection and Semantically Motivated Features ». Doctoral diss., 2013. http://hdl.handle.net/2286/R.I.18042.

Résumé :
abstract: Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. 
The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.
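False discovery rate analysis for feature filtering is commonly performed with the Benjamini-Hochberg step-up procedure; assuming that is the variant intended here, a minimal implementation over per-feature p-values is:

```python
def fdr_select(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up: keep the features whose sorted
    p-values satisfy p_(i) <= (i / m) * alpha, for the largest such i.
    Returns the selected feature indices in ascending order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank  # largest rank passing the step-up threshold
    return sorted(order[:k])
```

Applied to thousands of candidate NER features, a filter like this is one way to shrink a rich feature set by orders of magnitude while controlling the expected proportion of spuriously retained features.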
Dissertation/Thesis
Ph.D. Computer Science 2013
34

« Automatic text categorization for information filtering ». 1998. http://library.cuhk.edu.hk/record=b5889734.

Résumé :
Ho Chao Yang.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 157-163).
Abstract also in Chinese.
Abstract --- p.i
Acknowledgment --- p.iii
List of Figures --- p.viii
List of Tables --- p.xiv
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Automatic Document Categorization --- p.1
Chapter 1.2 --- Information Filtering --- p.3
Chapter 1.3 --- Contributions --- p.6
Chapter 1.4 --- Organization of the Thesis --- p.7
Chapter 2 --- Related Work --- p.9
Chapter 2.1 --- Existing Automatic Document Categorization Approaches --- p.9
Chapter 2.1.1 --- Rule-Based Approach --- p.10
Chapter 2.1.2 --- Similarity-Based Approach --- p.13
Chapter 2.2 --- Existing Information Filtering Approaches --- p.19
Chapter 2.2.1 --- Information Filtering Systems --- p.19
Chapter 2.2.2 --- Filtering in TREC --- p.21
Chapter 3 --- Document Pre-Processing --- p.23
Chapter 3.1 --- Document Representation --- p.23
Chapter 3.2 --- Classification Scheme Learning Strategy --- p.26
Chapter 4 --- A New Approach - IBRI --- p.31
Chapter 4.1 --- Overview of Our New IBRI Approach --- p.31
Chapter 4.2 --- The IBRI Representation and Definitions --- p.34
Chapter 4.3 --- The IBRI Learning Algorithm --- p.37
Chapter 5 --- IBRI Experiments --- p.43
Chapter 5.1 --- Experimental Setup --- p.43
Chapter 5.2 --- Evaluation Metric --- p.45
Chapter 5.3 --- Results --- p.46
Chapter 6 --- A New Approach - GIS --- p.50
Chapter 6.1 --- Motivation of GIS --- p.50
Chapter 6.2 --- Similarity-Based Learning --- p.51
Chapter 6.3 --- The Generalized Instance Set Algorithm (GIS) --- p.58
Chapter 6.4 --- Using GIS Classifiers for Classification --- p.63
Chapter 6.5 --- Time Complexity --- p.64
Chapter 7 --- GIS Experiments --- p.68
Chapter 7.1 --- Experimental Setup --- p.68
Chapter 7.2 --- Results --- p.73
Chapter 8 --- A New Information Filtering Approach Based on GIS --- p.87
Chapter 8.1 --- Information Filtering Systems --- p.87
Chapter 8.2 --- GIS-Based Information Filtering --- p.90
Chapter 9 --- Experiments on GIS-based Information Filtering --- p.95
Chapter 9.1 --- Experimental Setup --- p.95
Chapter 9.2 --- Results --- p.100
Chapter 10 --- Conclusions and Future Work --- p.108
Chapter 10.1 --- Conclusions --- p.108
Chapter 10.2 --- Future Work --- p.110
Chapter A --- Sample Documents in the corpora --- p.111
Chapter B --- Details of Experimental Results of GIS --- p.120
Chapter C --- Computational Time of Reuters-21578 Experiments --- p.141
35

Farra, Noura. « Cross-Lingual and Low-Resource Sentiment Analysis ». Thesis, 2019. https://doi.org/10.7916/d8-x3b7-1r92.

Résumé :
Identifying sentiment in a low-resource language is essential for understanding opinions internationally and for responding to the urgent needs of locals affected by disaster incidents in different world regions. While tools and resources for recognizing sentiment in high-resource languages are plentiful, determining the most effective methods for achieving this task in a low-resource language which lacks annotated data is still an open research question. Most existing approaches for cross-lingual sentiment analysis to date have relied on high-resource machine translation systems, large amounts of parallel data, or resources only available for Indo-European languages. This work presents methods, resources, and strategies for identifying sentiment cross-lingually in a low-resource language. We introduce a cross-lingual sentiment model which can be trained on a high-resource language and applied directly to a low-resource language. The model offers the feature of lexicalizing the training data using a bilingual dictionary, but can perform well without any translation into the target language. Through an extensive experimental analysis, evaluated on 17 target languages, we show that the model performs well with bilingual word vectors pre-trained on an appropriate translation corpus. We compare in-genre and in-domain parallel corpora, out-of-domain parallel corpora, in-domain comparable corpora, and monolingual corpora, and show that a relatively small, in-domain parallel corpus works best as a transfer medium if it is available. We describe the conditions under which other resources and embedding generation methods are successful, and these include our strategies for leveraging in-domain comparable corpora for cross-lingual sentiment analysis. 
To enhance the ability of the cross-lingual model to identify sentiment in the target language, we present new feature representations for sentiment analysis that are incorporated in the cross-lingual model: bilingual sentiment embeddings that are used to create bilingual sentiment scores, and a method for updating the sentiment embeddings during training by lexicalization of the target language. This feature configuration works best for the largest number of target languages in both untargeted and targeted cross-lingual sentiment experiments. The cross-lingual model is studied further by evaluating the role of the source language, which has traditionally been assumed to be English. We build cross-lingual models using 15 source languages, including two non-European and non-Indo-European source languages: Arabic and Chinese. We show that language families play an important role in the performance of the model, as does the morphological complexity of the source language. In the last part of the work, we focus on sentiment analysis towards targets. We study Arabic as a representative morphologically complex language and develop models and morphological representation features for identifying entity targets and sentiment expressed towards them in Arabic open-domain text. Finally, we adapt our cross-lingual sentiment models for the detection of sentiment towards targets. Through cross-lingual experiments on Arabic and English, we demonstrate that our findings regarding resources, features, and language also hold true for the transfer of targeted sentiment.
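A crude, dictionary-based stand-in for the lexicalization idea (the thesis itself uses bilingual word vectors and learned sentiment embeddings, not a lookup table) can be sketched as follows, with an invented two-word lexicon and dictionary:

```python
def project_lexicon(src_lexicon, bilingual_dict):
    """Project a source-language sentiment lexicon into the target
    language through a bilingual dictionary, averaging scores when
    several source words map to the same target word."""
    tgt = {}
    for src_word, score in src_lexicon.items():
        for tgt_word in bilingual_dict.get(src_word, []):
            tgt.setdefault(tgt_word, []).append(score)
    return {w: sum(s) / len(s) for w, s in tgt.items()}

def classify(sentence, lexicon):
    """Label a target-language sentence by summing projected scores."""
    total = sum(lexicon.get(w, 0.0) for w in sentence.lower().split())
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"
```

This captures only the transfer direction (source-language supervision, target-language inference); the model in the thesis instead shares a bilingual embedding space so that unseen target words still receive useful representations.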
36

Hsu, Hsin-jen Tomblin J. Bruce. « A neurophysiological study on probabilistic grammatical learning and sentence processing ». 2009. http://ir.uiowa.edu/etd/243/.

37

Lin, Zhouhan. « Deep neural networks for natural language processing and its acceleration ». Thèse, 2019. http://hdl.handle.net/1866/23438.

Résumé :
This thesis by article consists of four articles which contribute to the field of deep learning, specifically the acceleration of training through low-precision networks and the application of deep neural networks to natural language processing. In the first article, we investigate a neural network training scheme that eliminates most of the floating-point multiplications. This approach consists of binarizing or ternarizing the weights in the forward propagation and quantizing the hidden states in the backward propagation, which converts multiplications to sign changes and binary shifts. Experimental results on small to medium-sized datasets show that this approach results in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks. In the second article, we proposed a structured self-attentive sentence embedding that extracts interpretable sentence representations in matrix form. We demonstrate improvements on 3 different tasks: author profiling, sentiment classification and textual entailment. Experimental results show that our model yields a significant performance gain compared to other sentence embedding methods on all 3 tasks. In the third article, we propose a hierarchical model with a dynamic computation graph for sequential data that learns to construct a tree while reading the sequence. The model learns to create adaptive skip-connections that ease the learning of long-term dependencies by constructing recurrent cells in a recursive manner. The network can be trained either with supervision, given gold tree structures, or through reinforcement learning. We provide preliminary experiments on 3 different tasks: a novel Math Expression Evaluation (MEE) task, a well-known propositional logic task, and language modelling tasks. Experimental results show the potential of the proposed approach.
In the fourth article, we propose a novel constituency parsing method with neural networks. The model predicts the parse tree structure by predicting a real-valued scalar, named syntactic distance, for each split position in the input sentence. The order of the relative values of these syntactic distances then determines the parse tree structure by specifying the order in which the split points will be selected, recursively partitioning the input in a top-down fashion. Our proposed approach achieves competitive performance on the Penn Treebank dataset and state-of-the-art performance on the Chinese Treebank dataset.
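The top-down decoding induced by syntactic distances can be sketched directly: recursively split each span at its position of largest distance. This is a minimal reconstruction of the decoding step only (the thesis's contribution is the network that predicts the distances), with made-up distance values:

```python
def distances_to_tree(words, dists):
    """Top-down greedy decoding: recursively split the span at the
    position with the largest syntactic distance. `dists[i]` scores
    the split point between words[i] and words[i + 1]."""
    if len(words) == 1:
        return words[0]
    i = max(range(len(dists)), key=dists.__getitem__)
    return (distances_to_tree(words[:i + 1], dists[:i]),
            distances_to_tree(words[i + 1:], dists[i + 1:]))
```

Because only the relative order of the distances matters, the parser can be trained with a ranking-style objective rather than by predicting discrete tree actions.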
38

« Context-Aware Adaptive Hybrid Semantic Relatedness in Biomedical Science ». Doctoral diss., 2016. http://hdl.handle.net/2286/R.I.38725.

Résumé :
abstract: Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules of many Natural Language Processing (NLP) solutions. Methods for calculating the semantic relatedness of two concepts can be very useful in solutions to problems such as relationship extraction, ontology creation and question answering [1–6]. Several techniques exist for calculating the semantic relatedness of two concepts, drawing on different knowledge sources and corpora. So far, researchers have attempted to find the best hybrid method for each domain by combining semantic relatedness techniques and data sources manually. This work attempts to eliminate the need for manually combining semantic relatedness methods for each new context or resource by proposing an automated method that seeks the combination of semantic relatedness techniques and resources achieving the best semantic relatedness score in every context. This may help the research community find the best hybrid method for each context given the available algorithms and resources.
Dissertation/Thesis
Doctoral Dissertation Biomedical Informatics 2016
39

« Sentiment Analysis for Long-Term Stock Prediction ». Master's thesis, 2016. http://hdl.handle.net/2286/R.I.39401.

Résumé :
abstract: There has been extensive research into how news and Twitter feeds can affect the outcome of a given stock. However, a majority of this research has studied the short-term effects of sentiment on a given stock price. In this research, I studied the long-term effects on a given stock price using fundamental analysis techniques, collecting both sentiment data and fundamental data for Apple Inc., Microsoft Corp., and Peabody Energy Corp. Using a neural network algorithm, I found that sentiment does have an effect on the annual growth of these companies, but the fundamentals are more relevant when determining overall growth. The stocks which show more consistent growth place more weight on the previous year’s stock price, whereas companies with less consistency in their growth rely more on revenue growth and on sentiment toward the overall company and CEO. I discuss how I collected my research data and used a multi-layered perceptron to predict a threshold growth of a given stock; the threshold used for this particular research was 10%. I then show the prediction of this threshold using my perceptron and afterwards perform an ANOVA F-test on my choice of features. The results showed the fundamentals to be the better predictor of stock growth, but sentiment came in a close second in several cases, showing that sentiment does hold an effect on long-term growth.
Dissertation/Thesis
Masters Thesis Computer Science 2016
40

Skomorowski, Jason. « Topical Opinion Retrieval ». Thesis, 2006. http://hdl.handle.net/10012/2653.

Résumé :
With a growing amount of subjective content distributed across the Web, there is a need for a domain-independent information retrieval system that supports ad hoc retrieval of documents expressing opinions on the specific topic of a user’s query. While the research area of opinion detection and sentiment analysis has received much attention in recent years, little research has been done on identifying subjective content targeted at a specific topic, i.e. expressing topical opinion. This thesis presents a novel method for ad hoc retrieval of documents that contain subjective content on the topic of the query. Documents are ranked by the likelihood that each document expresses an opinion on a query term, approximated as the likelihood that any occurrence of the query term is modified by a subjective adjective. A domain-independent, user-based evaluation of the proposed methods was conducted and shows statistically significant gains over Google ranking as the baseline.
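The ranking heuristic, approximating opinion as the chance that an occurrence of the query term is modified by a subjective adjective, can be sketched as follows; the toy lexicon and the simple preceding-word adjacency test stand in for the thesis's actual adjective-modification detection:

```python
# Toy subjective-adjective lexicon; a real system would use a curated one.
SUBJECTIVE = {"great", "terrible", "amazing", "awful", "excellent"}

def opinion_score(doc, query):
    """Approximate P(an occurrence of `query` is modified by a subjective
    adjective) as the fraction of occurrences directly preceded by one."""
    tokens = doc.lower().split()
    hits = [i for i, t in enumerate(tokens) if t == query]
    if not hits:
        return 0.0
    return sum(1 for i in hits
               if i > 0 and tokens[i - 1] in SUBJECTIVE) / len(hits)
```

Documents would then be ranked by this score (combined with topical relevance), so that a page saying "great camera" outranks one that merely mentions the camera.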
41

« Analysis and Decision-Making with Social Media ». Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.54830.

Résumé :
abstract: The rapid advancements of technology have greatly extended the ubiquitous nature of smartphones, which act as a gateway to numerous social media applications. This brings immense convenience to the users of these applications who wish to stay connected to other individuals by sharing their statuses, posting their opinions, experiences, suggestions, etc. on online social networks (OSNs). Exploring and analyzing this data has great potential to enable deep and fine-grained insights into the behavior, emotions, and language of individuals in a society. This dissertation focuses on utilizing these online social footprints to research two main threads: 1) Analysis: studying the behavior of individuals online (content analysis) and 2) Synthesis: building models that influence the behavior of individuals offline (incomplete action models for decision-making). A large percentage of posts shared online are in an unrestricted natural language format meant for human consumption. One of the demanding problems in this context is to develop approaches that automatically extract important insights from this incessant, massive data pool. Efforts in this direction emphasize mining or extracting the wealth of latent information in the data from multiple OSNs independently. The first thread of this dissertation focuses on analytics to investigate the differentiated content-sharing behavior of individuals. The second thread attempts to build decision-making systems using social media data. The results of the dissertation emphasize the importance of considering multiple data types when interpreting the content shared on OSNs. They highlight the unique ways in which the data and the extracted patterns from text-based and visual-based platforms complement and contrast each other in terms of content.
The proposed research demonstrated that, in many ways, the results obtained by focusing on either only text or only visual elements of content shared online could lead to biased insights. On the other hand, it also shows the power of a sequential set of patterns that have some sort of precedence relationships and collaboration between humans and automated planners.
Dissertation/Thesis
Doctoral Dissertation Computer Science 2019
42

Hirtle, David Z. « 'Healthy' Coreference : Applying Coreference Resolution to the Health Education Domain ». Thesis, 2008. http://hdl.handle.net/10012/3891.

Résumé :
This thesis investigates coreference and its resolution within the domain of health education. Coreference is the relationship between two linguistic expressions that refer to the same real-world entity, and resolution involves identifying this relationship among sets of referring expressions. The coreference resolution task is considered among the most difficult of problems in Artificial Intelligence; in some cases, resolution is impossible even for humans. For example, "she" in the sentence "Lynn called Jennifer while she was on vacation" is genuinely ambiguous: the vacationer could be either Lynn or Jennifer.

There are three primary motivations for this thesis. The first is that health education has never before been studied in this context. So far, the vast majority of coreference research has focused on news. Secondly, achieving domain-independent resolution is unlikely without understanding the extent to which coreference varies across different genres. Finally, coreference pervades language and is an essential part of coherent discourse. Its effective use is a key component of easy-to-understand health education materials, where readability is paramount.

No suitable corpus of health education materials existed, so our first step was to create one. The comprehensive analysis of this corpus, which required manual annotation of coreference, confirmed our hypothesis that the coreference used in health education differs substantially from that in previously studied domains. This analysis was then used to shape the design of a knowledge-lean algorithm for resolving coreference. This algorithm performed surprisingly well on this corpus, e.g., successfully resolving over 85% of all pronouns when evaluated on unseen data.

Despite the importance of coreferentially annotated corpora, only a handful are known to exist, likely because of the difficulty and cost of reliably annotating coreference. The paucity of genres represented in these existing annotated corpora creates an implicit bias in domain-independent coreference resolution. In an effort to address these issues, we plan to make our health education corpus available to the wider research community, hopefully encouraging a broader focus in the future.
43

« Detecting Political Framing Shifts and the Adversarial Phrases within\\ Rival Factions and Ranking Temporal Snapshot Contents in Social Media ». Doctoral diss., 2018. http://hdl.handle.net/2286/R.I.49154.

Résumé :
abstract: Social Computing is an area of computer science concerned with the dynamics of communities and cultures created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, messages, and dynamics are needed. In this dissertation, a framework comprising community- and content-based computational methods is presented to provide insights into multilingual and noisy political social media content. First, a model is developed to predict the emergence of viral hashtag breakouts, using network features. Next, another model is developed to detect and compare individual and organizational accounts, using a set of domain- and language-independent features. The third model exposes contentious issues driving reactionary dynamics between opposing camps. The fourth model develops community detection and visualization methods to reveal underlying dynamics and the key messages that drive them. The final model presents a use-case methodology for detecting and monitoring foreign influence, wherein a state actor and news media under its control attempt to shift public opinion by framing information to support multiple adversarial narratives that facilitate their goals. In each case, a discussion of the novel aspects and contributions of the models is presented, along with quantitative and qualitative evaluations. An analysis of multiple conflict situations is conducted, covering areas in the UK, Bangladesh, Libya and Ukraine where adversarial framing led to polarization, declines in social cohesion, social unrest, and even civil wars (e.g., Libya and Ukraine).
Dissertation/Thesis
Doctoral Dissertation Computer Science 2018
APA, Harvard, Vancouver, ISO and other styles
44

Ye, Qiaofei (6636317). « A SENTIMENT BASED AUTOMATIC QUESTION-ANSWERING FRAMEWORK ». Thesis, 2019.

Find full text
Abstract:
With the rapid growth and maturation of the Question-Answering (QA) domain, non-factoid QA tasks are in high demand. However, existing QA systems are either fact-based or highly keyword-dependent and hard-coded. Moreover, if QA is to become more personable, the sentiment of the question and answer should be taken into account. Yet there is little research on non-factoid QA systems based on sentiment analysis that would enable a system to retrieve answers in a more emotionally intelligent way. This study investigates to what extent prediction of the best answer can be improved by adding an extended representation of sentiment information to non-factoid Question-Answering.
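The abstract does not specify its sentiment representation. As a hedged illustration of how sentiment might supplement answer ranking, a toy lexicon-based sketch (the lexicon, weights, and scoring are all invented, not the thesis's method):

```python
POS = {"great", "helpful", "love", "good", "glad"}
NEG = {"bad", "terrible", "hate", "awful", "sorry"}

def sentiment(text):
    """Crude lexicon polarity in [-1, 1]: (pos - neg) over matched tokens."""
    toks = text.lower().split()
    pos = sum(t in POS for t in toks)
    neg = sum(t in NEG for t in toks)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def rank_answers(question, answers):
    """Score answers by word overlap, nudged toward matching sentiment."""
    q_words = set(question.lower().split())
    q_sent = sentiment(question)
    def score(ans):
        overlap = len(q_words & set(ans.lower().split()))
        sent_match = 1.0 - abs(q_sent - sentiment(ans)) / 2.0
        return overlap + 0.5 * sent_match
    return sorted(answers, key=score, reverse=True)

q = "I hate that my phone battery is bad, any advice?"
answers = [
    "Your phone is great, no problem.",
    "Sorry the battery is bad, try lowering screen brightness on your phone.",
]
best = rank_answers(q, answers)[0]
```

Here the empathetic answer wins both on overlap and on matching the question's negative polarity; a real system would learn both components from data.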
APA, Harvard, Vancouver, ISO and other styles
45

Newsom, Eric Tyner. « An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences ». Thesis, 2013. http://hdl.handle.net/1805/3666.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
The amount of information produced as electronic free text in healthcare is increasing to levels that humans cannot process for the advancement of their professional practice. Information extraction (IE) is a sub-field of natural language processing whose goal is data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create the logical expressions necessary for processing the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none have addressed the issue of semantic similarity (or synonymy) to achieve data reduction. A successful methodology for data reduction depends on a framework that can represent currently popular phrasal methods of IE but also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument statement (PAS) as that framework. A convenience sample from a prior study, with ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing, is the text from which PAS structures are formed.
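The use of predicate-argument structure for sentence-level semantic similarity can be illustrated with a toy example: represent each sentence as a set of (predicate, role, argument) triples and compare the sets by overlap. This is a drastic simplification of the study's methodology, with invented triples:

```python
def pas_similarity(pas_a, pas_b):
    """Jaccard overlap between two sets of (predicate, role, argument) triples."""
    a, b = set(pas_a), set(pas_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Two hypothetical radiology sentences whose predicates were normalized to "show":
# "The radiograph shows an infiltrate." vs "The image shows an infiltrate."
s1 = {("show", "ARG0", "radiograph"), ("show", "ARG1", "infiltrate")}
s2 = {("show", "ARG0", "image"), ("show", "ARG1", "infiltrate")}
sim = pas_similarity(s1, s2)
```

The shared (show, ARG1, infiltrate) triple yields partial similarity; mapping "radiograph" and "image" to a common synset would raise it further, which is where a knowledge source enters the pipeline.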
APA, Harvard, Vancouver, ISO and other styles
46

Pandey, Ritika. « Text Mining for Social Harm and Criminal Justice Applications ». Thesis, 2020. http://hdl.handle.net/1805/23348.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Increasing rates of social harm events and the plethora of text data demand the use of text mining techniques, not only to better understand their causes but also to develop optimal prevention strategies. In this work, we study three social harm issues: crime topic models, transitions into drug addiction, and homicide investigation chronologies. Topic modeling for the categorization and analysis of crime report text allows for more nuanced categories of crime than official UCR categorizations, with important implications for hotspot policing. We investigate the extent to which topic models that improve coherence lead to higher levels of crime concentration. We further explore transitions into drug addiction using Reddit data, proposing a prediction model to classify users' transitions from a casual drug discussion forum to a recovery drug discussion forum and the likelihood of such transitions. Through this study we offer insights into modern drug culture and provide tools with potential applications in combating the opioid crisis. Lastly, we present a knowledge-graph-based framework for homicide investigation chronologies that may aid investigators in analyzing homicide case data and allow for post hoc analysis of the key features that determine whether a homicide is ultimately solved. For this purpose we perform named entity recognition to identify witnesses, detectives, and suspects in a chronology, use keyword expansion to identify various evidence types, and finally link these entities and evidence to construct a homicide investigation knowledge graph. We compare performance over several choices of methodology for these sub-tasks and analyze the association between network statistics of the knowledge graph and homicide solvability.
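As a rough illustration of the final step described above, linking extracted entities and evidence into a graph and computing network statistics, a stdlib-only sketch with invented entities and relations (not the thesis's actual extraction output):

```python
from collections import defaultdict

def build_graph(triples):
    """Undirected adjacency map from (entity, relation, evidence) triples."""
    graph = defaultdict(set)
    for ent, rel, ev in triples:
        graph[ent].add((rel, ev))
        graph[ev].add((rel, ent))
    return graph

def degree_stats(graph):
    """Node degrees and the highest-degree node (a candidate 'hub' of the case)."""
    degrees = {node: len(nbrs) for node, nbrs in graph.items()}
    return degrees, max(degrees, key=degrees.get)

# Hypothetical triples a NER + keyword-expansion pass might yield from a chronology.
triples = [
    ("witness_1", "mentions", "shell_casing"),
    ("detective_A", "collected", "shell_casing"),
    ("detective_A", "interviewed", "witness_1"),
    ("suspect_X", "linked_to", "shell_casing"),
]
graph = build_graph(triples)
degrees, hub = degree_stats(graph)
```

Statistics such as degree distribution over a case's graph are the kind of feature one could then correlate with solvability.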
APA, Harvard, Vancouver, ISO and other styles
47

Pandit, Yogesh. « Context specific text mining for annotating protein interactions with experimental evidence ». Thesis, 2014. http://hdl.handle.net/1805/3809.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Proteins are the building blocks of a biological system. They interact with other proteins to produce unique biological phenomena, and protein-protein interactions play a valuable role in understanding the molecular mechanisms occurring in any biological system. Protein interaction databases are a rich source of protein interaction information; they gather large amounts of information from published literature to enrich their data, with expert curators performing most of this work manually. The amount of accessible, publicly available literature is growing very rapidly, and manual annotation is a time-consuming process that cannot keep up on its own. Tools are needed to process these huge amounts of data and extract the valuable gist that can help curators proceed faster. When extracting protein-protein interaction evidence from literature, a mere mention of a certain protein found by look-up approaches cannot validate the interaction; supporting protein interaction information with experimental evidence can. In this study, we apply machine-learning-based classification techniques to classify a given protein-interaction-related document into an interaction detection method, using biological attributes and experimental factors, different combinations of which define any particular interaction detection method. Then, using the predicted detection methods, proteins identified with named entity recognition techniques, and the parts-of-speech composition, we search for sentences with experimental evidence for a protein-protein interaction. We report an accuracy of 75.1% with an F-score of 47.6% on a dataset containing 2035 training documents and 300 test documents.
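Metrics like the reported accuracy and F-score follow mechanically from binary confusion counts. A minimal sketch with made-up counts (these are illustrative only, not the thesis's actual confusion matrix):

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Hypothetical counts over a 300-document test set.
acc, f1 = accuracy_and_f1(tp=40, fp=50, fn=38, tn=172)
```

The gap between the two figures in the abstract is typical of imbalanced data: accuracy is dominated by the majority class, while F1 exposes weak precision/recall on the positive class.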
APA, Harvard, Vancouver, ISO and other styles