Theses on the topic « Multivariate analysis. Natural language processing (Computer science) »
Browse the 47 best theses for your research on the topic « Multivariate analysis. Natural language processing (Computer science) ».
Cannon, Paul C. « Extending the information partition function : modeling interaction effects in highly multivariate, discrete data ». Diss., 2008. http://contentdm.lib.byu.edu/ETD/image/etd2263.pdf.
Shepherd, David. « Natural language program analysis combining natural language processing with program analysis to improve software maintenance tools ». 176 p., 2007. http://proquest.umi.com/pqdweb?did=1397920371&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.
Li, Wenhui. « Sentiment analysis : Quantitative evaluation of subjective opinions using natural language processing ». Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/28000.
Keller, Thomas Anderson. « Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing ». Thesis, University of California, San Diego, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599339.
Most machine learning algorithms require a fixed-length input to be able to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed-length representations. Although today there is a general consensus on how to generate fixed-length representations of individual words which preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed-length representations of natural language sequences, and analyze their effectiveness across a variety of high- and low-level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.
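To make the fixed-length requirement in the abstract above concrete, the following minimal sketch averages word vectors so that sentences of different lengths map to vectors of the same dimension. The toy three-dimensional embedding table and the `mean_pool` helper are hypothetical illustrations, not taken from the thesis, which studies learned encoders such as Skip-Thought rather than plain averaging.

```python
# Toy embedding table: hypothetical values chosen only for illustration.
EMBEDDINGS = {
    "the": [0.0, 1.0, 0.0],
    "cat": [1.0, 0.0, 0.0],
    "sat": [0.0, 0.0, 1.0],
    "down": [1.0, 1.0, 1.0],
}
DIM = 3


def mean_pool(tokens):
    """Average the vectors of known tokens into one DIM-length vector."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        # No known tokens: fall back to the zero vector.
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


short_vec = mean_pool(["the", "cat"])
long_vec = mean_pool(["the", "cat", "sat", "down"])
print(len(short_vec), len(long_vec))  # both sentences yield DIM numbers
```

Averaging discards word order, which is one reason order-aware sequence encoders are studied at all.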
Ramachandran, Venkateshwaran. « A temporal analysis of natural language narrative text ». Thesis, 1990. http://scholar.lib.vt.edu/theses/available/etd-03122009-040648/.
Crocker, Matthew Walter. « A principle-based system for natural language analysis and translation ». Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/27863.
Faculty of Science
Department of Computer Science
Graduate
Holmes, Wesley J. « Topological Analysis of Averaged Sentence Embeddings ». Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1609351352688467.
Lee, Wing Kuen. « Interpreting tables in text using probabilistic two-dimensional context-free grammars ». 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20LEEW.
Zhan, Tianjie. « Semantic analysis for extracting fine-grained opinion aspects ». HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1213.
Currin, Aubrey Jason. « Text data analysis for a smart city project in a developing nation ». Thesis, University of Fort Hare, 2015. http://hdl.handle.net/10353/2227.
Li, Jie. « Intention-driven textual semantic analysis ». School of Computer Science and Software Engineering, 2008. http://ro.uow.edu.au/theses/104.
Riehl, Sean K. « Property Recommendation System with Geospatial Data Analytics and Natural Language Processing for Urban Land Use ». Cleveland State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=csu1590513674513905.
Smith, Andrew Edward. « Development of a practical system for text content analysis and mining ». [St. Lucia, Qld.], 2002. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe17847.pdf.
Wong, Jimmy Pui Fung. « The use of prosodic features in Chinese speech recognition and spoken language processing ». 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20WONG.
Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.
Pérez-Rosas, Verónica. « Exploration of Visual, Acoustic, and Physiological Modalities to Complement Linguistic Representations for Sentiment Analysis ». Thesis, University of North Texas, 2014. https://digital.library.unt.edu/ark:/67531/metadc699996/.
Sanagavarapu, Krishna Chaitanya. « Determining Whether and When People Participate in the Events They Tweet About ». Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984235/.
Paterson, Kimberly Laurel. « TSPOONS : Tracking Salience Profiles Of Online News Stories ». DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1222.
Alsehaimi, Afnan Abdulrahman A. « Sentiment Analysis for E-book Reviews on Amazon to Determine E-book Impact Rank ». University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1619109972210567.
Cotra, Aditya Kousik. « Trend Analysis on Artificial Intelligence Patents ». University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1617104823936441.
Greer, Jeremiah. « Unsupervised Interpretable Feature Extraction for Binary Executables using LIBCAISE ». University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1560866693877849.
Bihi, Ahmed. « Analysis of similarity and differences between articles using semantics ». Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-34843.
Passos, Alexandre Tachard 1986. « Combinatorial algorithms and linear programming for inference in natural language processing = Algoritmos combinatórios e de programação linear para inferência em processamento de linguagem natural ». [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275609.
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: In natural language processing, and in machine learning in general, probabilistic graphical models (and more generally structured linear models) are commonly used. Although these models are convenient, allowing the expression of complex relationships between many random variables one wants to predict given a document or sentence, most learning and prediction algorithms for general models are inefficient. Hence there has recently been interest in using linear programming relaxations for the inference tasks necessary when learning or applying these models. This thesis presents two contributions to the theory and practice of linear programming relaxations for inference in structured linear models. First, we present a new algorithm, based on column generation (a technique which is dual to the cutting planes method), to accelerate the Viterbi algorithm, the most popular exact inference technique for linear-chain graphical models. The method is also applicable to tree graphical models and hypergraph models. Then we present a new linear programming relaxation for the problem of joint inference, when one has many submodels and wants to predict using all of them at once. In general, joint inference is NP-complete, but algorithms based on dual decomposition have proven to be efficiently applicable when the joint model can be expressed as many separate models plus linear equality constraints. This thesis proposes an extension to dual decomposition which also allows the presence of factors that score parts belonging to different submodels, improving the expressivity of dual decomposition at no extra computational cost.
Doctorate
Computer Science
Doctor of Computer Science
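The Passos abstract above names the Viterbi algorithm as the most popular exact inference technique for linear-chain graphical models. As a reference point, here is a minimal sketch of the plain (unaccelerated) algorithm over additive scores. It does not implement the column-generation speed-up the thesis proposes, and the function name, argument layout, and toy scores are assumptions made for illustration.

```python
def viterbi(emissions, transitions):
    """Return (best_score, best_path) for a linear-chain model.

    emissions[t][s]   : score of state s at position t
    transitions[p][s] : score of moving from state p to state s
    Scores are additive (for example, log-potentials).
    """
    n_states = len(emissions[0])
    # best[s] = score of the best path ending in state s at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for s in range(n_states):
            # Pick the predecessor state that maximizes the score of reaching s.
            prev = max(range(n_states), key=lambda p: best[p] + transitions[p][s])
            new_best.append(best[prev] + transitions[prev][s] + emissions[t][s])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    score = best[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return score, path


# Hypothetical two-state, three-position example.
emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
transitions = [[0.2, 0.0], [0.0, 0.2]]
best_score, best_path = viterbi(emissions, transitions)
print(best_score, best_path)
```

The run time is proportional to the sequence length times the square of the number of states, which is the cost the thesis aims to reduce.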
Yeates, Stuart Andrew. « Text Augmentation : Inserting markup into natural language text with PPM Models ». The University of Waikato, 2006. http://hdl.handle.net/10289/2600.
Tabassum, Binte Jafar Jeniya. « Information Extraction From User Generated Noisy Texts ». The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1606315356821532.
Roman, Norton Trevisan. « Emoção e a sumarização automatica de dialogos ». [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276233.
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: This thesis presents a number of contributions to the field of automatic dialogue summarisation. It provides evidence for the hypothesis that whenever a dialogue features very impolite behaviour by one or more of its interlocutors, this behaviour will tend to be described in the dialogue's summary. Moreover, further experimental results showed that this behaviour is reported with a strong bias determined by the point of view of the summariser. This result was not affected by constraints on the summary length. The experiments provided useful information on when and how assessments of emotion and behaviour should be added to a dialogue summary. To conduct the experiments, a categorical multi-dimensional annotation scheme was developed which may also be helpful to other researchers who need to annotate data in a similar way. The results from the empirical studies were used to build an automatic dialogue summarisation system, in order to test their computational applicability. The system's output consists of summaries in which technical and emotional information, such as assessments of the dialogue participants' behaviour, are combined in a way that reflects the bias of the summariser, with the point of view defined by the user.
Doctorate
Doctor of Computer Science
Mysore, Gopinath Abhijith Athreya. « Automatic Detection of Section Title and Prose Text in HTML Documents Using Unsupervised and Supervised Learning ». University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535371714338677.
Shankar, Arunprasath. « ONTOLOGY-DRIVEN SEMI-SUPERVISED MODEL FOR CONCEPTUAL ANALYSIS OF DESIGN SPECIFICATIONS ». Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1401706747.
Bulgarov, Florin Adrian. « Toward Supporting Fine-Grained, Structured, Meaningful and Engaging Feedback in Educational Applications ». Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1404562/.
Barakat, Arian. « What makes an (audio)book popular ? » Thesis, Linköpings universitet, Statistik och maskininlärning, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152871.
Tadisetty, Srikanth. « Prediction of Psychosis Using Big Web Data in the United States ». Kent State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=kent1532962079970169.
Rosmorduc, Serge. « Analyse morpho-syntaxique de textes non ponctués : application aux textes hieroglyphiques ». Cachan, Ecole normale supérieure, 1996. http://www.theses.fr/1996DENS0028.
Soderland, Stephen Glenn. « Learning text analysis rules for domain-specific natural language processing ». 1997. https://scholarworks.umass.edu/dissertations/AAI9721493.
« Advancing Biomedical Named Entity Recognition with Multivariate Feature Selection and Semantically Motivated Features ». Doctoral diss., 2013. http://hdl.handle.net/2286/R.I.18042.
Dissertation/Thesis
Ph.D. Computer Science 2013
« Automatic text categorization for information filtering ». 1998. http://library.cuhk.edu.hk/record=b5889734.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 157-163).
Abstract also in Chinese.
Abstract
Acknowledgment
List of Figures
List of Tables
Chapter 1 --- Introduction
1.1 --- Automatic Document Categorization
1.2 --- Information Filtering
1.3 --- Contributions
1.4 --- Organization of the Thesis
Chapter 2 --- Related Work
2.1 --- Existing Automatic Document Categorization Approaches
2.1.1 --- Rule-Based Approach
2.1.2 --- Similarity-Based Approach
2.2 --- Existing Information Filtering Approaches
2.2.1 --- Information Filtering Systems
2.2.2 --- Filtering in TREC
Chapter 3 --- Document Pre-Processing
3.1 --- Document Representation
3.2 --- Classification Scheme Learning Strategy
Chapter 4 --- A New Approach - IBRI
4.1 --- Overview of Our New IBRI Approach
4.2 --- The IBRI Representation and Definitions
4.3 --- The IBRI Learning Algorithm
Chapter 5 --- IBRI Experiments
5.1 --- Experimental Setup
5.2 --- Evaluation Metric
5.3 --- Results
Chapter 6 --- A New Approach - GIS
6.1 --- Motivation of GIS
6.2 --- Similarity-Based Learning
6.3 --- The Generalized Instance Set Algorithm (GIS)
6.4 --- Using GIS Classifiers for Classification
6.5 --- Time Complexity
Chapter 7 --- GIS Experiments
7.1 --- Experimental Setup
7.2 --- Results
Chapter 8 --- A New Information Filtering Approach Based on GIS
8.1 --- Information Filtering Systems
8.2 --- GIS-Based Information Filtering
Chapter 9 --- Experiments on GIS-based Information Filtering
9.1 --- Experimental Setup
9.2 --- Results
Chapter 10 --- Conclusions and Future Work
10.1 --- Conclusions
10.2 --- Future Work
Appendix A --- Sample Documents in the corpora
Appendix B --- Details of Experimental Results of GIS
Appendix C --- Computational Time of Reuters-21578 Experiments
Farra, Noura. « Cross-Lingual and Low-Resource Sentiment Analysis ». Thesis, 2019. https://doi.org/10.7916/d8-x3b7-1r92.
Hsu, Hsin-jen Tomblin J. Bruce. « A neurophysiological study on probabilistic grammatical learning and sentence processing ». 2009. http://ir.uiowa.edu/etd/243/.
Lin, Zhouhan. « Deep neural networks for natural language processing and its acceleration ». Thèse, 2019. http://hdl.handle.net/1866/23438.
This thesis by article consists of four articles which contribute to the field of deep learning, specifically in the acceleration of training through low-precision networks, and the application of deep neural networks to natural language processing. In the first article, we investigate a neural network training scheme that eliminates most of the floating-point multiplications. This approach consists of binarizing or ternarizing the weights in the forward propagation and quantizing the hidden states in the backward propagation, which converts multiplications to sign changes and binary shifts. Experimental results on small to medium-sized datasets show that this approach results in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks. In the second article, we propose a structured self-attentive sentence embedding that extracts interpretable sentence representations in matrix form. We demonstrate improvements on 3 different tasks: author profiling, sentiment classification and textual entailment. Experimental results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks. In the third article, we propose a hierarchical model with a dynamic computation graph for sequential data that learns to construct a tree while reading the sequence. The model learns to create adaptive skip-connections that ease the learning of long-term dependencies by constructing recurrent cells in a recursive manner. The network can be trained either in a supervised manner, by providing gold tree structures, or through reinforcement learning. We provide preliminary experiments on 3 different tasks: a novel Math Expression Evaluation (MEE) task, a well-known propositional logic task, and language modelling tasks. Experimental results show the potential of the proposed approach.
In the fourth article, we propose a novel constituency parsing method with neural networks. The model predicts the parse tree structure by predicting a real-valued scalar, named syntactic distance, for each split position in the input sentence. The order of the relative values of these syntactic distances then determines the parse tree structure by specifying the order in which the split points will be selected, recursively partitioning the input in a top-down fashion. Our proposed approach achieves competitive performance on the Penn Treebank dataset and state-of-the-art performance on the Chinese Treebank dataset.
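The top-down decoding step described for the fourth article can be sketched as follows: given one distance per gap between adjacent words, split at the largest distance and recurse on both halves. This is only an illustration of that decoding idea. In the thesis the distances are predicted by a neural network, whereas here they are hard-coded hypothetical values, and the `build_tree` helper and the nested-tuple tree format are assumptions.

```python
def build_tree(words, distances):
    """Recursively split `words` at the position with the largest distance.

    distances[i] is the (hypothetical) syntactic distance between
    words[i] and words[i + 1], so len(distances) == len(words) - 1.
    """
    if len(words) == 1:
        return words[0]
    # The largest remaining distance marks the highest split in the tree.
    split = max(range(len(distances)), key=lambda i: distances[i])
    left = build_tree(words[: split + 1], distances[:split])
    right = build_tree(words[split + 1 :], distances[split + 1 :])
    return (left, right)


words = ["she", "enjoys", "playing", "tennis"]
right_branching = build_tree(words, [3.0, 2.0, 1.0])
balanced = build_tree(words, [1.0, 3.0, 2.0])
print(right_branching)
print(balanced)
```

Only the relative order of the distances matters here, which matches the abstract's statement that the order of the values determines the tree.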
« Context-Aware Adaptive Hybrid Semantic Relatedness in Biomedical Science ». Doctoral diss., 2016. http://hdl.handle.net/2286/R.I.38725.
Dissertation/Thesis
Doctoral Dissertation Biomedical Informatics 2016
« Sentiment Analysis for Long-Term Stock Prediction ». Master's thesis, 2016. http://hdl.handle.net/2286/R.I.39401.
Dissertation/Thesis
Masters Thesis Computer Science 2016
Skomorowski, Jason. « Topical Opinion Retrieval ». Thesis, 2006. http://hdl.handle.net/10012/2653.
« Analysis and Decision-Making with Social Media ». Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.54830.
Dissertation/Thesis
Doctoral Dissertation Computer Science 2019
Hirtle, David Z. « 'Healthy' Coreference : Applying Coreference Resolution to the Health Education Domain ». Thesis, 2008. http://hdl.handle.net/10012/3891.
There are three primary motivations for this thesis. The first is that health education has never before been studied in this context. So far, the vast majority of coreference research has focused on news. Secondly, achieving domain-independent resolution is unlikely without understanding the extent to which coreference varies across different genres. Finally, coreference pervades language and is an essential part of coherent discourse. Its effective use is a key component of easy-to-understand health education materials, where readability is paramount.
No suitable corpus of health education materials existed, so our first step was to create one. The comprehensive analysis of this corpus, which required manual annotation of coreference, confirmed our hypothesis that the coreference used in health education differs substantially from that in previously studied domains. This analysis was then used to shape the design of a knowledge-lean algorithm for resolving coreference. This algorithm performed surprisingly well on this corpus, e.g., successfully resolving over 85% of all pronouns when evaluated on unseen data.
Despite the importance of coreferentially annotated corpora, only a handful are known to exist, likely because of the difficulty and cost of reliably annotating coreference. The paucity of genres represented in these existing annotated corpora creates an implicit bias in domain-independent coreference resolution. In an effort to address these issues, we plan to make our health education corpus available to the wider research community, hopefully encouraging a broader focus in the future.
« Detecting Political Framing Shifts and the Adversarial Phrases within Rival Factions and Ranking Temporal Snapshot Contents in Social Media ». Doctoral diss., 2018. http://hdl.handle.net/2286/R.I.49154.
Dissertation/Thesis
Doctoral Dissertation Computer Science 2018
Ye, Qiaofei. « A SENTIMENT BASED AUTOMATIC QUESTION-ANSWERING FRAMEWORK ». Thesis, 2019.
Newsom, Eric Tyner. « An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences ». Thesis, 2013. http://hdl.handle.net/1805/3666.
The amount of electronic free text produced in healthcare is growing beyond what humans can process to advance their professional practice. Information extraction (IE) is a sub-field of natural language processing whose goal is data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create the logical expressions needed to process the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none has addressed the issue of semantic similarity (or synonymy) to achieve data reduction. A successful methodology for data reduction depends on a framework that can represent currently popular phrasal methods of IE and also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument statement (PAS) as the framework. A convenience sample from a prior study, comprising ten synsets of 100 unique sentences from radiology reports that domain experts deemed to mean the same thing, provides the text from which PAS structures are formed.
Pandey, Ritika. « Text Mining for Social Harm and Criminal Justice Applications ». Thesis, 2020. http://hdl.handle.net/1805/23348.
Increasing rates of social harm events and the abundance of text data call for text mining techniques, not only to better understand the causes of these events but also to develop optimal prevention strategies. In this work, we study three social harm issues: crime topic models, transitions into drug addiction and homicide investigation chronologies. Topic modeling for the categorization and analysis of crime report text allows for more nuanced categories of crime compared to official UCR categorizations. This study has important implications for hotspot policing. We investigate the extent to which topic models that improve coherence lead to higher levels of crime concentration. We further explore transitions into drug addiction using Reddit data. We propose a prediction model to classify users' transitions from casual drug discussion forums to recovery drug discussion forums and to estimate the likelihood of such transitions. Through this study we offer insights into modern drug culture and provide tools with potential applications in combating opioid crises. Lastly, we present a knowledge-graph-based framework for homicide investigation chronologies that may aid investigators in analyzing homicide case data and also allow for post hoc analysis of key features that determine whether a homicide is ultimately solved. For this purpose we perform named entity recognition to determine witnesses, detectives and suspects from the chronology, use keyword expansion to identify various evidence types, and finally link these entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several choices of methodology for these sub-tasks and analyze the association between network statistics of the knowledge graph and homicide solvability.
Pandit, Yogesh. « Context specific text mining for annotating protein interactions with experimental evidence ». Thesis, 2014. http://hdl.handle.net/1805/3809.
Proteins are the building blocks of a biological system. They interact with other proteins to produce distinct biological phenomena. Protein-protein interactions play a valuable role in understanding the molecular mechanisms occurring in any biological system. Protein interaction databases are a rich source of protein interaction information. They gather large amounts of information from published literature to enrich their data, and expert curators do most of this work manually. The amount of accessible, publicly available literature is growing very rapidly, and manual annotation is a time-consuming process. At the rate at which available information is growing, manual curation alone cannot keep up. Tools are needed to process these huge amounts of data and surface a valuable gist that can help curators proceed faster. When extracting protein-protein interaction evidence from literature, the mere mention of a certain protein found by look-up approaches cannot validate the interaction. Supporting protein interaction information with experimental evidence can help. In this study, we apply machine-learning-based classification techniques to classify a given protein-interaction-related document into an interaction detection method. We use biological attributes and experimental factors, different combinations of which define any particular interaction detection method. Then, using the predicted detection methods, the proteins identified with named entity recognition techniques, and a decomposition of the parts-of-speech composition, we search for sentences with experimental evidence for a protein-protein interaction. We report an accuracy of 75.1% with an F-score of 47.6% on a dataset containing 2035 training documents and 300 test documents.
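The gap between the accuracy and the F-score reported in the abstract above is easier to read with the two metrics side by side. The sketch below computes accuracy and a macro-averaged F1 on made-up labels. The abstract does not say how its F-score was averaged, so macro-averaging is an assumption, and the labels `"A"` and `"B"` are hypothetical stand-ins for detection-method classes rather than data from the thesis.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical imbalanced test set: the classifier always predicts "A".
gold = ["A"] * 8 + ["B"] * 2
pred = ["A"] * 10
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(acc, f1)
```

With imbalanced classes, a classifier can score well on accuracy while missing a minority class entirely, which pulls a class-averaged F-score down.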