Dissertations / Theses: 'Natural language processing (Computer science) Artificial intelligence'

1

Li, Wenhui. "Sentiment analysis: Quantitative evaluation of subjective opinions using natural language processing." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/28000.

Abstract:

Sentiment analysis consists of recognizing sentiment orientation towards specific subjects within natural language texts. Most research in this area focuses on classifying documents as positive or negative. The purpose of this thesis is to quantitatively evaluate the subjective opinions in customer reviews using a five-star rating system, which is widely used on online review websites, and to make the predicted score as accurate as possible. First, this thesis presents two methods for rating reviews: classifying reviews with supervised learning methods, as in multi-class classification, or rating reviews using association scores between sentiment terms and a set of seed words extracted from the corpus, i.e. the unsupervised learning method. We extend the feature selection approach used in Turney's PMI-IR estimation by introducing semantic relatedness measures based on the content of WordNet. This thesis reports on experiments using the two methods mentioned above for rating reviews using the combined feature set enriched with WordNet-selected sentiment terms. The results of these experiments suggest ways in which incorporating WordNet relatedness measures into feature selection may yield improvement over classification and unsupervised learning methods which do not use it. Furthermore, via ordinal meta-classifiers, we utilize the ordering information contained in the scores of bank reviews to improve performance; we explore the effectiveness of re-sampling for reducing the problem of skewed data; and we check whether discretization benefits the ordinal meta-learning process. Finally, we combine the unsupervised and supervised meta-learning methods to optimize performance on our sentiment prediction task.

2

Jarmasz, Mario. ""Roget's Thesaurus" as a lexical resource for natural language processing." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26493.

Abstract:

This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material---the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's. We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined.

3

Keller, Thomas Anderson. "Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing." Thesis, University of California, San Diego, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599339.

Abstract:

Most machine learning algorithms require a fixed-length input to be able to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed-length representations. Although today there is a general consensus on how to generate fixed-length representations of individual words which preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed-length representations of natural language sequences, and analyze their effectiveness across a variety of high- and low-level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.

4

Cotra, Aditya Kousik. "Trend Analysis on Artificial Intelligence Patents." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1617104823936441.

5

Oldham, Joseph Dowell. "Generating documents by means of computational registers." Lexington, Ky. : [University of Kentucky Libraries], 2000. http://lib.uky.edu/ETD/ukycosc2000d00006/oldham.pdf.

Abstract:

Thesis (Ph. D.)--University of Kentucky, 2000.
Title from document title page. Document formatted into pages; contains ix, 169 p. : ill. Includes abstract. Includes bibliographical references (p. 160-167).

6

Augustsson, Christopher. "Multipurpose Case-Based Reasoning System, Using Natural Language Processing." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-104890.

Abstract:

Working as a field technician of any sort can often be a challenging task. Often you find yourself alone, with a machine you have limited knowledge about, and the only support you have is the user manuals. As a result, it is not uncommon for companies to aid the technicians with a knowledge base that often revolves around some share point. Unfortunately, the share points quickly get cluttered with too much information, which leaves the user overwhelmed. Case-based reasoning (CBR), a form of problem-solving technology, uses previous cases to help users solve new problems they encounter, which could benefit the field technician. But for a CBR system to work with a wide variety of machines, the system must have a dynamic nature and handle multiple data types. By developing a prototype focusing on case retrieval, based on .NET Core and MySQL, this report sets the foundation for a highly dynamic CBR system that uses natural language processing to map case attributes during case retrieval. In addition, the system's accuracy is validated using datasets from UCI and Kaggle, and the system proves to be robust on a dataset created explicitly for this report.

7

Meyer, Christopher Henry. "On improving natural language processing through phrase-based and one-to-one syntactic algorithms." Thesis, Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/1096.

8

Crocker, Matthew Walter. "A principle-based system for natural language analysis and translation." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/27863.

Abstract:

Traditional views of grammatical theory hold that languages are characterised by sets of constructions. This approach entails the enumeration of all possible constructions for each language being described. Current theories of transformational generative grammar have established an alternative position. Specifically, Chomsky's Government-Binding theory proposes a system of principles which are common to human language. Such a theory is referred to as a "Universal Grammar" (UG). Associated with the principles of grammar are parameters of variation which account for the diversity of human languages. The grammar for a particular language is known as a "Core Grammar", and is characterised by an appropriately parametrised instance of UG. Despite these advances in linguistic theory, construction-based approaches have remained the status quo within the field of natural language processing. This thesis investigates the possibility of developing a principle-based system which reflects the modular nature of the linguistic theory. That is, rather than stipulating the possible constructions of a language, a system is developed which uses the principles of grammar and language-specific parameters to parse language. Specifically, a system is presented which performs syntactic analysis and translation for a subset of English and German. The cross-linguistic nature of the theory is reflected by the system, which can be considered a procedural model of UG.
Science, Faculty of
Computer Science, Department of
Graduate

9

Goh, Ong Sing. "A framework and evaluation of conversation agents." PhD thesis, Murdoch University, 2008. https://researchrepository.murdoch.edu.au/id/eprint/752/.

Abstract:

This project details the development of a novel and practical framework for the development of conversation agents (CAs), or conversation robots. CAs are software programs which can be used to provide a natural interface between humans and computers. In this study, ‘conversation’ refers to real-time dialogue exchange between human and machine, which may range from web chatting to “on-the-go” conversation through mobile devices. In essence, the project proposes a “smart and effective” communication technology where an autonomous agent is able to carry out simulated human conversation via multiple channels. The CA developed in this project is termed “Artificial Intelligence Natural-language Identity” (AINI), and AINI is used to illustrate the implementation and testing carried out in this project. Up to now, most CAs have been developed with the short-term objective of convincing users that they are talking with real humans, as in the case of the Turing Test. The traditional designs have mainly relied on ad hoc approaches and hand-crafted domain knowledge. Such approaches make it difficult for a fully integrated system to be developed and modified for other domain applications and tasks. The proposed framework in this thesis addresses such limitations. Overcoming the weaknesses of previous systems has been the key challenge in this study. The research in this study has provided a better understanding of the system requirements and the development of a systematic approach for the construction of intelligent CAs based on agent architecture using a modular N-tiered approach. This study demonstrates an effective implementation and exploration of the new paradigm of Computer Mediated Conversation (CMC) through CAs. The most significant aspect of the proposed framework is its ability to re-use and encapsulate expertise such as domain knowledge, natural language query and human-computer interface through plug-in components.
As a result, the developer does not need to change the framework implementation for different applications. This proposed system provides interoperability among heterogeneous systems and it has the flexibility to be adapted for other languages, interface designs and domain applications. A modular design of knowledge representation facilitates the creation of the CA knowledge bases. This enables easier integration of open-domain and domain-specific knowledge with the ability to provide answers for broader queries. In order to build the knowledge base for the CAs, this study has also proposed a mechanism to gather information from commonsense collaborative knowledge and online web documents. The proposed Automated Knowledge Extraction Agent (AKEA) has been used for the extraction of unstructured knowledge from the Web. On the other hand, it is also realised that it is important to establish the trustworthiness of the sources of information. This thesis introduces a Web Knowledge Trust Model (WKTM) to establish the trustworthiness of the sources. In order to assess the proposed framework, relevant tools and application modules have been developed and an evaluation of their effectiveness has been carried out to validate the performance and accuracy of the system. Both laboratory and public experiments with online users in real-time have been carried out. The results have shown that the proposed system is effective. In addition, it has been demonstrated that the CA could be implemented on the Web, mobile services and Instant Messaging (IM). In the real-time human-machine conversation experiment, it was shown that AINI is able to carry out conversations with human users by providing spontaneous interaction in an unconstrained setting. The study observed that AINI and humans share common properties in linguistic features and paralinguistic cues. These human-computer interactions have been analysed and contributed to the understanding of how the users interact with CAs. 
Such knowledge is also useful for the development of conversation systems utilising the commonalities found in these interactions. Although AINI was found to have difficulty responding to some forms of paralinguistic cues, this finding suggests directions for further work to improve CA performance.

10

Goh, Ong Sing. "A framework and evaluation of conversation agents." PhD thesis, Murdoch University, 2008. http://researchrepository.murdoch.edu.au/752/.

11

Kauppi, Ilkka. "Intermediate language for mobile robots : a link between the high-level planner and low-level services in robots /." Espoo [Finland] : VTT Technical Research Centre of Finland, 2003. http://www.vtt.fi/inf/pdf/publications/2003/P510.pdf.

12

Grefenstette, Edward Thomas. "Category-theoretic quantitative compositional distributional models of natural language semantics." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:d7f9433b-24c0-4fb5-925b-d8b3744b7012.

Abstract:

This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high dimensional spaces. The problem of compositionality for such models concerns itself with how to produce distributional representations for larger units of text (such as a verb and its arguments) by composing the distributional representations of smaller units of text (such as individual words). This thesis focuses on a particular approach to this compositionality problem, namely using the categorical framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. This thesis shows how this approach can be theoretically extended and practically implemented to produce concrete compositional distributional models of natural language semantics. It furthermore demonstrates that such models can perform on par with, or better than, other competing approaches in the field of natural language processing. There are three principal contributions to computational linguistics in this thesis. The first is to extend the DisCoCat framework on the syntactic front and semantic front, incorporating a number of syntactic analysis formalisms and providing learning procedures allowing for the generation of concrete compositional distributional models. The second contribution is to evaluate the models developed from the procedures presented here, showing that they outperform other compositional distributional models present in the literature. 
The third contribution is to show how using category theory to solve linguistic problems forms a sound basis for research, illustrated by examples of work on this topic, that also suggest directions for future research.

13

Mills, Michael Thomas. "Natural Language Document and Event Association Using Stochastic Petri Net Modeling." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1369408524.

14

Tran, Anh Xuan. "Identifying latent attributes from video scenes using knowledge acquired from large collections of text documents." Thesis, The University of Arizona, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3634275.

Abstract:

Peter Drucker, a well-known and influential writer and philosopher in the field of management theory and practice, once claimed that “the most important thing in communication is hearing what isn't said.” It is not difficult to see that a similar concept also holds in the context of video scene understanding. In almost every non-trivial video scene, the most important elements, such as the motives and intentions of the actors, can never be seen or directly observed, yet the identification of these latent attributes is crucial to our full understanding of the scene. That is to say, latent attributes matter.

In this work, we explore the task of identifying latent attributes in video scenes, focusing on the mental states of participant actors. We propose a novel approach to the problem based on the use of large text collections as background knowledge and minimal information about the videos, such as activity and actor types, as query context. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms, as well as their distribution weights. We develop and test several largely unsupervised information extraction models that identify the mental state labels of human participants in video scenes given some contextual information about the scenes. We show that these models produce complementary information and their combination significantly outperforms the individual models, and improves performance over several baseline methods on two different datasets. We present an extensive analysis of our models and close with a discussion of our findings, along with a roadmap for future research.

15

Aljadri, Sinan. "Chatbot : A qualitative study of users' experience of Chatbots." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105434.

Abstract:

The aim of the present study has been to examine users' experience of chatbots from a business perspective and a consumer perspective. The study has also focused on highlighting what limitations a chatbot can have and possible improvements for future development. The study is based on a qualitative research method with semi-structured interviews that have been analyzed using thematic analysis. The interview material has been analyzed on the basis of previous research and various theoretical perspectives such as Artificial Intelligence (AI) and Natural Language Processing (NLP). The results of the study have shown that the experience of chatbots can differ between businesses that offer a chatbot, which are more positive, and consumers who use it as customer service. Limitations and suggestions for improvements to chatbots are also a consistent result of the study.

16

Santos, Denis Neves de Arruda. "Resolução de anafora pronominal em portugues utilizando o algoritmo de Hobbs." [s.n.], 2008. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276062.

Abstract:

Advisor: Ariadne Maria Brito Rizzoni Carvalho
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação
Anaphora is an abbreviated reference to an entity, made in the expectation that the receiver of the discourse can understand the reference. Automatic anaphora resolution may improve the performance of natural language systems, such as translators, generators and summarizers. Difficulties arise when there is more than one potential candidate for a referent. There has been little research on anaphora resolution for Portuguese compared to other languages, such as English. This work describes an adaptation to Portuguese of Hobbs' syntactic algorithm for pronoun resolution. The system was evaluated by comparing its results with those obtained with another syntactic algorithm for pronoun resolution, the Lappin and Leass algorithm. The same Portuguese corpora were used, and a significant improvement was obtained with Hobbs' algorithm.
Master's
Natural Language Processing
Master of Computer Science

17

Roman, Norton Trevisan. "Emoção e a sumarização automatica de dialogos." [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276233.

Abstract:

Advisors: Ariadne Maria Brito Rizzoni Carvalho, Paul Piwek
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
This thesis presents a number of contributions to the field of automatic dialogue summarisation. It provides evidence for the hypothesis that whenever a dialogue features very impolite behaviour by one or more of its interlocutors, this behaviour will tend to be described in the dialogue's summary. Moreover, further experimental results showed that this behaviour is reported with a strong bias determined by the point of view of the summariser. This result was not affected by constraints on the summary length. The experiments provided useful information on when and how assessments of emotion and behaviour should be added to a dialogue summary. To conduct the experiments, a categorical multi-dimensional annotation scheme was developed, which may also be helpful to other researchers who need to annotate data in a similar way. The results from the empirical studies were used to build an automatic dialogue summarisation system, in order to test their computational applicability. The system's output consists of summaries in which technical and emotional information, such as assessments of the dialogue participants' behaviour, are combined in a way that reflects the bias of the summariser, with the point of view defined by the user.
Doctorate
Doctor of Computer Science

18

Li, Jie. "Intention-driven textual semantic analysis." School of Computer Science and Software Engineering, 2008. http://ro.uow.edu.au/theses/104.

Abstract:

The explosion of the World Wide Web has brought an endless amount of information within our reach. To take advantage of this phenomenon, text search has become a major contemporary research challenge. Due to the nature of the Web, assisting users to find desired information is still a challenging task. In this thesis, we investigate semantic analysis techniques which can facilitate the search process at the semantic level. We also study the problem that short queries are less informative and make it difficult to convey the user's intention to the search service system. We propose a generalized framework to address these issues. We conduct a case study of movie plot search in which a semantic analyzer works seamlessly with a user intention detector. Our experimental results show the importance and effectiveness of intention detection and semantic analysis techniques.


19

Martin, Olga J. "Retranslation a problem in computing with perceptions /." Diss., Online access via UMI:, 2008.


Abstract:

Thesis (Ph. D.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Systems Science and Industrial Engineering, 2008.
Includes bibliographical references.


20

Newman-Griffis, Denis R. "Capturing Domain Semantics with Representation Learning: Applications to Health and Function." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587658607378958.


21

Tabassum, Binte Jafar Jeniya. "Information Extraction From User Generated Noisy Texts." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1606315356821532.


22

Moncecchi, Guillermo. "Détection du langage spéculatif dans la littérature scientifique." Phd thesis, Université de Nanterre - Paris X, 2013. http://tel.archives-ouvertes.fr/tel-00800552.


Abstract:

This thesis proposes a methodology for solving certain classification problems, notably those involving sequential classification in Natural Language Processing tasks. To improve the results of the classification task, we propose an iterative, error-based approach that integrates expert knowledge, represented as "knowledge rules", into the learning process. We applied the methodology to two tasks related to the detection of speculation ("hedging") in scientific literature: the detection of speculative text segments ("hedge cue identification") and the detection of the scope of these segments ("hedge cue scope detection"). The results are promising: for the first task, we improved the baseline F-score by 2.5 points by integrating data on the co-occurrence of speculative segments. For the second task, integrating syntactic information and rules for syntactic pruning improved the classification results from 0.712 to 0.835 (F-score). Compared with state-of-the-art methods, the results are very good, and they suggest that the approach of improving classifiers based solely on errors made on a corpus can also be applied to other similar tasks. Furthermore, this thesis proposes a class schema for representing the analysis of a sentence in a single structure that integrates the results of different linguistic analyses. This makes it easier to manage the iterative process of improving the classifier, in which different sets of learning attributes are used at each iteration. We also propose storing the attributes in a relational model instead of the classic textual structures, in order to facilitate the analysis and manipulation of the learned data.


23

Tallo, Philip T. "Using Sentence Embeddings for Word Sense Induction." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1613748873435158.


24

Elvir, Miguel. "EPISODIC MEMORY MODEL FOR EMBODIED CONVERSATIONAL AGENTS." Master's thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3000.


Abstract:

Embodied Conversational Agents (ECAs) form part of a range of virtual characters whose intended purposes include engaging in natural conversations with human users. While the literature is rife with descriptions of attempts at producing viable ECA architectures, few authors have addressed the role of episodic memory models in conversational agents. This form of memory, which provides a sense of autobiographic record-keeping in humans, has only recently been peripherally integrated into dialog management tools for ECAs. In our work, we propose to take a closer look at the shared characteristics of episodic memory models in recent examples from the field. Additionally, we propose several enhancements to these existing models through a unified episodic memory model for ECAs. As part of our research into episodic memory models, we present a process for determining the prevalent contexts in the conversations obtained from the aforementioned interactions. The process presented demonstrates the use of statistical and machine learning services, as well as Natural Language Processing techniques, to extract relevant snippets from conversations. Finally, mechanisms to store, retrieve, and recall episodes from previous conversations are discussed. A primary contribution of this research is in the context of contemporary memory models for conversational agents and cognitive architectures. To the best of our knowledge, this is the first attempt at providing a comparative summary of existing works. As implementations of ECAs become more complex and encompass more realistic conversation engines, we expect that episodic memory models will continue to evolve and further enhance the naturalness of conversations.
M.S.Cp.E.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering MSCpE


25

Cooper, Wyatt. "Discovering Hidden Networks Using Topic Modeling." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/cmc_theses/1659.


Abstract:

This paper explores topic modeling via unsupervised non-negative matrix factorization. This technique is used on a variety of sources in order to extract salient topics. From these topics, hidden entity networks are discovered and visualized in a graph representation. In addition, other visualization techniques such as examining the time series of a topic and examining the top words of a topic are used for evaluation and analysis. There is a large software component to this project, and so this paper will also focus on the design decisions that were made in order to make the program developed as versatile and extensible as possible.
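As a rough illustration of the non-negative matrix factorization mentioned in this abstract, the following minimal Python sketch factors a toy document-term matrix using multiplicative updates. The vocabulary, the counts, the function names, and the choice of multiplicative updates are all assumptions made for illustration; the abstract does not say which solver or data the project used.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - W*H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (documents x terms) as W (documents x k) times H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so the multiplicative updates stay non-negative.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H

# Hypothetical toy corpus: rows are documents, columns are term counts.
vocab = ["court", "judge", "law", "goal", "match", "team"]
V = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 0, 1],
    [0, 0, 0, 4, 3, 2],
    [0, 1, 0, 3, 4, 3],
]
W, H = nmf(V, k=2)

# The highest-weighted terms in each row of H serve as that topic's "top words".
top_words = [
    [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:3]]
    for row in H
]
print(top_words)
```

Each row of `H` can be read as a topic over the vocabulary, and each row of `W` gives a document's weight on those topics. Those weights are the kind of signal one could use to link documents or entities that share a topic.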


26

Eriksson, Patrik, and Philip Wester. "Granskning av examensarbetesrapporter med IBM Watson molntjänster." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232057.


Abstract:

Cloud services are one of the fastest-expanding fields today. Companies such as Amazon, Google, Microsoft and IBM offer these cloud services in various forms. As this field progresses, a natural question arises: “What can you do with the technology today?” The technology offers scalability for hardware usage and user demands, which is attractive to developers and companies. This thesis tries to examine the applicability of cloud services by combining it with the question: “Is it possible to make an automated thesis examiner?” By narrowing down the services to IBM Watson web services, this thesis's main question reads: “Is it possible to make an automated thesis examiner using IBM Watson?” Hence the goal of this thesis was to create an automated thesis examiner. The project used a modified version of Bunge’s technological research method, in which one of the first steps was to create a definition of a software thesis examiner for student theses. An empirical study followed of the Watson services that seemed relevant according to the literature study. These empirical studies allowed a deeper understanding of the services’ practices and boundaries. From these findings and the definition of a software thesis examiner for student theses, an idea of how to build and implement an automated thesis examiner was created. Most of IBM Watson’s services were thoroughly evaluated, except for the Machine Learning service, which would have been studied further had time not run out. This project found the Watson web services useful in many cases but did not find a service that was well suited for thesis examination. Although the goal was not reached, this thesis researched the Watson web services and can be used to improve understanding of their applicability, and for future implementations that aim to meet the provided definition.

27

Khan, Sifat Shahriar. "Power Outage Management using Social Sensing." University of Akron / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=akron1556833736835808.


28

Ogletree, Xavian Alexander. "Human-AI Teaming for Dynamic Interpersonal Skill Training." Wright State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1621853024907269.


29

Brock, Walter A. "Alternative Approaches to Correction of Malapropisms in AIML Based Conversational Agents." NSUWorks, 2014. http://nsuworks.nova.edu/gscis_etd/20.


Abstract:

The use of Conversational Agents (CAs) utilizing Artificial Intelligence Markup Language (AIML) has been studied in a number of disciplines. Previous research has shown a great deal of promise. It has also documented significant limitations in the abilities of these CAs. Many of these limitations are related specifically to the method employed by AIML to resolve ambiguities in the meaning and context of words. While methods exist to detect and correct common errors in spelling and grammar of sentences and queries submitted by a user, one class of input error that is particularly difficult to detect and correct is the malapropism. In this research a malapropism is defined as a "verbal blunder in which one word is replaced by another similar in sound but different in meaning" ("malapropism," 2013). This research explored the use of alternative methods of correcting malapropisms in sentences input to AIML CAs using measures of Semantic Distance and tri-gram probabilities. Results of these alternate methods were compared against AIML CAs using only the Symbolic Reductions built into AIML. This research found that the use of the two methodologies studied here did indeed lead to a small but measurable improvement in the performance of the CA in terms of the appropriateness of its responses as classified by human judges. However, it was also noted that in a large number of cases, the CA simply ignored the existence of a malapropism altogether in formulating its responses. In most of these cases, the interpretation and response to the user's input was of such a general nature that one might question the overall efficacy of the AIML engine. The answer to this question is a matter for further study.


30

Bevans, Brandon. "Categorizing Blog Spam." DigitalCommons@CalPoly, 2016. https://digitalcommons.calpoly.edu/theses/1623.


Abstract:

The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Similar to the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media that exists on the internet. Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to develop more sophisticated detection tools to curb the harmful effects that come with spam. This virtual arms race has no end in sight. Most efforts thus far have been toward accurately detecting spam from ham, and rightfully so since initial detection is essential. However, research is lacking in understanding the current ecosystem of spam, spam campaigns, and the behavior of the botnets that drive the majority of spam traffic. This thesis focuses on characterizing spam, particularly the spam that appears in forums, where the spam is delivered by bots posing as legitimate users. Forum spam is used primarily to push advertisements or to boost other websites’ perceived popularity by including HTTP links in the content of the post. We conduct an experiment to collect a sample of the blog posts and network activity of the spambots that exist on the internet. We then present a corpus available for analysis and proceed with our own analysis. We cluster associated groups of users and IP addresses into entities, which we accept as a model of the underlying botnets that interact with our honeypots. We use Natural Language Processing (NLP) and Machine Learning (ML) to determine that creating semantic-based models of botnets is sufficient for distinguishing them from one another. We also find that the syntactic structure of posts has little variation from botnet to botnet. Finally, we confirm that to a large degree botnet behavior and content hold across different domains.


31

Gunaratna, Kalpa. "Semantics-based Summarization of Entities in Knowledge Graphs." Wright State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=wright1496124815009777.


32

Bartl, Eduard. "Mathematical foundations of graded knowledge spaces." Diss., Online access via UMI:, 2009.


Abstract:

Thesis (Ph. D.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Systems Science and Industrial Engineering, 2009.
Includes bibliographical references.


33

Lilja, Adam, and Max Kihlborg. "Important criteria when choosing a conversational AI platform for enterprises." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280896.


Abstract:

This paper evaluates and analyzes three conversational AI platforms, Dialogflow (Google), Watson Assistant (IBM) and Teneo (Artificial Solutions), on how they perform against a set of criteria: pricing model, ease of use, efficiency, experience working in the software, and what results to expect from each platform. The main focus was to investigate the platforms in order to acquire an understanding of which platform would be best suited for enterprises. The platforms were compared by performing a variety of tasks aiming to answer these questions. The technical research was combined with an analysis of each company’s pricing model and strategy to get an understanding of how they target their products on the market. This study concludes that different software products may be suitable for different settings depending on the size of an enterprise and the demand for complex solutions. Overall, Teneo outperformed its competitors in these tests and seems to be the most scalable solution, with the ability to create both simple and complicated solutions. It was more demanding to get started with than the other platforms, but became more efficient as time progressed. Findings include that Dialogflow and Watson Assistant lacked capabilities when faced with complex and complicated tasks. From a pricing strategy point of view, the companies are similar in their approach, but Artificial Solutions and IBM have more flexible methods while Google has a fixed pricing strategy. Combining the pricing strategy and technical analysis, this implies that Teneo would be a better choice for larger enterprises while Watson Assistant and Dialogflow may be more suitable for smaller ones.

34

Dhyani, Dushyanta Dhyani. "Boosting Supervised Neural Relation Extraction with Distant Supervision." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524095334803486.


35

Rios, Anthony. "Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical Records." UKnowledge, 2018. https://uknowledge.uky.edu/cs_etds/71.


Abstract:

Coding Electronic Medical Records (EMRs) with diagnosis and procedure codes is an essential task for billing, secondary data analyses, and monitoring health trends. Both speed and accuracy of coding are critical. While coding errors could lead to more patient-side financial burden and misinterpretation of a patient’s well-being, timely coding is also needed to avoid backlogs and additional costs for the healthcare facility. Therefore, it is necessary to develop automated diagnosis and procedure code recommendation methods that can be used by professional medical coders. The main difficulty with developing automated EMR coding methods is the nature of the label space. The standardized vocabularies used for medical coding contain over 10 thousand codes. The label space is large, and the label distribution is extremely unbalanced: most codes occur very infrequently, with a few codes occurring several orders of magnitude more often than others. A few codes never occur in the training dataset at all. In this work, we present three methods to handle the large unbalanced label space. First, we study how to augment EMR training data with biomedical data (research articles indexed on PubMed) to improve the performance of standard neural networks for text classification. PubMed indexes more than 23 million citations. Many of the indexed articles contain relevant information about diagnosis and procedure codes. Therefore, we present a novel method of incorporating this unstructured data in PubMed using transfer learning. Second, we combine ideas from metric learning with recent advances in neural networks to form a novel neural architecture that better handles infrequent codes. And third, we present new methods to predict codes that have never appeared in the training dataset. Overall, our contributions constitute advances in neural multi-label text classification with potential consequences for improving EMR coding.


36

Wijeratne, Sanjaya. "A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation of Emoji using EmojiNet." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1547506375922938.


37

Hughes, Cameron A. "Epistemic Structures of Interrogative Domains." Youngstown State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1227285777.


38

Shankar, Arunprasath. "ONTOLOGY-DRIVEN SEMI-SUPERVISED MODEL FOR CONCEPTUAL ANALYSIS OF DESIGN SPECIFICATIONS." Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1401706747.


39

Parde, Natalie. "Reading with Robots: A Platform to Promote Cognitive Exercise through Identification and Discussion of Creative Metaphor in Books." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1248384/.


Abstract:

Maintaining cognitive health is often a pressing concern for aging adults, and given the world's shifting age demographics, it is impractical to assume that older adults will be able to rely on individualized human support for doing so. Recently, interest has turned toward technology as an alternative. Companion robots offer an attractive vehicle for facilitating cognitive exercise, but the language technologies guiding their interactions are still nascent; in elder-focused human-robot systems proposed to date, interactions have been limited to motion or buttons and canned speech. The incapacity of these systems to autonomously participate in conversational discourse limits their ability to engage users at a cognitively meaningful level. I addressed this limitation by developing a platform for human-robot book discussions, designed to promote cognitive exercise by encouraging users to consider the authors' underlying intentions in employing creative metaphors. The choice of book discussions as the backdrop for these conversations has an empirical basis in neuro- and social science research that has found that reading often, even in late adulthood, has been correlated with a decreased likelihood to exhibit symptoms of cognitive decline. The more targeted focus on novel metaphors within those conversations stems from prior work showing that processing novel metaphors is a cognitively challenging task, for young adults and even more so in older adults with and without dementia. A central contribution arising from the work was the creation of the first computational method for modelling metaphor novelty in word pairs. I show that the method outperforms baseline strategies as well as a standard metaphor detection approach, and additionally discover that incorporating a sentence-based classifier as a preliminary filtering step when applying the model to new books results in a better final set of scored word pairs. I trained and evaluated my methods using new, large corpora from two sources, and release those corpora to the research community. In developing the corpora, an additional contribution was the discovery that training a supervised regression model to automatically aggregate the crowdsourced annotations outperformed existing label aggregation strategies. Finally, I show that automatically-generated questions adhering to the Questioning the Author strategy are comparable to human-generated questions in terms of naturalness, sensibility, and question depth; the automatically-generated questions score slightly higher than human-generated questions in terms of clarity. I close by presenting findings from a usability evaluation in which users engaged in thirty-minute book discussions with a robot using the platform, showing that users find the platform to be likeable and engaging.


40

Kartsaklis, Dimitrios. "Compositional distributional semantics with compact closed categories and Frobenius algebras." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:1f6647ef-4606-4b85-8f3b-c501818780f2.


Abstract:

The provision of compositionality in distributional models of meaning, where a word is represented as a vector of co-occurrence counts with every other word in the vocabulary, offers a solution to the fact that no text corpus, regardless of its size, is capable of providing reliable co-occurrence statistics for anything but very short text constituents. The purpose of a compositional distributional model is to provide a function that composes the vectors for the words within a sentence, in order to create a vectorial representation that reflects its meaning. Using the abstract mathematical framework of category theory, Coecke, Sadrzadeh and Clark showed that this function can directly depend on the grammatical structure of the sentence, providing an elegant mathematical counterpart of the formal semantics view. The framework is general and compositional but stays abstract to a large extent. This thesis contributes to ongoing research related to the above categorical model in three ways: Firstly, I propose a concrete instantiation of the abstract framework based on Frobenius algebras (joint work with Sadrzadeh). The theory improves shortcomings of previous proposals, extends the coverage of the language, and is supported by experimental work that improves existing results. The proposed framework describes a new class of compositional models that find intuitive interpretations for a number of linguistic phenomena. Secondly, I propose and evaluate in practice a new compositional methodology which explicitly deals with the different levels of lexical ambiguity (joint work with Pulman). A concrete algorithm is presented, based on the separation of vector disambiguation from composition in an explicit prior step. Extensive experimental work shows that the proposed methodology indeed results in more accurate composite representations for the framework of Coecke et al. in particular and every other class of compositional models in general. As a last contribution, I formalize the explicit treatment of lexical ambiguity in the context of the categorical framework by resorting to categorical quantum mechanics (joint work with Coecke). In the proposed extension, the concept of a distributional vector is replaced with that of a density matrix, which compactly represents a probability distribution over the potential different meanings of the specific word. Composition takes the form of quantum measurements, leading to interesting analogies between quantum physics and linguistics.
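To make the density-matrix idea in this abstract more concrete, here is a small Python sketch that builds a density matrix as a probability-weighted sum of outer products of normalized sense vectors. The word, the two senses, their probabilities, and the three-dimensional vectors are hypothetical; this shows only the standard textbook construction and not the thesis's categorical machinery.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def outer(v):
    """Outer product of a vector with itself."""
    return [[a * b for b in v] for a in v]

def density_matrix(senses):
    """Build rho = sum_i p_i * |v_i><v_i| from (probability, sense_vector) pairs.

    The probabilities are assumed to sum to 1.
    """
    dim = len(senses[0][1])
    rho = [[0.0] * dim for _ in range(dim)]
    for p, vec in senses:
        proj = outer(normalize(vec))
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * proj[i][j]
    return rho

# Hypothetical ambiguous word with two senses in a toy 3-dimensional space.
bank = density_matrix([
    (0.7, [1.0, 0.2, 0.0]),  # assumed "finance" sense
    (0.3, [0.0, 0.1, 1.0]),  # assumed "river" sense
])
trace = sum(bank[i][i] for i in range(3))
print(bank, trace)
```

Because each sense vector is normalized and the weights sum to one, the resulting matrix is symmetric with unit trace, which is what allows it to be read as a probability distribution over senses.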


41

Hughes, Tracey D. "Visualizing Epistemic Structures of Interrogative Domain Models." Youngstown State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1227294380.


42

Eyorokon, Vahid. "Measuring Goal Similarity Using Concept, Context and Task Features." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1534084289041091.


43

Hughes, Jennifer G. "Misheard Me Oronyminator: Using Oronyms to Validate The Correctness of Frequency Dictionaries." DigitalCommons@CalPoly, 2013. https://digitalcommons.calpoly.edu/theses/936.


Abstract:

In the field of speech recognition, an algorithm must learn to tell the difference between "a nice rock" and "a gneiss rock". These identical-sounding phrases are called oronyms. Word frequency dictionaries are often used by speech recognition systems to help resolve phonetic sequences with more than one possible orthographic phrase interpretation, by looking up which oronym of the root phonetic sequence contains the most-common words. Our paper demonstrates a technique used to validate word frequency dictionary values. We chose to use frequency values from the UNISYN dictionary, which tallies each word on a per-occurrence basis, using a proprietary text corpus, to calculate word frequency. In the first phase of our user study, we generated oronym strings for the phrase "a nice cold hour", and had over a dozen people make 62 of the most-common oronyms for that phrase. In the second phase, we selected 15 of the phase one recordings, and had 74 different people transcribe each one, for a total of 953 transcriptions overall. If the frequency dictionary values for our test phrases accurately reflected the real-world expectations of actual listeners, we would expect that the most-commonly transcribed phrases in our user study would roughly correspond with our metric for the most likely oronym interpretation of the root phrase. During the course of our study, we found that using per-occurrence frequency values, like those found in the UNISYN dictionary, when computing our overall-phrase-frequency metric caused the end result to be thrown off by excessively common words, such as "the", "is", and "a". These super-common words had such high per-occurrence tallies that they overpowered any effect that any regular word had on a frequency metric. When we used frequency values from the COCA dictionary, which has word frequency values tallied on a document-count basis instead of a UNISYN-like per-occurrence basis, we found that this effect was mitigated. As a result, we do not recommend using the UNISYN dictionary for word frequency purposes.
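The difference between per-occurrence and document-count tallies described in this abstract can be sketched in a few lines of Python. The three-document corpus and the `phrase_score` metric (a plain sum of word frequencies) are assumptions for illustration only; the abstract does not specify how the overall phrase-frequency metric was computed.

```python
from collections import Counter

def per_occurrence_counts(docs):
    """Count every occurrence of every word across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts

def document_counts(docs):
    """Count the number of documents in which each word appears at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts

def phrase_score(phrase, counts):
    """Hypothetical phrase metric: sum of the frequency values of its words."""
    return sum(counts[w] for w in phrase.lower().split())

# Hypothetical toy corpus.
docs = [
    "a nice cold hour is a nice hour",
    "a rock is a rock and a stone is a stone",
    "the gneiss rock is cold",
]
occ = per_occurrence_counts(docs)
doc = document_counts(docs)

# Share of the phrase score contributed by the very common word "a".
phrase = "a nice rock"
share_occ = occ["a"] / phrase_score(phrase, occ)
share_doc = doc["a"] / phrase_score(phrase, doc)
print(share_occ, share_doc)
```

In this toy corpus the word "a" accounts for a smaller share of the phrase score under document counts than under per-occurrence counts, which mirrors the mitigation the abstract reports.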


44

Lerjebo, Linus, and Johannes Hägglund. "Intelligent chatbot assistant: A study of Natural Language Processing and Artificial Intelligence." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-42691.

Abstract:

Research and development in Artificial Intelligence have surged in recent years, including in the medical field. Despite the new technology and tools available, medical staff are still under a heavy workload. The goal of this thesis is to analyze the possibilities of a chatbot whose purpose is to assist the medical staff and provide safety for the patients by guaranteeing that they are being monitored. With the use of technologies such as Artificial Intelligence, Natural Language Processing, and Voice Over Internet Protocol, the chatbot can communicate with the patient. It will work as an assistant for the working staff and provide the information from the calls to the medical staff. With the answers provided from the call, the staff will not need to ask routine questions every time and can provide help more quickly. The chatbot is administered through a web application where administrators can initiate calls and add patients to the database.

45

Thanguturi, Naren. "Automatic News Generation System based on Natural Language." University of Toledo / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1525973404437239.

46

Rogers, Paul Anton Peter. "The baby project : processing character patterns in textual representations of language." Thesis, Bournemouth University, 2000. http://eprints.bournemouth.ac.uk/306/.

Abstract:

This thesis describes an investigation into a proposed theory of AI. The theory postulates that a machine can be programmed to predict aspects of human behaviour by selecting and processing stored, concrete examples of previously experienced patterns of behaviour. Validity is tested in the domain of natural language. Externalisations that model the resulting theory of NLP entail fuzzy components. Fuzzy formalisms may exhibit inaccuracy and/or overproductivity. A research strategy is developed, designed to investigate this aspect of the theory. The strategy includes two experimental hypotheses designed to test 1) whether the model can process simple language interaction, and 2) the effect of fuzzy processes on such language interaction. Experimental design requires three implementations, each with progressive degrees of fuzziness in their processes. They are respectively named NonfuzzBabe, CorrBabe and FuzzBabe. NonfuzzBabe is used to test the first hypothesis and all three implementations are used to test the second hypothesis. A system description is presented for NonfuzzBabe. Testing the first hypothesis provides results that show NonfuzzBabe is able to process simple language interaction. A system description for CorrBabe and FuzzBabe is presented. Testing the second hypothesis provides results that show a positive correlation between degree of fuzzy processes and improved simple language performance. FuzzBabe's ability to process more complex language interaction is then investigated and model-intrinsic limitations are found. Research to overcome this problem is designed to illustrate the potential of externalisation of the theory and is conducted less rigorously than the previous part of this investigation. Augmenting FuzzBabe to include fuzzy evaluation of non-pattern elements of interaction is hypothesised as a possible solution. The term FuzzyBaby was coined for the augmented implementation. Results of a pilot study designed to measure FuzzyBaby's reading comprehension are given. Little research has been conducted that investigates NLP by the fuzzy processing of concrete patterns in language. Consequently, it is proposed that this research contributes to the intellectual disciplines of NLP and AI in general.

47

Das, Dipanjan. "Semi-Supervised and Latent-Variable Models of Natural Language Semantics." Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/342.

Abstract:

This thesis focuses on robust analysis of natural language semantics. A primary bottleneck for semantic processing of text lies in the scarcity of high-quality and large amounts of annotated data that provide complete information about the semantic structure of natural language expressions. In this dissertation, we study statistical models tailored to solve problems in computational semantics, with a focus on modeling structure that is not visible in annotated text data. We first investigate supervised methods for modeling two kinds of semantic phenomena in language. First, we focus on the problem of paraphrase identification, which attempts to recognize whether two sentences convey the same meaning. Second, we concentrate on shallow semantic parsing, adopting the theory of frame semantics (Fillmore, 1982). Frame semantics offers deep linguistic analysis that exploits the use of lexical semantic properties and relationships among semantic frames and roles. Unfortunately, the datasets used to train our paraphrase and frame-semantic parsing models are too small to lead to robust performance. Therefore, a common trait in our methods is the hypothesis of hidden structure in the data. To this end, we employ conditional log-linear models over structures, that are firstly capable of incorporating a wide variety of features gathered from the data as well as various lexica, and secondly use latent variables to model missing information in annotated data. Our approaches towards solving these two problems achieve state-of-the-art accuracy on standard corpora. For the frame-semantic parsing problem, we present fast inference techniques for jointly modeling the semantic roles of a given predicate. We experiment with linear program formulations, and use a commercial solver as well as an exact dual decomposition technique that breaks the role labeling problem into several overlapping components. 
Continuing with the theme of hypothesizing hidden structure in data for modeling natural language semantics, we present methods to leverage large volumes of unlabeled data to improve upon the shallow semantic parsing task. We work within the framework of graph-based semi-supervised learning, a powerful method that associates similar natural language types, and helps propagate supervised annotations to unlabeled data. We use this framework to improve frame-semantic parsing performance on unknown predicates that are absent in annotated data. We also present a family of novel objective functions for graph-based learning that result in sparse probability measures over graph vertices, a desirable property for natural language types. Not only are these objectives easier to numerically optimize, but they also result in smoothed distributions over predicates that are smaller in size. The experiments presented in this dissertation empirically demonstrate that missing information in text corpora contains considerable semantic information that can be incorporated into structured models for semantics, to significant benefit over the current state of the art. The methods in this thesis were originally presented by Das and Smith (2009, 2011, 2012), and Das et al. (2010, 2012). The thesis gives a more thorough exposition, relating and comparing the methods, and also presents several extensions of the aforementioned papers.

48

Mahendru, Aroma. "Role of Premises in Visual Question Answering." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/78030.

Abstract:

In this work, we make a simple but important observation: questions about images often contain premises -- objects and relationships implied by the question -- and reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer based purely on learned language biases, resulting in nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e. not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel irrelevant question detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning.
Master of Science
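
The relevance criterion stated in the abstract (a question is irrelevant if at least one of its premises is not depicted in the image) can be written down as a small rule-based Python sketch. This is an illustration only: the thesis trains learned detection models, and the function names, the set-of-labels representation of an image, and the example premises here are all hypothetical.

```python
def false_premises(premises, depicted):
    """Return the premises of a question that are not depicted in the image.

    `premises` is a list of object or relationship labels implied by the
    question; `depicted` is a set of labels assumed to describe the image.
    Both representations are assumptions made for this sketch.
    """
    return [p for p in premises if p not in depicted]

def is_relevant(premises, depicted):
    """A question is relevant only if none of its premises is false."""
    return not false_premises(premises, depicted)

# Hypothetical example: "What colour is the dog's frisbee?"
image_labels = {"dog", "grass", "ball"}
question_premises = ["dog", "frisbee"]
missing = false_premises(question_premises, image_labels)
relevant = is_relevant(question_premises, image_labels)
```

Returning the list of false premises, rather than only a yes/no flag, mirrors the explanation part of the task: the missing premise is the reason the question does not apply to the image.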

49

Taylor, Julia Michelle. "Towards Informal Computer Human Communication: Detecting Humor in a Restricted Domain." Cincinnati, Ohio : University of Cincinnati, 2008. http://rave.ohiolink.edu/etdc/view.cgi?acc_num=ucin1226600183.

Abstract:

Thesis (Ph.D.)--University of Cincinnati, 2008.
Advisor: Lawrence J. Mazlack. Title from electronic thesis title page (viewed Feb. 16, 2009). Keywords: artificial intelligence; computational humor; natural language understanding. Includes abstract. Includes bibliographical references.

50

Nyberg, Selma. "Video Recommendation Based on Object Detection." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-351122.

Abstract:

In this thesis, various machine learning domains have been combined in order to build a video recommender system that is based on object detection. The work combines two extensively studied research fields, recommender systems and computer vision, which are also rapidly growing and popular techniques on commercial markets. To investigate the performance of the approach, three different content-based recommender systems have been implemented at Spotify, which are based on the following video features: object detections, titles and descriptions, and user preferences. These systems have then been evaluated and compared against each other together with their hybridized result. Two algorithms have been implemented, the prediction and the top-N algorithm, where the former is the more reliable source for evaluating the system's performance. The evaluation of the system shows that the overall performance scores for predicting values of the users' liked and disliked videos are in the range from about 40 % to 70 % for the prediction algorithm and from about 15 % to 70 % for the top-N algorithm. The approach based on object detection performs worse in comparison to the other approaches. Hence, there seems to be a low correlation between the user preferences and the video contents in terms of object detection data. Therefore, this data is not very suitable for describing the content of videos and using it in the recommender system. However, the results of this study cannot be generalized to other systems before the approach has been evaluated in other environments and for various data sets. Moreover, there is plenty of room for refinements and improvements to the system, and there are many interesting research areas for future work.
