Dissertations / Theses on the topic 'Computational linguistics ; Semantics ; Linguistic analysis (Linguistics)'

Consult the top 34 dissertations / theses for your research on the topic 'Computational linguistics ; Semantics ; Linguistic analysis (Linguistics).'


1

Moilanen, Karo. "Compositional entity-level sentiment analysis." Thesis, University of Oxford, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.559817.

Full text
Abstract:
This thesis presents a computational text analysis tool called AFFECTiS (Affect Interpretation/Inference System) which focuses on the task of interpreting natural language text based on its subjective, non-factual, affective properties that go beyond the 'traditional' factual, objective dimensions of meaning that have so far been the main focus of Natural Language Processing and Computational Linguistics. The thesis presents a fully compositional uniform wide-coverage computational model of sentiment in text that builds on a number of fundamental compositional sentiment phenomena and processes discovered by detailed linguistic analysis of the behaviour of sentiment across key syntactic constructions in English. Driven by the Principle of Semantic Compositionality, the proposed model breaks sentiment interpretation down into strictly binary combinatory steps each of which explains the polarity of a given sentiment expression as a function of the properties of the sentiment carriers contained in it and the grammatical and semantic context(s) involved. An initial implementation of the proposed compositional sentiment model is described which attempts direct logical sentiment reasoning rather than basing computational sentiment judgements on indirect data-driven evidence. Together with deep grammatical analysis and large hand-written sentiment lexica, the model is applied recursively to assign sentiment to all (sub)sentential structural constituents and to concurrently equip all individual entity mentions with gradient sentiment scores. The system was evaluated on an extensive multi-level and multi-task evaluation framework encompassing over 119,000 test cases from which detailed empirical experimental evidence is drawn. The results across entity-, phrase-, sentence-, word-, and document-level data sets demonstrate that AFFECTiS is capable of human-like sentiment reasoning and can interpret sentiment in a way that is not only coherent syntactically but also defensible logically - even in the presence of the many ambiguous extralinguistic, paralogical, and mixed sentiment anomalies that so tellingly characterise the challenges involved in non-factual classification.
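To make the strictly binary compositional steps concrete, here is a minimal toy sketch in Python (it is not the AFFECTiS implementation; the tiny lexicon, polarity values and combination rules are invented for illustration): a negating context flips the polarity of the constituent it combines with, neutral carriers inherit the other constituent's polarity, and otherwise the stronger carrier wins.

# Toy sketch of binary sentiment composition over a constituent tree.
# NOT the AFFECTiS system; lexicon and rules are illustrative only.
LEXICON = {"brilliant": 1, "flawed": -1, "failure": -1, "not": "NEG", "film": 0}

def compose(left, right):
    """Combine two constituent polarities in one binary step."""
    if left == "NEG" or right == "NEG":              # negation reverses the other constituent
        other = right if left == "NEG" else left
        return -other
    if left == 0 or right == 0:                      # neutral carriers inherit the other's polarity
        return left + right
    return left if abs(left) >= abs(right) else right  # otherwise the stronger carrier wins

def sentiment(tree):
    """Recursively assign sentiment to a binary-branching constituent tree."""
    if isinstance(tree, str):
        return LEXICON.get(tree, 0)
    left, right = tree
    return compose(sentiment(left), sentiment(right))

print(sentiment(("not", ("flawed", "failure"))))   # -> 1, negation flips the negative phrase
print(sentiment(("brilliant", "film")))            # -> 1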
2

Sinha, Ravi Som. "Graph-based centrality algorithms for unsupervised word sense disambiguation." [Denton, Tex.]: University of North Texas, 2008. http://digital.library.unt.edu/permalink/meta-dc-9736.

Full text
3

Konrad, Karsten. "Model generation for natural language interpretation and analysis /." Berlin [u.a.] : Springer, 2004. http://www.loc.gov/catdir/enhancements/fy0818/2004042936-d.html.

Full text
4

Davis, Nathan Scott. "An Analysis of Document Retrieval and Clustering Using an Effective Semantic Distance Measure." Diss., 2008. http://contentdm.lib.byu.edu/ETD/image/etd2674.pdf.

Full text
5

Bihi, Ahmed. "Analysis of similarity and differences between articles using semantics." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-34843.

Full text
Abstract:
Adding semantic analysis to the process of comparing news articles enables a deeper level of analysis than traditional keyword matching. In this bachelor's thesis, we have implemented, compared, and evaluated three commonly used approaches to document-level similarity. The three similarity measures selected were keyword matching, TF-IDF vector distance, and Latent Semantic Indexing. Each method was evaluated on a coherent set of news articles, the majority of which were written about Donald Trump and the American election of 9 November 2016; the set also contained several control articles on random topics. TF-IDF vector distance combined with cosine similarity, and Latent Semantic Indexing, gave the best results on the set of articles by separating the control articles from the Trump articles. Keyword matching and TF-IDF distance using Euclidean distance did not separate the Trump articles from the control articles. We also implemented sentiment analysis over the classes positive, negative, and neutral on the same set of news articles and validated the results against human readers classifying the articles, obtaining a high correlation (100%) with the human readers.
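As a rough illustration of the document-similarity comparison described above (not the thesis code; the toy "articles" are invented), the following sketch builds TF-IDF vectors with scikit-learn and contrasts cosine similarity with Euclidean distance:

# Sketch: TF-IDF vectors compared with cosine similarity vs. Euclidean distance.
# Assumes scikit-learn is installed; the mini "articles" are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

articles = [
    "Trump wins the American election in November",
    "Election results: Trump declared winner in the United States",
    "New recipe for a classic apple pie",          # control article on a random topic
]

tfidf = TfidfVectorizer(stop_words="english")
vectors = tfidf.fit_transform(articles)

print(cosine_similarity(vectors))       # high off-diagonal values for the two Trump articles
print(euclidean_distances(vectors))     # raw distances, harder to compare across document lengths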
6

Faruque, Md Ehsanul. "A Minimally Supervised Word Sense Disambiguation Algorithm Using Syntactic Dependencies and Semantic Generalizations." Thesis, University of North Texas, 2005. https://digital.library.unt.edu/ark:/67531/metadc4969/.

Full text
Abstract:
Natural language is inherently ambiguous. For example, the word "bank" can mean a financial institution or a river shore. Finding the correct meaning of a word in a particular context is a task known as word sense disambiguation (WSD), which is essential for many natural language processing applications such as machine translation, information retrieval, and others. While most current WSD methods try to disambiguate a small number of words for which enough annotated examples are available, the method proposed in this thesis attempts to address all words in unrestricted text. The method is based on constraints imposed by syntactic dependencies and concept generalizations drawn from an external dictionary. The method was tested on standard benchmarks as used during the SENSEVAL-2 and SENSEVAL-3 WSD international evaluation exercises, and was found to be competitive.
7

Sinha, Ravi Som. "Graph-based Centrality Algorithms for Unsupervised Word Sense Disambiguation." Thesis, University of North Texas, 2008. https://digital.library.unt.edu/ark:/67531/metadc9736/.

Full text
Abstract:
This thesis introduces a methodology that combines traditional dictionary-based approaches to word sense disambiguation (semantic similarity measures and overlap of word glosses, both based on WordNet) with graph-based centrality methods, namely vertex degree, PageRank, closeness, and betweenness. The approach is completely unsupervised and is based on creating graphs for the words to be disambiguated. As a first stage, we experiment with several possible combinations of the semantic similarity measures. The next stage scores individual vertices in the previously created graphs using several graph connectivity measures. In the final stage, several voting schemes are applied to the results obtained from the different centrality algorithms. The most important contributions of this work are not only that it is a novel approach that works well, but also that it has great potential for overcoming the knowledge-acquisition bottleneck which has apparently brought research in supervised WSD, as an explicit application, to a plateau. The type of research reported in this thesis, which does not require manually annotated data, holds considerable promise, and our work is one of the first steps, albeit a small one, in this direction. The complete system is built and tested on standard benchmarks and is comparable with work done on graph-based word sense disambiguation as well as lexical chains. The evaluation indicates that the right combination of the above-mentioned metrics can be used to develop an unsupervised disambiguation engine as powerful as the state of the art in WSD.
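A minimal sketch of the graph-centrality idea, assuming networkx is available; the sense labels and edge weights below are invented stand-ins for the WordNet-based similarities used in the thesis:

# Sketch: score candidate word senses by graph centrality.
# A real system would derive the edge weights from WordNet similarity or gloss overlap.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("bank#financial", "deposit#money", 0.9),
    ("bank#financial", "interest#money", 0.8),
    ("bank#river", "deposit#money", 0.1),
    ("bank#river", "water#liquid", 0.7),
    ("interest#money", "deposit#money", 0.6),
])

# Several centrality measures, mirroring those combined in the thesis.
scores = {
    "degree": nx.degree_centrality(G),
    "pagerank": nx.pagerank(G, weight="weight"),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
}
for name, vals in scores.items():
    best = max(["bank#financial", "bank#river"], key=vals.get)   # pick the better-scored sense
    print(name, "->", best)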
8

Carter, David Maclean. "A shallow processing approach to anaphor resolution." Thesis, University of Cambridge, 1986. https://www.repository.cam.ac.uk/handle/1810/256804.

Full text
Abstract:
The thesis describes an investigation of the feasibility of resolving anaphors in natural language texts by means of a "shallow processing" approach which exploits knowledge of syntax, semantics and local focussing as heavily as possible; it does not rely on the presence of large amounts of world or domain knowledge, which are notoriously hard to process accurately. The ideas reported are implemented in a program called SPAR (Shallow Processing Anaphor Resolver), which resolves anaphoric and other linguistic ambiguities in simple English stories and generates sentence-by-sentence paraphrases that show what interpretations have been selected. Input to SPAR takes the form of semantic structures for single sentences constructed by Boguraev's English analyser. These structures are integrated into a network-style text representation as processing proceeds. To achieve anaphor resolution, SPAR combines and develops several existing techniques, most notably Sidner's theory of local focussing and Wilks' "preference semantics" theory of semantics and common sense inference. Consideration of the need to resolve several anaphors in the same sentence results in Sidner's framework being modified and extended to allow focus-based processing to interact more flexibly with processing based on other types of knowledge. Wilks' treatment of common sense inference is extended to incorporate a wider range of types of inference without jeopardizing its uniformity and simplicity. Further, his primitive-based formalism for word sense meanings is developed in the interests of economy, accuracy and ease of use. Although SPAR is geared mainly towards resolving anaphors, the design of the system allows many non-anaphoric (lexical and structural) ambiguities that cannot be resolved during sentence analysis to be resolved as a by-product of anaphor resolution.
9

Gränsbo, Gustav. "Word Clustering in an Interactive Text Analysis Tool." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157497.

Full text
Abstract:
A central operation of users of the text analysis tool Gavagai Explorer is to look through a list of words and arrange them in groups. This thesis explores the use of word clustering to automatically arrange the words in groups intended to help users. A new word clustering algorithm is introduced, which attempts to produce word clusters tailored to be small enough for a user to quickly grasp the common theme of the words. The proposed algorithm computes similarities among words using word embeddings, and clusters them using hierarchical graph clustering. Multiple variants of the algorithm are evaluated in an unsupervised manner by analysing the clusters they produce when applied to 110 data sets previously analysed by users of Gavagai Explorer. A supervised evaluation is performed to compare clusters to the groups of words previously created by users of Gavagai Explorer. Results show that it was possible to choose a set of hyperparameters deemed to perform well across most data sets in the unsupervised evaluation. These hyperparameters also performed among the best on the supervised evaluation. It was concluded that the choice of word embedding and graph clustering algorithm had little impact on the behaviour of the algorithm. Rather, limiting the maximum size of clusters and filtering out similarities between words had a much larger impact on behaviour.
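The following sketch illustrates the general recipe under stated simplifications: random vectors stand in for real word embeddings, weak similarities are filtered out, and connected components stand in for the hierarchical graph clustering used in the thesis.

# Sketch: embed words, keep only strong pairwise similarities, read clusters off the graph.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
words = ["price", "cost", "fee", "delivery", "shipping", "support"]
vectors = {w: rng.normal(size=50) for w in words}      # placeholder word embeddings

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

G = nx.Graph()
G.add_nodes_from(words)
for i, w1 in enumerate(words):
    for w2 in words[i + 1:]:
        sim = cosine(vectors[w1], vectors[w2])
        if sim > 0.2:                                  # filter out weak similarities
            G.add_edge(w1, w2, weight=sim)

clusters = list(nx.connected_components(G))            # the thesis further caps cluster sizes
print(clusters)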
10

Prost, Jean-Philippe. "Modelling Syntactic Gradience with Loose Constraint-based Parsing." PhD thesis, Université de Provence - Aix-Marseille I, 2008. http://tel.archives-ouvertes.fr/tel-00352828.

Full text
Abstract:
The grammaticality of a sentence is usually conceived as a binary notion: a sentence is either grammatical or ungrammatical. However, a growing body of work is concerned with intermediate degrees of acceptability, sometimes referred to as gradience. To date, most of this work has concentrated on the study of human judgements of syntactic gradience. This study explores the possibility of building a robust model that agrees with these human judgements.
We suggest extending to ill-formed language the concepts of Intersective Gradience and Subsective Gradience proposed by Aarts for modelling graded judgements. Under this new model, the problem raised by gradience concerns the classification of an utterance into a particular category, according to criteria based on the syntactic characteristics of the utterance. We extend the notion of Intersective Gradience (IG) so that it concerns the choice of the best solution among a set of candidates, and that of Subsective Gradience (SG) so that it concerns the computation of the degree of typicality of that structure within its category. IG is then modelled by means of an optimality criterion, while SG is modelled by the computation of a degree of grammatical acceptability. As for the syntactic characteristics required to classify an utterance, our study of different representational frameworks for natural language syntax shows that they can easily be represented within a Model-Theoretic Syntax framework. We opt for Property Grammars (PG), which offer precisely the possibility of modelling the characterisation of an utterance. We present a fully automated solution for modelling syntactic gradience, which proceeds by characterising a well-formed or ill-formed sentence, generating an optimal parse tree, and computing a degree of grammatical acceptability for the utterance.
Through the development of this new model, the contribution of this work is threefold.
First, we specify a logical system for PG that allows its formalisation to be revised from a model-theoretic perspective. In particular, it formalises the constraint satisfaction and constraint relaxation mechanisms at work in PG, as well as the way they license the projection of a category during the parsing process. This new system introduces the notion of loose satisfaction, together with a first-order logic formulation for reasoning about an utterance.
Second, we present our implementation of loose constraint-based parsing (Loose Satisfaction Chart Parsing, or LSCP), which we prove always generates a complete and optimal parse. This approach is based on dynamic programming together with the mechanisms described above. Although of high complexity, this algorithmic solution performs well enough to let us experiment with our model of gradience.
And third, after postulating that the prediction of human acceptability judgements can be based on factors derived from LSCP, we present a numerical model for estimating the degree of grammatical acceptability of an utterance. We measure a good correlation between these scores and human judgements of grammatical acceptability. Moreover, our model turns out to outperform a pre-existing model that we use as a baseline, which, for its part, was tested with manually generated parses.
11

Penton, Dave. "Linguistic data models: presentation and representation." Thesis, 2006. http://eprints.unimelb.edu.au/archive/00002875.

Full text
12

Bernard, Timothée. "Approches formelles de l'analyse du discours : relations discursives et verbes d'attitude propositionnelle." Thesis, Sorbonne Paris Cité, 2019. http://www.theses.fr/2019USPCC034.

Full text
Abstract:
This thesis focuses on the formalisms that make it possible to mathematically represent not only the meaning of independent sentences, but also whole texts, including the meaning relations that link sentences together. These links, the discourse relations, include temporal, causal and contrastive relations. Not only are we interested in meaning and its representation, but also in the algorithmic process of how this representation is computed using the sequence of words that constitute the text. We thus find ourselves at a point where three disciplines intersect: discourse analysis, formal semantics and computational linguistics. Most formal work on discourse pays little attention to reporting verbs (say, tell, etc.) and attitude verbs (think, believe, etc.). These verbs, henceforth 'AVs', all express the attitude or stance of one person on a given proposition. They are used frequently and introduce many subtleties that are not addressed in current theories. The main objective of this thesis is to shed light on the principles of a formal grammar that is compatible with discourse analysis and takes AVs into account. We therefore start by presenting a set of linguistic data illustrating the interactions between AVs and discourse relations. Adverbial connectives (then, for example, etc.) are usually considered anaphoric. One might wonder, however, whether, in practice, a computational linguistic system cannot deal with this particular category of anaphora as a kind of structural dependency, meaning that syntax is somehow extended above the sentence level. This is what we try to achieve using the D-STAG formalism. While it has properties that are relevant for automatic discourse analysis, such an approach places a considerable burden on syntax. We therefore discuss the difficulties that this approach poses. Consequently, we develop an anaphor-based approach, in which the arguments of discourse relations are not determined solely by the grammatical structures of the utterances. We use the same conceptual tools to account for the anaphoricity of adverbial connectives, the shape of non-tree discourse structures (observed for all types of connectives), but also the evidential use of AVs. If, however, we appeal to the notion of anaphora, our aim is to have it explicitly integrated into the grammatical formalism. In particular, we set out to specify when anaphora resolution is performed and on which input. This is made possible by continuation semantics, which we use in conjunction with event semantics. Events have often been appealed to in order to describe the semantics of causal and temporal relations. Nevertheless, events raise a number of questions related to the possibility of some inference patterns that are observed, in addition to the presence of negation in the arguments of discourse relations. We suggest a number of potential answers and study the case of negation in more detail. We therefore review the issues facing event semantics when dealing with negation. Such issues concern both the syntax-semantics interface and the purely semantic level. We argue that these difficulties originate from the standard analysis of negation, which interprets positive and negative sentences in an essentially different fashion. Rejecting this view, we propose a novel formalisation of negative events that is relevant to the analysis of various linguistic phenomena.
13

Rahgozar, Arya. "Automatic Poetry Classification and Chronological Semantic Analysis." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40516.

Full text
Abstract:
The correction, authentication, validation and identification of the original texts in Hafez’s poetry among 16 or so old versions of his Divan has been a challenge for scholars. The semantic analysis of poetry with modern Digital Humanities techniques is also challenging. Analyzing latent semantics is more challenging in poetry than in prose for evident reasons, such as conciseness, imagery and metaphorical constructions. Hafez’s poetry is, on the one hand, cryptic and complex because of his era’s restricting social properties and censorship impediments, and on the other hand, sophisticated because of his encapsulation of high-calibre world-views, mystical and philosophical attributes, artistically knitted within majestic decorations. Our research is strongly influenced by, and is a continuation of, Mahmoud Houman’s instrumental and essential chronological classification of ghazals by Hafez. Houman’s chronological classification method (Houman, 1938), which we have adopted here, provides guidance to choose the correct version of Hafez’s poem among multiple manuscripts. Houman’s semantic analysis of Hafez’s poetry is unique in that the central concept of his classification is based on intelligent scrutiny of meanings and careful observation of the evolutionary psychology of Hafez through his remarkable body of work. Houman’s analysis has provided the annotated data for the classification algorithms we will develop to classify the poems. We pursue an understanding of Hafez through Houman’s perspective. In addition, we asked a contemporary expert to annotate Hafez ghazals (Raad, 2019). The rationale behind our research is also to satisfy the need for more efficient means of scholarly research, and to bring literature and computer science together as much as possible. Our research will support semantic analysis, and help with the design and development of tools for poetry research. We have developed a digital corpus of Hafez’s ghazals and applied proper word forms and punctuation. We digitized and extended chronological criteria to guide the correction and validation of Hafez’s poetry. To our knowledge, no automatic chronological classification has been conducted for Hafez’s poetry. Other than the meticulous preparation of our bilingual Hafez corpus for computational use, the innovative aspect of our classification research is two-fold. The first objective of our work is to develop semantic features to better train automatic classifiers for annotated poems and to apply the classifiers to unannotated poems, that is, to classify the rest of the poems by applying machine learning (ML) methodology. The second task is to extract semantic information and properties to help design a visualization scheme that provides a link between the predictions’ rationale and Houman’s perception of the chronological properties of Hafez’s poetry. We identified and used effective Natural Language Processing (NLP) techniques such as classification, word-embedding features, and visualization to facilitate and automate semantic analysis of Hafez’s poetry. We defined and applied rigorous and repeatable procedures that can potentially be applied to other kinds of poetry. We showed that the chronological segments identified automatically were coherent.
We presented and compared two independent chronological labellings of Hafez’s ghazals in digital form, produced their ontologies and explained the inter-annotator agreement and distributional semantic properties using relevant NLP techniques to help guide future corrections, authentication, and interpretation of Hafez’s poetry. Chronological labelling of the whole corpus not only helps better understand Hafez’s poetry, but it is a rigorous guide to better recognition of the correct versions of Hafez’s poems among multiple manuscripts. Such a small volume of complex poetic text required careful selection when choosing and developing appropriate ML techniques for the task. Through many classification and clustering experiments, we have achieved state-of-the-art prediction of chronological poems, trained and evaluated against our hand-made Hafez corpus. Our selected classification algorithm was a Support Vector Machine (SVM), trained with Latent Dirichlet Allocation (LDA)-based similarity features. We used clustering to produce an alternative perspective to classification. For our visualization methodology, we used the LDA features but also passed the results to a Principal Component Analysis (PCA) module to reduce the number of dimensions to two, thereby enabling graphical presentations. We believe that applying this method to poetry classifications, and showing the topic relations between poems in the same classes, will help us better understand the interrelated topics within the poems. Many of our methods can potentially be used in similar cases in which the intention is to semantically classify poetry.
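A simplified sketch of the classification and visualisation pipeline, assuming scikit-learn; the toy "poems", the labels and the plain LDA topic proportions below are placeholders for the thesis' corpus and its LDA-based similarity features:

# Sketch (not the thesis code): LDA topic proportions as features for an SVM that
# predicts a chronological class, plus PCA to two dimensions for plotting.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA
from sklearn.svm import SVC

poems = [
    "wine and the beloved in the tavern of ruin",
    "the ascetic and the preacher quarrel over dogma",
    "love burns the harvest of reason",
    "the old magian hands me the cup of truth",
]
labels = ["youth", "youth", "mid", "elder"]          # stand-ins for the chronological classes

counts = CountVectorizer().fit_transform(poems)
topics = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(counts)

clf = SVC(kernel="linear").fit(topics, labels)       # SVM over topic-based features
print(clf.predict(topics))

coords = PCA(n_components=2).fit_transform(topics)   # 2-D projection for visualisation
print(coords)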
14

Jennings, Matthew. "Nevertheless, She Persisted: A Linguistic Analysis of the Speech of Elizabeth Warren, 2007-2017." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/honors/457.

Full text
Abstract:
A breakout star among American progressives in the recent past, Elizabeth Warren has quickly gone from a law professor to a leading figure in Democratic politics. This paper analyzes Warren’s speech from before her time as a political figure to the present using the quantitative textual methodology established by Jones (2016) in order to see if Warren’s speech supports Jones’s assertion that masculine speech is the language of power. Ratios of feminine to masculine markers ultimately indicate that despite her increasing political sway, Warren’s speech becomes increasingly feminine instead. However, despite associations of feminine speech with weakness, Warren’s speech scores highly for expertise and confidence as its feminine scores increase. These findings relate to the relevant political context and have implications for presumptions of masculine speech as the standard for political power.
15

Gray, Tyler. "Measuring Linguistic and Cultural Evolution Using Books and Tweets." ScholarWorks @ UVM, 2019. https://scholarworks.uvm.edu/graddis/1130.

Full text
Abstract:
Written language provides a snapshot of linguistic, cultural, and current events information for a given time period. Aggregating these snapshots by studying many texts over time reveals trends in the evolution of language, culture, and society. The ever-increasing amount of electronic text, both from the digitization of books and other paper documents to the increasing frequency with which electronic text is used as a means of communication, has given us an unprecedented opportunity to study these trends. In this dissertation, we use hundreds of thousands of books spanning two centuries scanned by Google, and over 100 billion messages, or ‘tweets’, posted to the social media platform, Twitter, over the course of a decade to study the English language, as well as study the evolution of culture and society as inferred from the changes in language. We begin by studying the current state of verb regularization and how this compares between the more formal writing of books and the more colloquial writing of tweets on Twitter. We find that the extent of verb regularization is greater on Twitter, taken as a whole, than in English Fiction books, and also for tweets geotagged in the United States relative to American English books, but the opposite is true for tweets geotagged in the United Kingdom relative to British English books. We also find interesting regional variations in regularization across counties in the United States. However, once differences in population are accounted for, we do not identify strong correlations with socio-demographic variables. Next, we study stretchable words, a fundamental aspect of spoken language that, until the advent of social media, was rarely observed within written language. We examine the frequency distributions of stretchable words and introduce two central parameters that capture their main characteristics of balance and stretch. We explore their dynamics by creating visual tools we call ‘balance plots’ and ‘spelling trees’. We also discuss how the tools and methods we develop could be used to study mistypings and misspellings, and may have further applications both within and beyond language. Finally, we take a closer look at the English Fiction n-gram dataset created by Google. We begin by explaining why using token counts as a proxy of word, or more generally, ‘n-gram’, importance is fundamentally flawed. We then devise a method to rebuild the Google Books corpus so that meaningful linguistic and cultural trends may be reliably discerned. We use book counts as the primary ranking for an n-gram and use subsampling to normalize across time to mitigate the extraneous results created by the underlying exponential increase in data volume over time. We also combine the subsampled data over a number of years as a method of smoothing. We then use these improved methods to study linguistic and cultural evolution across the last two centuries. We examine the dynamics of Zipf distributions for n-grams by measuring the churn of language reflected in the flux of n-grams across rank boundaries. Finally, we examine linguistic change using wordshift plots and a rank divergence measure with a tunable parameter to compare the language of two different time periods. Our results address several methodological shortcomings associated with the raw Google Books data, strengthening the potential for cultural inference by word changes.
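The verb-regularization measurement can be illustrated with a small sketch; the token counts below are invented, not taken from the thesis data:

# Sketch: for a verb with a regular and an irregular past form, the regularization
# fraction is the share of regular-form tokens. Counts are illustrative only.
token_counts = {
    "burn":  {"regular": ("burned", 120), "irregular": ("burnt", 80)},
    "dream": {"regular": ("dreamed", 310), "irregular": ("dreamt", 90)},
    "spill": {"regular": ("spilled", 200), "irregular": ("spilt", 25)},
}

for verb, forms in token_counts.items():
    reg = forms["regular"][1]
    irr = forms["irregular"][1]
    fraction = reg / (reg + irr)           # 1.0 = fully regularized, 0.0 = fully irregular
    print(f"{verb}: {fraction:.2f} regular")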
16

Ji, Donghong. "Conceptual relevance : representation and analysis." Thesis, University of Oxford, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.711639.

Full text
17

Rodriguez, D. L. "תחת : a cognitive linguistic analysis of the Biblical Hebrew lexeme." Thesis, Stellenbosch : University of Stellenbosch, 2011. http://hdl.handle.net/10019.1/6641.

Full text
Abstract:
Thesis (MA (Ancient Studies))--University of Stellenbosch, 2011.
ENGLISH ABSTRACT: This thesis addresses the problem of polysemy in describing the biblical Hebrew lexeme תחת in the Biblia Hebraica Stuttgartensia. Traditionally treated as mainly a preposition, it is demonstrated in this study that תחת can also be used as a noun, adverb or conjunction. A critical analysis of standard biblical Hebrew lexica reveals that they typically lack a clear lexicographic framework with which polysemous lexemes can be organized. Ideally, this would offer lexical explanations to users of a lexicon rather than supply lists of alleged meanings. Further, it is also made clear that target language glosses can no longer be accepted as "meaning", a practice which has been uncritically accepted for years. In order to move beyond English glosses, cognitive linguistic tools for categorization and lexical semantics are utilized. This thesis contributes a cognitive linguistic analysis of the polysemous lexeme תחת and a semantic network of תחת that can be useful for digital lexicography. The proposed network is complemented by frame semantic diagrams which describe meaning imagically rather than only with a target language gloss. The various senses established are: substantive (underpart), place (spot), substitution (in place of), exchange (in exchange for), vertical spatial (under), approximately under (at the foot of), control (under the hand), causation (because), and implied perspective (x below [the speaker]). These senses are organized in the proposed network showing the semantic relationship between the senses. The semantic network also provides an evolutionarily plausible explanation of how תחת came to symbolize so many distinct polysemies.
18

Lilliehöök, Hampus. "Extraction of word senses from bilingual resources using graph-based semantic mirroring." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-91880.

Full text
Abstract:
In this thesis we retrieve semantic information that exists implicitly in bilingual data. We gather input data by repeatedly applying the semantic mirroring procedure. The data is then represented by vectors in a large vector space. A resource of synonym clusters is then constructed by performing K-means centroid-based clustering on the vectors. We evaluate the result manually, using dictionaries, and against WordNet, and discuss prospects and applications of this method.
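A minimal sketch of the clustering step, assuming scikit-learn; the random vectors stand in for the vectors obtained through semantic mirroring:

# Sketch: K-means clustering of word vectors into synonym-like groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
words = ["big", "large", "huge", "small", "tiny", "little"]
vectors = rng.normal(size=(len(words), 20))          # placeholder vector space

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
for label in sorted(set(kmeans.labels_)):
    cluster = [w for w, l in zip(words, kmeans.labels_) if l == label]
    print(label, cluster)                            # each list is a candidate synonym cluster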
19

Pérez-Rosas, Verónica. "Exploration of Visual, Acoustic, and Physiological Modalities to Complement Linguistic Representations for Sentiment Analysis." Thesis, University of North Texas, 2014. https://digital.library.unt.edu/ark:/67531/metadc699996/.

Full text
Abstract:
This research is concerned with the identification of sentiment in multimodal content. This is of particular interest given the increasing presence of subjective multimodal content on the web and other sources, which contains a rich and vast source of people's opinions, feelings, and experiences. Despite the need for tools that can identify opinions in the presence of diverse modalities, most current methods for sentiment analysis are designed for textual data only, and few attempts have been made to address this problem. The dissertation investigates techniques for augmenting linguistic representations with acoustic, visual, and physiological features. The potential benefits of using these modalities include linguistic disambiguation, visual grounding, and the integration of information about people's internal states. The main goal of this work is to build computational resources and tools that allow sentiment analysis to be applied to multimodal data. This thesis makes three important contributions. First, it shows that modalities such as audio, video, and physiological data can be successfully used to improve existing linguistic representations for sentiment analysis. We present a method that integrates linguistic features with features extracted from these modalities. Features are derived from verbal statements, audiovisual recordings, thermal recordings, and physiological sensor signals. The resulting multimodal sentiment analysis system is shown to significantly outperform the use of language alone. Using this system, we were able to predict the sentiment expressed in video reviews and also the sentiment experienced by viewers while exposed to emotionally loaded content. Second, the thesis provides evidence of the portability of the developed strategies to other affect recognition problems. We provided support for this by studying the deception detection problem. Third, this thesis contributes several multimodal datasets that will enable further research in sentiment and deception detection.
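A sketch of feature-level ("early") fusion in the spirit of the approach above; all feature values, dimensionalities and labels are invented placeholders, and LinearSVC merely stands in for whichever classifier was actually used:

# Sketch: concatenate per-modality feature vectors and train one sentiment classifier.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n_samples = 40
linguistic    = rng.normal(size=(n_samples, 300))   # e.g. bag-of-words / embedding features
acoustic      = rng.normal(size=(n_samples, 20))    # e.g. pitch and intensity statistics
visual        = rng.normal(size=(n_samples, 15))    # e.g. facial feature descriptors
physiological = rng.normal(size=(n_samples, 5))     # e.g. thermal / heart-rate statistics
labels        = rng.integers(0, 2, size=n_samples)  # positive vs. negative sentiment

fused = np.hstack([linguistic, acoustic, visual, physiological])   # concatenate modalities
clf = LinearSVC().fit(fused, labels)
print(clf.score(fused, labels))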
20

Mann, Jasleen Kaur. "Semantic Topic Modeling and Trend Analysis." Thesis, Linköpings universitet, Statistik och maskininlärning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173924.

Full text
Abstract:
This thesis focuses on finding an end-to-end unsupervised solution to solve a two-step problem of extracting semantically meaningful topics and trend analysis of these topics from a large temporal text corpus. To achieve this, the focus is on using the latest developments in Natural Language Processing (NLP) related to pre-trained language models like Google’s Bidirectional Encoder Representations from Transformers (BERT) and other BERT-based models. These transformer-based pre-trained language models provide word and sentence embeddings based on the context of the words. The results are then compared with traditional machine learning techniques for topic modeling. This is done to evaluate if the quality of topic models has improved and how dependent the techniques are on manually defined model hyperparameters and data preprocessing. These topic models provide a good mechanism for summarizing and organizing a large text corpus and give an overview of how the topics evolve with time. In the context of research publications or scientific journals, such analysis of the corpus can give an overview of research/scientific interest areas and how these interests have evolved over the years. The dataset used for this thesis is research articles and papers from a journal, namely the ’Journal of Cleaner Production’. This journal had more than 24,000 research articles at the time of working on this project. We started with implementing Latent Dirichlet Allocation (LDA) topic modeling. In the next step, we implemented LDA along with document clustering to get topics within these clusters. This gave us an idea of the dataset and also gave us a benchmark. After having some base results, we explored transformer-based contextual word and sentence embeddings to evaluate if this leads to more meaningful, contextual, and semantic topics. For document clustering, we have used K-means clustering. In this thesis, we also discuss methods to optimally visualize the topics and the trend changes of these topics over the years. Finally, we conclude with a method for leveraging contextual embeddings using BERT and Sentence-BERT to solve this problem and achieve semantically meaningful topics. We also discuss the results from traditional machine learning techniques and their limitations.
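A compressed sketch of the contextual-embedding route, assuming the sentence-transformers package and the named pretrained model are available; the example abstracts are invented:

# Sketch: contextual sentence embeddings clustered into topic-like groups.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

abstracts = [
    "Life cycle assessment of recycled concrete in construction",
    "Carbon footprint reduction in cement production",
    "Water reuse strategies for industrial cleaner production",
    "Wastewater treatment optimisation in textile plants",
]

model = SentenceTransformer("all-MiniLM-L6-v2")      # any sentence-level BERT model would do
embeddings = model.encode(abstracts)

topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for topic, text in zip(topics, abstracts):
    print(topic, text)                               # documents sharing a label form one topic group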
21

Wang, Yong. "Incorporating semantic and syntactic information into document representation for document clustering." Diss., Mississippi State : Mississippi State University, 2005. http://library.msstate.edu/etd/show.asp?etd=etd-07072005-105806.

Full text
22

Conrath, Juliette. "Unsupervised extraction of semantic relations using discourse information." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30202/document.

Full text
Abstract:
Natural language understanding often relies on common-sense reasoning, for which knowledge about semantic relations, especially between verbal predicates, may be required. This thesis addresses the challenge of using a distributional method to automatically extract the necessary semantic information for common-sense inference. Typical associations between pairs of predicates and a targeted set of semantic relations (causal, temporal, similarity, opposition, part/whole) are extracted from large corpora by exploiting the presence of discourse connectives which typically signal these semantic relations. In order to appraise these associations, we provide several significance measures inspired by the literature as well as a novel measure specifically designed to evaluate the strength of the link between the two predicates and the relation. The relevance of these measures is evaluated by computing their correlations with human judgments, based on a sample of verb pairs annotated in context. The application of this methodology to French and English corpora leads to the construction of a freely available resource, Lecsie (Linked Events Collection for Semantic Information Extraction), which consists of triples: pairs of event predicates associated with a relation; each triple is assigned significance scores based on our measures. From this resource, vector-based representations of pairs of predicates can be induced and used as lexical semantic features to build models for external applications. We assess the potential of these representations for several applications. Regarding discourse analysis, the tasks of predicting attachment of discourse units, as well as predicting the specific discourse relation linking them, are investigated. Using only features from our resource, we obtain significant improvements for both tasks in comparison to several baselines, including ones using other representations of the pairs of predicates. We also propose to define optimal sets of connectives better suited for large corpus applications by performing a dimension reduction in the space of the connectives, instead of using manually composed groups of connectives corresponding to predefined relations. Another promising application pursued in this thesis concerns relations between semantic frames (e.g. FrameNet): the resource can be used to enrich this sparse structure by providing candidate relations between verbal frames, based on associations between their verbs. These diverse applications demonstrate the promising contributions of our approach, namely allowing the unsupervised extraction of typed semantic relations.
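A toy sketch of the extraction-and-scoring idea, with pointwise mutual information standing in for the thesis' own significance measures; the mini-corpus and the connective-to-relation mapping are illustrative only:

# Sketch: mine (predicate, predicate, relation) triples via discourse connectives
# and score them with PMI as a stand-in significance measure.
import math
import re
from collections import Counter

CONNECTIVES = {"because": "cause", "then": "temporal", "but": "opposition"}

corpus = [
    "he fell because he slipped",
    "she slipped because she ran",
    "he ran then he rested",
    "she smiled but she cried",
]

triples = Counter()
pair_totals = Counter()
for sentence in corpus:
    for connective, relation in CONNECTIVES.items():
        match = re.search(rf"(\w+) {connective} \w+ (\w+)", sentence)
        if match:
            v1, v2 = match.groups()
            triples[(v1, v2, relation)] += 1
            pair_totals[(v1, v2)] += 1

total = sum(triples.values())
relation_totals = Counter({r: sum(c for (_, _, rel), c in triples.items() if rel == r)
                           for r in CONNECTIVES.values()})
for (v1, v2, rel), count in triples.items():
    pmi = math.log((count / total) / ((pair_totals[(v1, v2)] / total) * (relation_totals[rel] / total)))
    print(v1, v2, rel, round(pmi, 2))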
23

Pettersson, Eva. "Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction." Doctoral thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-269753.

Full text
Abstract:
Historical text constitutes a rich source of information for historians and other researchers in humanities. Many texts are however not available in an electronic format, and even if they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component in the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the succeeding linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels given by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs by simply substituting the language resources used in each module. Furthermore, the reuse of existing NLP tools developed for the modern language is crucial, considering the lack of linguistically annotated historical data combined with the high variability in historical text, making it hard to train NLP tools specifically aimed at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for easy access to historical information content, even in cases where there is little (or no) annotated historical training data available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 out of the 100 top-ranked instances are true positives in the best setting.
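A minimal sketch of the spelling-normalisation step, assuming a simple substitution lexicon; the mapping and the example sentence are invented and are not the thesis' normalisation rules:

# Sketch: map historical spellings to modern ones so that off-the-shelf taggers and
# parsers trained on modern text can be applied downstream.
NORMALISATION = {
    "effter": "efter",
    "hustrw": "hustru",
    "oc": "och",
    "giffua": "giva",
}

def normalise(tokens):
    """Replace each historical token by its modern spelling when one is known."""
    return [NORMALISATION.get(tok.lower(), tok) for tok in tokens]

historical = "oc skall han giffua sin hustrw effter lagen".split()
modern = normalise(historical)
print(" ".join(modern))   # the normalised sentence is what a modern tagger would receive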
24

Lebedeva, Ekaterina. "Expression de la dynamique du discours à l'aide de continuations." PhD thesis, Université de Lorraine, 2012. http://tel.archives-ouvertes.fr/tel-00783245.

Full text
Abstract:
This thesis develops a theoretical formalism of formal semantics of natural language in the spirit of Montague semantics. The developed framework satisfies the principle of compositionality in a simple and elegant way, by being as parsimonious as possible: completely new formalisms or extensions of existing formalisms with even more complex constructions to fit particular linguistic phenomena have been avoided; instead, the framework handles these linguistic phenomena using only basic and well-established formalisms, such as simply-typed lambda calculus and classical logic. Dynamics is achieved by employing a continuation-passing technique and an exception raising and handling mechanism. The context is explicitly represented by a term, and, therefore, can be easily accessed and manipulated. The framework successfully handles cross-sentential anaphora and presuppositions triggered by referring expressions and has potential to be extended for dealing with more complex dynamic phenomena, such as presuppositions triggered by factive verbs and conversational implicatures.
25

Al, Batineh Mohammed S. "Latent Semantic Analysis, Corpus stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1429300641.

Full text
26

Osipova, Anna. "The Concept of ’Selling/Buying’ in the Russian Linguistic Picture of the World : from standard to sub-standard." Doctoral thesis, Högskolan Dalarna, Ryska, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:du-6082.

Full text
Abstract:
The thesis belongs to the field of lexical semantic studies associated with describing the Russian linguistic world-image. The research focuses on the universal situation of purchase and sale as reflected in the Russian lexical standard and sub-standard. The work also deals with subjects related to sociolinguistics: the social stratification of the language, the structure of the sub-standard, etc. The thesis is a contribution to the description of the Russian linguistic world-image as well as to the further elaboration of the conceptual analysis method. The results are applicable in teaching Russian as a foreign language, particularly in the study of lexis and of Russian culture and mentality.
27

Gorrell, Genevieve. "Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing." Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2006. http://www.bibl.liu.se/liupubl/disp/disp2006/tek1045s.pdf.

Full text
28

Öhrström, Fredrik. "Cluster Analysis with Meaning : Detecting Texts that Convey the Same Message." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-153873.

Full text
Abstract:
Textual duplicates can be hard to detect because they differ in wording but have similar semantic meaning. At Etteplan, a technical documentation company, many writers accidentally re-write existing instructions explaining procedures. These "duplicates" clutter the database and represent duplicated work, and the condition of the database will only deteriorate as the company expands. This thesis attempts to map where the problem is worst and to estimate how many duplicates there are. The corpus is small but written in a controlled natural language called Simplified Technical English. The method uses document embeddings from doc2vec, clustering with HDBSCAN*, and validation using the Density-Based Clustering Validation (DBCV) index to chart the problem. A survey was sent out to determine a threshold value at which documents stop being duplicates, and using this value a theoretical duplicate count was calculated.
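A rough sketch of the pipeline, assuming the gensim and hdbscan packages are available; the example instruction texts are invented, and results on such tiny data are only illustrative:

# Sketch: embed instruction texts with doc2vec, cluster with HDBSCAN*, and check
# cluster quality with the DBCV index, roughly following the pipeline above.
import numpy as np
import hdbscan
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

texts = [
    "remove the cover and disconnect the power cable",
    "disconnect the power cable and remove the cover",
    "tighten the bolts to the specified torque",
    "apply the specified torque to all bolts",
    "log in to the maintenance terminal",
]
tagged = [TaggedDocument(t.split(), [i]) for i, t in enumerate(texts)]
model = Doc2Vec(tagged, vector_size=25, min_count=1, epochs=50, seed=0)
vectors = np.array([model.infer_vector(t.split()) for t in texts])

labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(vectors)
print(labels)                          # documents sharing a label are duplicate candidates
if len(set(labels) - {-1}) >= 1:       # DBCV needs at least one real cluster
    print(hdbscan.validity.validity_index(vectors.astype(np.float64), labels))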
APA, Harvard, Vancouver, ISO, and other styles
29

Bertucci, Roberlei Alves. "Uma análise semântica para verbos aspectuais em português brasileiro." Universidade de São Paulo, 2011. http://repositorio.utfpr.edu.br/jspui/handle/1/552.

Full text
Abstract:
Capes, Capes-Cofecub
Esta tese investiga a contribuição semântica dos verbos aspectuais acabar, começar, continuar, deixar, parar, passar, voltar e terminar em português brasileiro. O objetivo geral é contribuir para a análise sobre aspecto e sobre verbos aspectuais em PB e nas línguas naturais em geral, utilizando a Semântica Formal como modelo de análise. Este trabalho defende que os verbos aspectuais são modificadores de eventualidades e por isso contribuem para a expressão do aspecto lexical (Aktionsart), seguindo trabalhos anteriores, como os de Oliveira et al. (2001) e de Laca (2002; 2004; 2005). Ele assume que os verbos aqui estudados se comportam de forma diferente dos verbos de aspecto gramatical como estar e ir, tendo inclusive uma posição diferente na estrutura sintática. Como os verbos aspectuais se relacionam com o aspecto lexical, este trabalho também investiga as propriedades do aspecto lexical presentes no predicado selecionado por um verbo aspectual. Para a seleção dos verbos aspectuais, as propriedades relevantes se encontram no nível do sintagma verbal (VP) e são dadas composicionalmente. Por isso, constituintes como o objeto direto podem alterá-las, permitindo (ou restringindo) a seleção do VP em questão por parte do verbo aspectual. A tese aqui defendida é a de que cada verbo aspectual seleciona seus complementos a partir de propriedades específicas presentes no VP e que estão ligadas à semântica do verbo aspectual em questão. Dessa forma, esta pesquisa defende que as restrições de seleção de cada verbo aspectual podem ser acessadas na entrada lexical do verbo aspectual, sendo possível explicar a seleção de complementos feita por esses verbos. Além disso, defendemos a tese de que a entrada lexical pode apresentar a diferença entre verbos como começar e passar, nas perífrases começar a+infinitivo e passar a +infinitivo, por exemplo.
This thesis investigates the semantic contribution of the following aspectual verbs in Brazilian Portuguese (BrP): começar ‘begin’, continuar ‘continue’, deixar ‘quit’, parar ‘stop’, passar ‘pass’, voltar ‘resume’, and acabar/terminar ‘finish’. The main goal is to contribute to the discussion of aspect and aspectual verbs in BrP and in natural languages in general, within a Formal Semantics approach. This work treats aspectual verbs as eventuality modifiers and, consequently, argues that they contribute to the expression of lexical aspect (Aktionsart), or situation aspect (Smith 1997), in accordance with previous works such as Oliveira et al. (2001) and Laca (2002; 2004; 2005). The thesis also proposes that the verbs under discussion behave differently from verbs that express grammatical aspect, or viewpoint aspect (Smith 1997), such as estar ‘be’ in the progressive and ir ‘go’ in the prospective, and assumes that they occupy different positions in the syntactic structure. Since aspectual verbs are related to lexical aspect, the research also analyzes the lexical-aspect features of the predicates selected by these verbs. The features relevant for this selection are located at the level of the verb phrase (VP) and are computed compositionally; constituents such as the direct object can therefore change them, allowing or blocking the selection of the VP by a given aspectual verb. The central hypothesis is that each aspectual verb selects its complements on the basis of specific features found in the VP, and that these features are linked to the semantics of that aspectual verb. The selection restrictions of each aspectual verb can thus be read off its lexical entry, which explains the complement selection made by these verbs. Furthermore, we defend the idea that differences between semantically similar verbs such as começar and passar, in their periphrastic constructions, can be traced to differences in their lexical entries.
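As a loose illustration of the kind of lexical-entry-driven selection the abstract describes (this is not the author's analysis; the features, verbs, and acceptability judgments below are invented for exposition), selection restrictions can be pictured as predicates over compositionally computed VP features:

# Invented toy features and judgments, for exposition only.
def vp_features(verb, obj=None):
    # Compositional computation at the VP level: a bare activity verb is
    # atelic, but a quantized direct object can make the VP telic.
    features = {"ler": {"durative": True, "telic": False}}[verb].copy()
    if obj == "o livro":            # 'read the book' -> telic accomplishment
        features["telic"] = True
    return features

# Hypothetical lexical entries: each aspectual verb states the VP features
# it requires of its complement.
LEXICON = {
    "terminar": lambda f: f["telic"],      # 'finish' wants a telic VP
    "parar":    lambda f: f["durative"],   # 'stop' wants a durative VP
}

def selects(aspectual_verb, verb, obj=None):
    return LEXICON[aspectual_verb](vp_features(verb, obj))

print(selects("terminar", "ler", "o livro"))   # True
print(selects("terminar", "ler"))              # False: bare atelic VP
print(selects("parar", "ler"))                 # True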
APA, Harvard, Vancouver, ISO, and other styles
30

Bochner, Gregory. "Naming and contingency : towards an internalist theory of direct reference." Doctoral thesis, Université libre de Bruxelles, 2011. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209797.

Full text
Abstract:
This work is an essay on the reference of names in language and thought. According to the Theory of Direct Reference, nowadays dominant in philosophy of language, the semantic content of a proper name is directly its referent (Chapter 1).

Nevertheless, despite its current fame, this theory must face two major difficulties, familiar since Frege and Russell: the Co-Reference and the No-Reference Problems. The traditional response to these problems consisted precisely in abandoning Referentialism in favour of a version of Descriptivism according to which the semantic content of a proper name would be, not its referent, but a descriptive condition (Chapter 2).

However, it is also this traditional version of Descriptivism that the arguments offered by the pioneers of modern Referentialism—including Kripke, Putnam, and Kaplan—have largely discredited (Chapter 3).

The theoretical tools developed within the framework of possible worlds semantics make it possible to restate the problems generated by Referentialism in terms of the opacity of linguistic intensions and of Modal Illusions (Chapter 4).

At this stage, our semantic theory of names seems to have reached a dead end: on the one hand, modern Referentialism recreates the problems which classical Descriptivism was meant to solve, but, on the other hand, this kind of Descriptivism appears to be refuted by the argumentation of new Referentialists. A common reaction, then, has been to devise more complex semantic theories purporting to combine Referentialism with crucial features from Descriptivism. However, a careful examination reveals that the various versions of this strategy fail (Chapter 5).

Another type of reaction, also ecumenical, has been to draw a distinction between two kinds of contents which would be associated with names and the sentences in which these occur: while the first kind of content would be descriptive, the second would be referential. The Two-Dimensionalist framework has received several interpretations (pragmatic, semantic, metasemantic); but a new construal, metasyntactic, is defended in this work (Chapter 6).

The metasyntactic interpretation of Two-Dimensionalism allows for a radical gap between language and thought: while the thoughts of their users can remain descriptive, names are supposed to achieve direct reference by themselves, and independently of the mental states of their users. Hence, names must be regarded as objects living in the external world, on a par with other ordinary objects like trees or chairs, and not as mental objects. An Externalist metaphysics of names is then submitted, as well as a corresponding epistemology, according to which external names are described in the mind through a description of their reference (Chapter 7).

The general strategy pursued in this work amounts to combining a Theory of Direct Reference in language with a Descriptivist (hence, Internalist) account of thought. Also, certain influential arguments — notably devised by Burge — intended to support Mental Referentialism (hence, Externalism) beyond Linguistic Referentialism, are rejected; it is moreover argued that a Non-Descriptivist conception of the mental is incapable of securing the introspective transparency of thoughts, which, however, seems indispensable, among other things in order to solve and even pose the Co-Reference and the No-Reference Problems (Chapter 8).

----------

Ce travail est un essai sur la référence des noms dans le langage et la pensée. Selon la Théorie de la Référence Directe, aujourd'hui dominante en philosophie du langage, le contenu sémantique d'un nom propre est directement son référent (Chapitre 1).

Or, malgré son succès récent, cette théorie Référentialiste se heurte à deux obstacles majeurs, reconnus depuis Frege et Russell : les Problèmes de la Co-référence et de la Non-Référence. La réponse traditionnelle à ces problèmes consistait précisément à abandonner la conception Référentialiste en faveur d'un Descriptivisme selon lequel le contenu sémantique d'un nom propre serait, non pas son référent, mais une condition descriptive (Chapitre 2).

Toutefois, c'est aussi ce Descriptivisme traditionnel que les arguments formulés par les hérauts du Référentialisme moderne—dont Kripke, Putnam, et Kaplan—ont largement discrédité (Chapitre 3).

Les outils théoriques développés dans le cadre de la sémantique des mondes possibles permettent de reformuler les problèmes générés par le Référentialisme en termes d'opacité des intensions linguistiques et d'Illusions Modales (Chapitre 4).

A ce stade, la théorie sémantique des noms semble dans une impasse : d'une part, le Référentialisme moderne recrée des problèmes que le Descriptivisme classique devait résoudre, mais d'autre part, ce Descriptivisme paraît bel et bien réfuté par l'argumentation des Référentialistes. Aussi, une réaction commune a été de chercher à concilier le Référentialisme et une forme de Descriptivisme au sein d'une même théorie sémantique. Cependant, un examen approfondi révèle que les différentes versions de cette stratégie échouent (Chapitre 5).

Une autre réaction, elle aussi œcuménique, a été d'opérér une distinction entre deux types de contenus qui seraient associés avec les noms et les phrases dans lesquels ceux-ci figurent : le premier contenu serait descriptif, tandis que le second serait référentiel. Le cadre offert par un tel Bi-Dimensionnalisme a reçu plusieurs interprétations très différentes (pragmatique, sémantique, métasémantique) ; mais c'est une nouvelle version, métasyntaxique, qui est défendue dans ce travail (Chapitre 6).

Le Bi-Dimensionalisme métasyntaxique autorise une séparation radicale entre langage et pensée : tandis que les pensées de leurs utilisateurs peuvent rester descriptives, les noms sont censés référer directement par eux-mêmes, indépendamment des états mentaux de leurs utilisateurs. Dès lors, les noms doivent être considérés comme des objets appartenant au monde extérieur, au même titre que des objets ordinaires tels que les arbres ou les chaises, et non comme des objets mentaux. Une métaphysique externaliste des noms est proposée, ainsi qu'une épistémologie assortie, selon laquelle les noms externes sont décrits dans l'esprit à travers une description de leur référence (Chapitre 7).

La stratégie générale qui est défendue dans ce travail revient à combiner une Théorie de la Référence Directe dans le langage avec une conception Descriptiviste (et donc, Internaliste) de la pensée. Aussi, certains arguments influents — émis par notamment Burge — censés établir un Référentialisme non seulement linguistique mais aussi mental (et donc, un Externalisme) sont rejetés ; il est en outre défendu qu'une vision Non-Descriptiviste du mental apparaît incapable de garantir la transparence introspective des pensées, cependant indispensable, notamment pour résoudre et même poser les Problèmes de Co-Référence et de Non-Référence (Chapitre 8).
Doctorat en Langues et lettres

APA, Harvard, Vancouver, ISO, and other styles
31

Bourreau, Pierre. "Jeux de typage et analyse de lambda-grammaires non-contextuelles." PhD thesis, Université Sciences et Technologies - Bordeaux I, 2012. http://tel.archives-ouvertes.fr/tel-00733964.

Full text
Abstract:
Abstract categorial grammars (or λ-grammars) are a formalism based on the simply-typed λ-calculus. They can be seen as grammars generating such terms, and were introduced to model the interface between the syntax and the semantics of natural language, bringing together two fundamental ideas: Curry's distinction between the tectogrammar (i.e., the deep structure of an utterance) and the phenogrammar (i.e., the surface representation of an utterance) of a language; and Montague's algebraic modelling of the principle of compositionality to account for the semantics of sentences. One of the main advantages of this formalism is that parsing with an abstract categorial grammar solves both the problem of text analysis and that of text generation. Efficient parsing algorithms have been found for abstract categorial grammars of linear and quasi-linear terms, whereas the parsing problem is non-elementary in its most general form. We propose to study classes of terms for which parsing remains solvable in polynomial time. These results rely mainly on two typing theorems: the coherence theorem, which states that a given λ-term is the unique inhabitant of a certain typing, and the subject expansion theorem, which states that two β-equivalent terms inhabit the same typings. To carry out this study, we use an abstract representation of λ-terms and typings in the form of games. In particular, we rely heavily on this notion to prove the coherence theorem for new families of λ-terms and typings. Thanks to these results, we show that it is possible to construct, in a direct way, a recognizer in the Datalog language for abstract categorial grammars of quasi-affine λ-terms.
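The thesis's game-theoretic proofs and Datalog construction go far beyond a short sketch, but the basic ingredient named in the abstract, the simply-typed λ-calculus on which abstract categorial grammars are built, can be hinted at with a minimal type-checker; the term encodings and the walk/john example below are invented for illustration.

# Minimal background sketch (far from the thesis's machinery): typing
# simply-typed lambda terms, the calculus ACGs are built on.
# Types: "e", "t", or ("->", a, b); terms: ("var", x), ("lam", x, a, body),
# ("app", f, arg). All encodings here are invented for illustration.

def typecheck(term, env):
    kind = term[0]
    if kind == "var":
        return env[term[1]]
    if kind == "lam":
        _, x, a, body = term
        b = typecheck(body, {**env, x: a})
        return ("->", a, b)
    if kind == "app":
        _, f, arg = term
        tf = typecheck(f, env)
        ta = typecheck(arg, env)
        if tf[0] == "->" and tf[1] == ta:
            return tf[2]
        raise TypeError("ill-typed application")
    raise ValueError(kind)

# (\x:e. walk x) applied to john:e has type t.
walk_john = ("app", ("lam", "x", "e", ("app", ("var", "walk"), ("var", "x"))),
             ("var", "john"))
print(typecheck(walk_john, {"walk": ("->", "e", "t"), "john": "e"}))  # t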
APA, Harvard, Vancouver, ISO, and other styles
32

Lokhande, Hrishikesh. "Pharmacodynamics miner : an automated extraction of pharmacodynamic drug interactions." Thesis, 2013. http://hdl.handle.net/1805/3757.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Pharmacodynamics (PD) studies the relationship between drug concentration and drug effect on target sites. The field has recently gained attention, as studies of PD drug-drug interactions (DDIs) promise the discovery of multi-targeted drug agents and novel efficacious drug combinations. A PD drug combination can be synergistic, additive, or antagonistic, depending on the summed effect of the combination at a target site. The PD literature has grown immensely, and most of its knowledge is dispersed across different scientific journals, so manual identification of PD DDIs is a challenge. To support automated extraction of PD DDIs, we propose Pharmacodynamics Miner (PD-Miner). PD-Miner is a text-mining tool capable of identifying PD DDIs from in vitro PD experiments. It is powered by two major components: a collection of full-text articles and an in vitro PD ontology. The in vitro PD ontology currently has four classes and more than a hundred subclasses; the full-text corpus is annotated with these classes and subclasses. The annotated corpus forms a database of articles that can be queried by drug keywords and ontology subclasses. Since the ontology covers term and concept meanings, the system is capable of formulating semantic queries. PD-Miner extracts in vitro PD DDIs based on references to cell lines and cell phenotypes. The results take the form of sentence fragments in which important concepts are visually highlighted. To determine the accuracy of the system, we used a gold standard of five expert-curated articles. PD-Miner identified DDIs with a recall of 75% and a precision of 46.55%. Along with the development of PD-Miner, we also report the development of a semantically annotated in vitro PD corpus. This corpus includes term- and sentence-level annotations and serves as a gold standard for future text mining.
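The reported recall and precision follow the usual definitions against an expert-curated gold standard; a minimal sketch with invented interaction pairs (not the thesis's data or numbers) would be:

# Toy evaluation sketch: extracted drug-drug interaction pairs compared
# against expert-curated gold annotations (all pairs invented).
gold      = {("drugA", "drugB"), ("drugA", "drugC"), ("drugB", "drugD")}
extracted = {("drugA", "drugB"), ("drugA", "drugC"),
             ("drugA", "drugD"), ("drugB", "drugC"), ("drugC", "drugE")}

true_positives = gold & extracted
precision = len(true_positives) / len(extracted)   # 2/5 = 40.00%
recall    = len(true_positives) / len(gold)         # 2/3 = 66.67%
print(f"precision={precision:.2%}, recall={recall:.2%}")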
APA, Harvard, Vancouver, ISO, and other styles
33

Newsom, Eric Tyner. "An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences." Thesis, 2013. http://hdl.handle.net/1805/3666.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
The amount of information produced as electronic free text in healthcare is increasing to levels that humans cannot process in order to advance their professional practice. Information extraction (IE) is a sub-field of natural language processing whose goal is data reduction of unstructured free text. Central to IE is an annotated corpus that frames how IE methods should create the logical expressions necessary for processing the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none has addressed the issue of semantic similarity (or synonymy) to achieve data reduction. A successful methodology for data reduction depends on a framework that can represent the currently popular phrasal methods of IE while also fully representing the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument structure (PAS) as that framework. A convenience sample from a prior study, consisting of ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing, serves as the text from which PAS structures are formed.
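As a very rough illustration of why a predicate-argument representation helps with semantic similarity (the structures, role labels, and synonym table below are invented, not the study's method or data), two differently worded radiology sentences can come out identical once reduced to normalized predicate-argument triples:

def pas_triples(pas):
    # Flatten a list of predicate-argument structures into a set of
    # (predicate, role, argument) triples.
    return {(p["predicate"], role, arg) for p in pas for role, arg in p["args"].items()}

s1 = [{"predicate": "show",        "args": {"ARG1": "effusion", "ARGM-LOC": "left lung"}}]
s2 = [{"predicate": "demonstrate", "args": {"ARG1": "effusion", "ARGM-LOC": "left lung"}}]

SYNONYMS = {"demonstrate": "show"}   # invented mapping of near-synonymous predicates

def normalize(triples):
    return {(SYNONYMS.get(p, p), role, arg) for p, role, arg in triples}

a, b = normalize(pas_triples(s1)), normalize(pas_triples(s2))
print(len(a & b) / len(a | b))   # 1.0: the two sentences convey the same content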
APA, Harvard, Vancouver, ISO, and other styles
34

Sauer, Paul Van der Merwe. "The complexity of unavoidable word patterns." Thesis, 2019. http://hdl.handle.net/10500/27343.

Full text
Abstract:
Bibliography: pages 192-195
The avoidability, or unavoidability, of patterns in words over finite alphabets has been studied extensively. The word α over a finite set A is said to be unavoidable for an infinite set B+ of nonempty words over a finite set B if, for all but finitely many elements w of B+, there exists a semigroup morphism φ : A+ → B+ such that φ(α) is a factor of w. In this treatise, we start by presenting a historical background of results that are related to unavoidability. We present and discuss the most important theorems surrounding unavoidability in detail. We present various complexity-related properties of unavoidable words. For words that are unavoidable, we provide a constructive upper bound on the lengths of words that avoid them. In particular, for a pattern α of length n over an alphabet of size r, we give a concrete function N(n, r) such that no word of length N(n, r) over the alphabet of size r avoids α. A natural subsequent question is how many unavoidable words there are. We show that the fraction of words that are unavoidable drops exponentially fast in the length of the word. This allows us to calculate an upper bound on the number of unavoidable patterns for any given finite alphabet. Subsequently, we investigate computational aspects of unavoidable words. In particular, we exhibit concrete algorithms for determining whether a word is unavoidable. We also prove results on the computational complexity of the problem of determining whether a given word is unavoidable. Specifically, the NP-completeness of the aforementioned problem is established.
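The basic notion can be made concrete with a naive checker (a brute-force sketch, not one of the thesis's algorithms or bounds): a word w avoids a pattern α exactly when no factor of w is the image of α under a non-erasing morphism.

def encounters(alpha, w):
    # Does w contain a factor that is phi(alpha) for some non-erasing
    # morphism phi assigning a nonempty word to each pattern variable?
    def match(pattern, text, assignment):
        if not pattern:
            return not text
        v = pattern[0]
        if v in assignment:
            img = assignment[v]
            return text.startswith(img) and match(pattern[1:], text[len(img):], assignment)
        # Try every nonempty image for the first still-unassigned variable.
        return any(match(pattern[1:], text[k:], {**assignment, v: text[:k]})
                   for k in range(1, len(text) + 1))
    return any(match(alpha, w[i:j], {})
               for i in range(len(w)) for j in range(i + 1, len(w) + 1))

print(encounters("aa", "0110"))    # True: the factor "11" is an image of aa
print(encounters("aa", "012021"))  # False: this square-free word avoids aa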
Decision Sciences
D. Phil. (Operations Research)
APA, Harvard, Vancouver, ISO, and other styles
