Theses on the topic "Generative lexicon"
Consult the top 48 theses for your research on the topic "Generative lexicon".
Thalji, Abdullah Abdel-Majeed. "Systematic polysemy in Arabic : a generative lexicon-based account". Thesis, University of Essex, 2018. http://repository.essex.ac.uk/22121/.
Martinez, Jorge Matadamas. "AXEL : a framework to deal with ambiguity in three-noun compounds". Thesis, Brunel University, 2010. http://bura.brunel.ac.uk/handle/2438/4774.
Romeo, Lauren Michele. "The Structure of the lexicon in the task of the automatic acquisition of lexical information". Doctoral thesis, Universitat Pompeu Fabra, 2015. http://hdl.handle.net/10803/325420.
Lexical semantic class information for nouns is critical for a broad variety of Natural Language Processing (NLP) tasks including, but not limited to, machine translation, discrimination of referents in tasks such as event detection and tracking, question answering, named entity recognition and classification, automatic construction and extension of ontologies, and textual inference. One approach to the costly and time-consuming manual construction and maintenance of the large-coverage lexica that feed NLP systems is the Automatic Acquisition of Lexical Information, which involves inducing the semantic class of a particular word from distributional data gathered from a corpus. This is precisely why current research on methods for the automatic production of high-quality, information-rich, class-annotated lexica, such as the work presented here, is expected to have a high impact on the performance of most NLP applications. In this thesis, we address the automatic acquisition of lexical information as a classification problem. For this reason, we adopt machine learning methods to generate a model representing vectorial distributional data which, grounded on known examples, allows predictions to be made for other, unknown words. The main research questions we investigate in this thesis are: (i) whether corpus data provides sufficient distributional information to build efficient word representations that result in accurate and robust classification decisions and (ii) whether automatic acquisition can also handle polysemous nouns. To tackle these problems, we conducted a number of empirical validations on English nouns. Our results confirmed that the distributional information obtained from corpus data is indeed sufficient to automatically acquire lexical semantic classes, as demonstrated by an average overall F1-Score of almost 0.80 using diverse count-context models and corpus data of different sizes.
Nonetheless, both the State of the Art and the experiments we conducted highlighted a number of challenges of this type of model such as reducing vector sparsity and accounting for nominal polysemy in distributional word representations. In this context, Word Embeddings (WE) models maintain the “semantics” underlying the occurrences of a noun in corpus data by mapping it to a feature vector. With this choice, we were able to overcome the sparse data problem, demonstrated by an average overall F1-Score of 0.91 for single-sense lexical semantic noun classes, through a combination of reduced dimensionality and “real” numbers. In addition, the WE representations obtained a higher performance in handling the asymmetrical occurrences of each sense of regular polysemous complex-type nouns in corpus data. As a result, we were able to directly classify such nouns into their own lexical-semantic class with an average overall F1-Score of 0.85. The main contribution of this dissertation consists of an empirical validation of different distributional representations used for nominal lexical semantic classification along with a subsequent expansion of previous work, which results in novel lexical resources and data sets that have been made freely available for download and use.
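The classification setup described in this abstract can be illustrated with a minimal sketch. It is not the thesis's actual model: the toy corpus, the seed lexicon, the window size and the nearest-neighbour rule are all assumptions chosen for illustration. It builds count-based context vectors, assigns an unseen noun the class of its most similar labelled noun, and computes an F1-Score for one class.

```python
from collections import Counter
import math

def context_vector(word, sentences, window=2):
    """Count the tokens that occur within +/- `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(word, sentences, labelled):
    """Return the class of the labelled noun whose contexts are most similar."""
    vec = context_vector(word, sentences)
    best = max(labelled, key=lambda w: cosine(vec, context_vector(w, sentences)))
    return labelled[best]

def f1_score(gold, predicted, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical toy corpus and seed lexicon (not data from the thesis).
SENTENCES = [
    "she ate the apple slowly".split(),
    "they ate the pear today".split(),
    "she drove the car home".split(),
    "he drove the truck away".split(),
]
SEEDS = {"apple": "FOOD", "car": "VEHICLE"}

print(classify("pear", SENTENCES, SEEDS))
print(classify("truck", SENTENCES, SEEDS))
```

Raw count vectors like these are sparse, which is the limitation the abstract says word embeddings address through lower-dimensional, real-valued vectors.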
Marruche, Vanessa de Sales. "Uma análise do verbo poder do português brasileiro à luz da HPSG e do léxico gerativo". Universidade Federal do Amazonas, 2012. http://tede.ufam.edu.br/handle/tede/2375.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This study presents a syntactic and semantic analysis of the verb poder in Brazilian Portuguese. To achieve this goal, we started with a literature review of works dedicated to the study of auxiliarity and modality, in order to determine what these issues imply and what is usually considered when classifying the verb under investigation as an auxiliary and/or modal verb. As foundations of this study, we used two theories, namely HPSG (Head-Driven Phrase Structure Grammar), a model of surface-oriented generative grammar that consists of a phonological, a syntactic and a semantic component, and GL (the Generative Lexicon), a lexicalist model of semantic interpretation of natural language that is intended to deal with problems such as compositionality, semantic creativity, and logical polysemy. Because these models, as originally proposed, are unable to handle the verb poder of Brazilian Portuguese, it was necessary to use GL to make some modifications to HPSG in order to enrich this model of grammar semantically, so that it can cope with the logical polysemy of the verb poder, its behavior as a raising and a control verb, and the saturation of its internal argument, and can identify when it is an auxiliary verb. The analysis showed that: (a) poder has four meanings inherent to it, namely CAPACITY, ABILITY, POSSIBILITY and PERMISSION; (b) to saturate the internal argument of poder, the candidate phrase must be of type [proposition] and the head of that phrase must be of type [event]. If those types are not identical, type coercion is applied in order to recover the type requested by the verb; (c) poder is a raising verb when it means POSSIBILITY, in which case it selects no external argument.
That is, it accepts as its subject whatever the subject of its VP-complement is; (d) poder is a control verb when it means CAPACITY, ABILITY and/or PERMISSION, and in this case it requires that the saturator of its internal argument be of type [entity] when poder means CAPACITY, or of type [animal] when it means ABILITY and/or PERMISSION; (e) poder is an auxiliary verb only when it is a raising verb, because only in this situation does it impose no selectional restrictions on the external argument; and (f) poder is considered a modal verb because it can express an epistemic notion (possibility) and at least three non-epistemic notions of modality (capacity, ability and permission).
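Findings (c) to (e) amount to a small table of type restrictions, which can be sketched as follows. This is an illustration only, not the HPSG/GL formalisation used in the thesis: the toy type hierarchy, the function names, and the choice to model the restriction as a check on the type of the subject are all assumptions.

```python
# Hypothetical toy type hierarchy (child -> parent); not taken from the thesis.
PARENT = {"animal": "entity", "artifact": "entity", "entity": None}

# Type required by each reading of "poder", following findings (c) and (d).
# None means no restriction (the raising reading, POSSIBILITY).
REQUIRED_TYPE = {
    "CAPACITY": "entity",
    "ABILITY": "animal",
    "PERMISSION": "animal",
    "POSSIBILITY": None,
}

def is_subtype(type_name, ancestor):
    """Walk up the hierarchy from `type_name` looking for `ancestor`."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False

def admissible_readings(subject_type):
    """Readings of "poder" compatible with a subject of the given type."""
    return sorted(
        reading
        for reading, required in REQUIRED_TYPE.items()
        if required is None or is_subtype(subject_type, required)
    )

def is_auxiliary(reading):
    """Finding (e): auxiliary only when no selectional restriction applies."""
    return REQUIRED_TYPE[reading] is None

print(admissible_readings("animal"))
print(admissible_readings("artifact"))
```

Under these assumptions an [animal] subject admits all four readings, while an [artifact] subject admits only CAPACITY and POSSIBILITY.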
Mangcunyana, Mteteleli Nelson. "Uhlalutyo lwesemantiki yelekhisikoni yesenzi sentshukumo u-hamba kwisiXhosa". Thesis, Stellenbosch : University of Stellenbosch, 2007. http://hdl.handle.net/10019.1/1684.
This study explores the semantic analysis of the motion verb -hamba in isiXhosa. Chapter 1 states the aim of the study and discusses properties related to the lexical semantic analysis of the verb -hamba as well as Pustejovsky’s theory of the Generative Lexicon. The theoretical framework and the organization of the study are also discussed in this chapter. Chapter 2 addresses the type system for semantics in more detail. A generative theory of the lexicon includes multiple levels of representation for the different types of lexical information needed. These levels include Argument Structure, Event Structure, Qualia Structure and Lexical Inheritance Structure. This chapter also gives a more detailed account of the structure of the qualia and the role they play in distributing the functional behavior of words and phrases in composition. Chapter 3 examines the lexical semantic analysis of the verb -hamba to account for the range of selectional properties of its NP subject argument and the various interpretations that arise in composition with its complement arguments. The polysemous behavior of the verb -hamba is examined in sentence alternation constructions with respect to the properties of the event structure. The lexical representation of the verb -hamba, in terms of argument structure and event structure, is also investigated in different sentences. Chapter 4 is the conclusion, summarizing the findings of the previous chapters on the lexical semantic analysis of the motion verb -hamba in isiXhosa. This is followed by word lists that give the meanings of words in the context in which they are used.
Dias, Márcio de Souza. "Análise de nomes da química Orgânica à luz da teoria do Léxico Gerativo - da análise sintático-semântica à geração das estruturas químicas através dos combinadores de Parser". Universidade Federal de Uberlândia, 2006. https://repositorio.ufu.br/handle/123456789/12559.
This work proposes an automatic system for analysing the names of Organic Chemistry compounds in order to generate drawings of their chemical structures. To that end, the system receives the name of an organic compound, analyses it syntactically and semantically and, if it denotes a chemically correct compound, produces a visual output of the corresponding chemical structure. An advance over other systems that attempt a similar task is that it can analyse both compound names that fit the patterns of the official nomenclatures currently in force and names that, although they do not fit those patterns, denote real organic compounds (in that situation, the system will have resolved a nomenclature ambiguity problem). The syntactic and semantic analyses are guided by the types of the components of the chemical names, which motivated implementing the system along the lines of the formalism of Generative Lexicon Theory (GLT). Moreover, the type-guided analyses motivated the choice of parser combinators and of the functional programming language Clean as effective and suitable tools for carrying out the linguistic analyses. The implemented system is a very useful tool as an automatic tutor for Organic Chemistry.
Master in Computer Science
Msibi, Phakamile Innocentia. "Ucwaningo lwesimantikhi yelekhizikhoni yesenzo u-phuma esizulwini". Thesis, Stellenbosch : University of Stellenbosch, 2010. http://hdl.handle.net/10019.1/5377.
ENGLISH ABSTRACT: The main concern of this thesis relates to an investigation of the lexical-semantic nature of the motion verb –phuma (exit, go out) in isiZulu within the framework of Generative Lexicon Theory. In particular, the thesis explores the event structure and aspectual verb class properties in the locative-subject alternation with the verb –phuma in isiZulu. Chapter one presents a general introduction to the study, stating the purpose and aims of the research, giving a broad perspective of the theoretical framework adopted, and outlining the organisation of the investigation of the lexical-semantic properties of –phuma. Chapter two presents a detailed discussion of Generative Lexicon Theory, which centrally concerns accounting for polysemy phenomena across various nominal and verbal expressions. The four dimensions of lexical-semantic representation that constitute the central theoretical properties in Generative Lexicon Theory are reviewed, i.e. Argument structure, Event structure, Qualia structure and Lexical Inheritance structure. In addition, the various facets of meaning of Qualia structure, namely the Formal, Constitutive, Telic and Agentive facets, are described in relation to their theoretical significance in accounting for word meaning and polysemy. Chapter three examines in a systematic and comprehensive way the range of locative-subject alternation possibilities with the verb –phuma. In particular, the range of semantic types of the NP subject argument of –phuma taking a locative complement is explored to determine whether all these sentences permit a corresponding locative-alternation construction. In addition, the aspectual verb class properties of the two variants in the alternation are analysed with regard to a range of diagnostics associated with stative events, activity events, achievement events and accomplishment events.
It is shown that the two variants in the alternation can be distinguished in terms of their aspectual verb class properties. Chapter four summarises the main findings of the study and presents the conclusion.
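The four facets of Qualia structure named in this abstract can be pictured as a small record attached to a lexical entry. The sketch below is an illustration under stated assumptions: the class and function names are hypothetical, and the entry for "novel" follows the familiar textbook example from Generative Lexicon Theory rather than any analysis in this thesis.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Qualia:
    """The four qualia roles of a lexical item in Generative Lexicon Theory."""
    formal: str                         # what kind of thing it is
    constitutive: Optional[str] = None  # what it is made of or contains
    telic: Optional[str] = None         # its purpose or function
    agentive: Optional[str] = None      # how it comes into being

# Illustrative entries (assumed values, not data from the thesis).
NOVEL = Qualia(formal="book", constitutive="narrative", telic="read", agentive="write")
ROCK = Qualia(formal="physical_object")

def coerce_to_events(qualia: Qualia) -> List[str]:
    """Events recoverable from a noun when a verb such as "begin" requires one.

    "begin a novel" can mean begin reading it (telic) or begin writing it
    (agentive); a noun with neither role offers no event to coerce to.
    """
    return [event for event in (qualia.telic, qualia.agentive) if event is not None]

print(coerce_to_events(NOVEL))
print(coerce_to_events(ROCK))
```

Keeping the roles optional reflects that not every noun specifies all four facets; the coercion function simply returns whatever event readings the entry makes available.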
Mirzapour, Mehdi. "Modeling Preferences for Ambiguous Utterance Interpretations". Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS094/document.
The problem of automatic logical meaning representation for ambiguous natural language utterances has been a subject of interest among researchers in computational and logical semantics. Ambiguity in natural language may arise at the lexical, syntactic or semantic level of meaning construction, or it may be caused by other factors such as ungrammaticality and lack of the context in which the sentence is actually uttered. The traditional Montagovian framework and the family of its modern extensions have tried to capture this phenomenon by providing models that enable the automatic generation of logical formulas as the meaning representation. However, there is a line of research that has not yet been investigated in depth: ranking the interpretations of ambiguous utterances based on the real preferences of language users. This gap suggests a new direction for study, which is partially carried out in this dissertation by modeling meaning preferences in alignment with some of the well-studied theories of human preferential performance available in the linguistics and psycholinguistics literature.
In order to fulfill this goal, we suggest using and extending Categorial Grammars for our syntactic analysis and Categorial Proof Nets as our syntactic parse. We also use the Montagovian Generative Lexicon for deriving multi-sorted logical formulas as our semantic meaning representation. This paves the way for our five-fold contribution, namely: (i) ranking multiple-quantifier scopings by means of the underspecified Hilbert's epsilon operator and categorial proof nets; (ii) modeling the semantic gradience in sentences that have implicit coercions in their meanings, using the Montagovian Generative Lexicon framework and introducing a procedure for incorporating types and coercions using crowd-sourced lexical data gathered by a serious game called JeuxDeMots; (iii) introducing new locality-based, referent-sensitive metrics for measuring linguistic complexity by means of Categorial Proof Nets; (iv) introducing algorithms for sentence completion with different linguistically motivated metrics to select the best candidates; and (v) integrating the different computational metrics for ranking preferences into a single model.
Salazar, Burgos Hada Rosabel. "Descripción y representación de los adjetivos deverbales de participio en el discurso especializado". Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/41720.
The goal of this thesis is to pinpoint the grammatical information that is necessary to determine which Spanish verb stems give rise to an adjectival participle (AP). This information will allow us to describe the linguistic indicators that, within the domain of economics, activate a specialized meaning in those terms that have the structure AP+noun. These minimal syntactic constructions are highly productive in specialized discourse. Nevertheless, the hybrid nature of the participial form causes many conflicts in Natural Language Processing (NLP) applications. This descriptive approach to adjectival participles is linguistic in nature, is based on the Communicative Theory of Terminology (CTT), and is intended as a point of contact between theory and application.
Matos, Ely Edison da Silva. "LUDI: um framework para desambiguação lexical com base no enriquecimento da semântica de frames". Universidade Federal de Juiz de Fora, 2014. https://repositorio.ufjf.br/jspui/handle/ufjf/695.
While techniques, algorithms and applications in Natural Language Processing are well known and relatively well established in the field of Syntax, the same does not hold for the field of Semantics. Aiming to contribute to the studies in Computational Semantics, this work implements ideas and insights offered by Cognitive Linguistics, which is itself an alternative to Generative Linguistics. We attempt to bring together contributions from the computational domain (Databases, Graph Theory, Ontologies, inference mechanisms, Connectionist Models), the linguistic domain (Frame Semantics and the Generative Lexicon), and the application domain (FrameNet and the SIMPLE ontology) in order to address semantic issues more flexibly. The object of study is the process of disambiguation of Lexical Units. The results of the research are embodied in a computer application, called Framework LUDI (Lexical Unit Discovery through Inference), composed of algorithms and data structures used for Lexical Unit disambiguation. The framework is an application of Natural Language Understanding, which can be integrated into information retrieval and summarization tools, as well as into processes of Semantic Role Labeling (SRL).
Mery, Bruno. "Modélisation de la Sémantique Lexicale dans le cadre de la théorie des types". Phd thesis, Université Sciences et Technologies - Bordeaux I, 2011. http://tel.archives-ouvertes.fr/tel-00627432.
Bandhakavi, Anil. "Domain-specific lexicon generation for emotion detection from text". Thesis, Robert Gordon University, 2018. http://hdl.handle.net/10059/3103.
Pereira, Dennis V. "Automatic Lexicon Generation for Unsupervised Part-of-Speech Tagging Using Only Unannotated Text". Thesis, Virginia Tech, 1999. http://hdl.handle.net/10919/10094.
Master of Science
Abeyruwan, Saminda Wishwajith. "PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabilistic Methods". Scholarly Repository, 2010. http://scholarlyrepository.miami.edu/oa_theses/28.
Kozlowski, Raymond. "Uniform multilingual sentence generation using flexible lexico-grammatical resources". Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file 0.93 Mb., 213 p, 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:3200536.
Caink, Andrew David. "The lexical interface : closed class items in south Slavic and English". Thesis, Durham University, 1998. http://etheses.dur.ac.uk/5026/.
Arapinis, Alexandra. "Le Mot et la Chose Revisités: le Cas de la Polysémie Systématique". Phd thesis, Université Panthéon-Sorbonne - Paris I, 2009. http://tel.archives-ouvertes.fr/tel-00614536.
Thwaites, Peter. "Lexical and distributional influences on word association response generation". Thesis, Cardiff University, 2018. http://orca.cf.ac.uk/119182/.
Chiu, Pei-Wen Andy. "From Atoms to the Solar System: Generating Lexical Analogies from Text". Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/2943.
Texto completoThis thesis presents a novel system that generates lexical analogies from a corpus of text documents. The system is motivated by a well-established theory of analogy-making, and views lexical analogy generation as a series of three processes: identifying pairs of words that are semantically related, finding clues to characterize their relations, and generating lexical analogies by matching pairs of words with similar relations. The system uses a dependency grammar to characterize semantic relations, and applies machine learning techniques to determine their similarities. Empirical evaluation shows that the system performs remarkably well, generating lexical analogies at a precision of over 90%.
Mullen, Dana Shirley. "Issues in the morphology and phonology of Amharic the lexical generation of pronominal clitics". Thesis, University of Ottawa (Canada), 1986. http://hdl.handle.net/10393/5402.
Schwanhäußer, Barbara. "Lexical tone perception and production : the role of language and musical background". View thesis, 2007. http://handle.uws.edu.au:8081/1959.7/31791.
"A thesis submitted to the University of Western Sydney, College of Arts, MARCS Auditory Laboratories in fulfilment of the requirements for the degree of Doctor of Philosophy." Includes bibliography.
Booth, Hannah. "Expletives and clause structure : syntactic change in Icelandic". Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/expletives-and-clause-structure-syntactic-change-in-icelandic(7907d61b-4404-4964-bf8d-ce304c0fab8d).html.
Walter, Sebastian [Verfasser] and Philipp [Akademischer Betreuer] Cimiano. "Generation of multilingual ontology lexica with M-ATOLL : a corpus-based approach for the induction of ontology lexica / Sebastian Walter ; Betreuer: Philipp Cimiano". Bielefeld : Universitätsbibliothek Bielefeld, 2017. http://d-nb.info/1123723729/34.
Hamed, Osama Amin [Verfasser] and Torsten [Akademischer Betreuer] Zesch. "Automatic generation of lexical recognition tests using natural language processing / Osama Amin Hamed ; Betreuer: Torsten Zesch". Duisburg, 2019. http://d-nb.info/1198111313/34.
Laroui, Abdellatif. "Le composant lexical mediateur entre le composant conceptuel et le composant linguistique dans le cadre de la generation multilingue". Paris 6, 1993. http://www.theses.fr/1993PA066142.
Cruz, Adilson Góis da. "A expressão do argumento dativo no português escrito: um estudo comparativo entre o português brasileiro e o português europeu". Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/8/8142/tde-27112009-140208/.
This dissertation discusses, from a comparative perspective between Brazilian Portuguese (BP) and European Portuguese (EP), the expression of the third-person dative argument in a formal written corpus consisting of the Brazilian and European translations, made directly from Spanish, of the book One Hundred Years of Solitude, by Gabriel García Márquez. The analysis considers the behaviour of three dative variants (the clitic lhe/lhes, the PPs a/para ele(s)/ela(s), and the null pronoun) in ditransitive, unaccusative, causative, inchoative and unergative predicates. In the context of Generative Theory and Variation Theory, the goal is to show differences between BP and EP that can confirm, or not, the hypothesis that the two varieties of Portuguese reveal distinct grammars.
IRAQUI, (ép SINACEUR) ZAKIA. "Etude lexicale des parlers arabes marocains". Paris 3, 1986. http://www.theses.fr/1986PA030068.
Lexical study of Moroccan Arabic on the basis of an important corpus, G. S. Colin's file, containing more than 5000 roots, and some oral research. The structure of Moroccan Arabic is based on the intercrossing of roots and patterns and the use of a set of suffixes generating new forms. The triliteral roots are examined with all the patterns detected by the systematic analysis of the eleven letters of the file. The study of each pattern refers to Classical Arabic, considered as a standard. The lexicon of Moroccan Arabic consists of Classical words which have followed some of the laws of linguistic evolution. The contact of the dialect with other languages has resulted in the appearance of many foreign terms. Berber, Turkish, Spanish and French borrowings have been perfectly assimilated, cast into Arabic moulds and submitted to the morphological laws of the receiving language: derivation, formation of plurals, diminutives. They have sometimes given birth to new roots. Sets of patterns corresponding to specific categories (masdars, adjectives, participles, trade nouns, plurals and diminutives) can be brought to light in the lexicon. Moroccan Arabic is undergoing a major change, not only lexical but also phonological, under the influence of mass media and Arabicized education.
Rein, Kellyn [Verfasser]. "I believe it's possible it might be so.... Exploiting Lexical Clues for the Automatic Generation of Evidentiality Weights for Information Extracted from English Text / Kellyn Rein". Bonn : Universitäts- und Landesbibliothek Bonn, 2016. http://d-nb.info/1119803217/34.
Mazón, Larson Erik Ramón. "Diferencias léxicas entre inmigrantes de distinta generación : Un estudio piloto sobre el cambio intergeneracional de conocimientos de español". Thesis, Linnéuniversitetet, Institutionen för språk (SPR), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-61018.
Santos, Paola Junqueira Pinto dos. "Orações infinitivas : da seleção ao controle". reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/21564.
In Portuguese, the subject position of uninflected (infinitival) clauses is occupied by the empty category PRO, which, according to Generative Theory, has a mixed nature: it behaves either as a pronoun, with free reference, or as an anaphor, with reference bound to some argument of the immediately higher clause. This research studies two basic aspects of infinitival clauses: (1) which verbs select them, and (2) whether the same verb classes condition the way in which control of PRO takes place. To that end, a study of complementation in Portuguese was necessary, in order to observe which verbs select a subordinate infinitive and how they do so. Finally, it seeks to establish whether control is a syntactic phenomenon, as Chomsky (1981/1982) claims, or a semantic one, involving the interpretation of the basic predicates underlying control verbs, as Culicover and Jackendoff (2003/2005) observe. This research also aims to contribute to linguistic studies through the description, analysis and explanation of a phenomenon that is still little explored in Brazilian Portuguese.
Kyjovská, Linda. "Syntaktická analýza založená na multigenerování". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235439.
Dolíhal, Luděk. "Syntaktická analýza založená na řadě metod". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236688.
LIU, CHIUNG-YI and 劉瓊怡. "Dynamic Generative Lexicon". Thesis, 2004. http://ndltd.ncl.edu.tw/handle/26194439541953200466.
Duann, Ren-feng and 段人鳯. "When Embodiment Meets Generative Lexicon: The Human Body Part Metaphors in Taiwan Presidential Speeches". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/6v826m.
國立臺灣大學
語言學研究所
103
This dissertation integrates embodiment with generative lexicon. By analyzing the metaphorically/metonymically used human body part terminology in the Taiwan Presidential Corpus, a representative sample of the Taiwanese leadership rhetoric, we reveal how these two theories complement each other on the one hand, and disclose how the changing political context leads to differentiated uses of the corporeal terms on the other hand. We argue that the two theories can complement each other: Embodiment strengthens generative lexicon by spelling out the cognitive reasons which motivate meaning generation; and generative lexicon, specifically the qualia structure, reinforces embodiment by accounting for the reason underlying the selection of a particular body part for metaphorization. Choosing to analyze how the four body parts—血 xie ‘blood’, 肉 rou ‘flesh’, 骨 gu ‘bone’, 脈 mai ‘meridian’—behave in the Taiwan Presidential Corpus, this dissertation aims to answer the following questions: (1) How do embodiment and generative lexicon interact? Does the qualia role influence the metaphorical/metonymical use of the body part terms? Or does the metaphorical/metonymical use of the body part terms facilitate the retrieval of the qualia role? (2) What is the significance of qualia structure in constraining the selection of body parts for metaphorical/metonymical use? (3) What is the significance of the qualia structure and the generative mechanisms in the formulation and comprehension of the conceptual pairings involving body parts? (4) How are political ideas conceptualized by the country leadership’s use of corporeal terminology? In other words, how can we establish the association between the activation of certain body parts and a certain political context? This dissertation, built on the potential to combine embodiment and generative lexicon, investigates the body part metaphors/metonyms used in the leadership rhetoric in Taiwan. 
We hypothesize that different body parts are activated in different ways in political speeches due to their distinctive features and functions, and that the visibility and telicity of a body part are the major reasons why it is chosen for metaphorical/metonymical use. Moreover, different political agendas are likely to be reflected in particular uses of corporeal terms, and a change in the socio-political context should lead to diverging uses of the same body part in the speeches. This dissertation will contribute to research on conceptual metaphor, generative lexicon, and political discourse. Methodologically, this research, modifying the metaphor identification procedure (Pragglejaz Group 2007), provides a better solution for metaphor identification in Chinese data. With the incorporation of generative lexicon, it furthermore helps the researcher to formulate the conceptual mappings involving body part terms more accurately, and to better comprehend metaphorically used body parts. Theoretically, taking generative lexicon into consideration, it establishes a correlation between qualia roles and the conceptual mappings. Based on the findings, it also predicts that the visibility and telicity of a body part are the most dominant factors in the choice of a body part for metaphorical/metonymical use. With regard to political discourse, it systematically analyzes how the human body parts are interwoven in the country leadership rhetoric, revealing the influence exerted by political context upon the use of corporeal terminology.
Šindlerová, Jana. "Slovesná valence v srovnávacím pohledu". Doctoral thesis, 2018. http://www.nusl.cz/ntk/nusl-391348.
SUN, CHONG-TENG and 孫崇騰. "Lexicon-driven generation in machine translation". Thesis, 1991. http://ndltd.ncl.edu.tw/handle/76322452626244669374.
Yen-Jen Tai and 戴延任. "Automatic Domain-Specific Sentiment Lexicon Generation with Label Propagation". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/02422995122929581401.
國立成功大學
資訊工程學系碩博士班
101
The advance of social media has led to explosive growth in opinion data, and sentiment analysis has therefore attracted a lot of attention. Current sentiment analysis applications follow two main approaches: the lexicon-based approach and the machine-learning approach. However, both face the challenge of obtaining a large amount of human-labeled training data and corpora. The lexicon-based approach requires a sentiment lexicon (sentiment dictionary) to determine opinion polarity. There are many existing benchmark sentiment lexicons, but they cannot cover all domain-specific word meanings. Thus, automatic generation of a domain-specific sentiment lexicon becomes an important task. In this paper, we propose a framework to automatically generate a sentiment lexicon. First, we determine the semantic similarity between pairs of words in the entire unlabeled corpus. We treat the words as nodes and the similarities as weighted edges to construct word graphs. A graph-based semi-supervised label propagation method then assigns polarity to unlabeled words through the proposed propagation process. Experiments conducted on microblog data from Twitter show that our approach performs better than baseline approaches and general-purpose sentiment dictionaries.
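As a rough illustration of the graph-based idea summarized in this abstract (words as nodes, similarities as weighted edges, polarity spread from a few labeled seeds), the following is a minimal sketch only. It is not the thesis's actual propagation process: the function name, the clamped weighted-average update rule, the fixed iteration count and the toy similarity values are all assumptions made for the example.

```python
def propagate_polarity(edges, seeds, iterations=50):
    """Spread seed polarities over an undirected weighted word graph.

    edges: dict mapping (word_a, word_b) -> similarity weight (hypothetical values)
    seeds: dict mapping word -> fixed polarity in [-1.0, 1.0]
    Returns a dict mapping every word in the graph to a polarity score.
    """
    # Build a symmetric adjacency map from the edge list.
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    # Unlabeled words start neutral; seed words start at their known polarity.
    scores = {word: seeds.get(word, 0.0) for word in neighbors}

    for _ in range(iterations):
        updated = {}
        for word, nbrs in neighbors.items():
            if word in seeds:
                # Seed labels are clamped and never overwritten.
                updated[word] = seeds[word]
                continue
            total = sum(nbrs.values())
            # Each unlabeled word takes the similarity-weighted mean of its neighbors.
            updated[word] = (
                sum(w * scores[n] for n, w in nbrs.items()) / total if total else 0.0
            )
        scores = updated
    return scores


# Toy graph with made-up similarity weights.
toy_edges = {
    ("good", "great"): 0.9,
    ("great", "awesome"): 0.8,
    ("bad", "awful"): 0.9,
    ("awful", "awesome"): 0.1,
}
toy_seeds = {"good": 1.0, "bad": -1.0}
toy_scores = propagate_polarity(toy_edges, toy_seeds)
```

In this toy graph the unlabeled words inherit the sign of the seed they are most strongly connected to; a real system would derive the edge weights from corpus statistics rather than hand-set numbers.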
Hsieh-Wei Chen and 陳謝瑋. "Hierarchical Multi-Dimensional Subjectivity-Lexicon Generation Model for Opinion Analysis". Thesis, 2010. http://ndltd.ncl.edu.tw/handle/96129427338689528486.
國立成功大學
資訊工程學系碩博士班
98
Opinion mining and sentiment analysis, an emerging area of information retrieval and natural language processing that aims at opinion retrieval and subjectivity classification and clustering, has recently been attracting more and more attention from academia and industry. Traditional approaches mainly focus on polarity classification, whose limitations are addressed in this thesis. Because of the limitations of the well-studied polarity opinion analysis, the traditional approaches are not adequate for criticism analysis, which requires more refined analysis techniques and modeling. The five major contributions of this thesis are: first, a Multi-Dimensional Opinion Analysis (MDOA) framework for criticism analysis; second, an unsupervised Multi-Dimensional Subjectivity-Lexicon (MDSL) generation scheme; third, a semi-supervised Hierarchical MDSL (H-MDSL) generation model; fourth, a modified Semi-Supervised Kernel k-Means clustering algorithm; fifth, an evaluation scheme that requires no human intervention, based on constraint agreement and violation quantification. The MDOA framework consists of four major steps: first, creating a dataset by crawling blog posts of reviews; second, creating a “subjectivity-term to object” matrix, in which each subjectivity-term is modeled as a vector in a high-dimensional space; third, transforming each subjectivity-term into a new feature space that represents the subjectivity-terms well, to create the final MDSL; and fourth, employing the learned MDSL for opinion analysis. In the experiments, first, the limitations of traditional polarity opinion analysis are addressed. Second, the entropy analysis of the learned MDSL and H-MDSL in the transformed feature space is performed. It shows that the improvement by the feature transformation can be up to 31% in terms of the entropy of the learned features. 
Third, the constraint agreement and violation evaluation of the proposed models and algorithms is performed, which shows that the proposed model outperforms the others by at least 21% in error rate and hit rate. Fourth, a comparison with traditional polarity approaches is also presented. This comparison shows that the proposed framework is not only capable of traditional polarity classification but also better able to provide meaningful semantic information in criticism analysis.
Dorr, Bonnie J. "Lexical Conceptual Structure and Generation in Machine Translation". 1989. http://hdl.handle.net/1721.1/6018.
Teixeira, Joana Alexandra Vaz. "L2 Acquisition at the interfaces: Subject-verb inversion in L2 English and its pedagogical implications". Doctoral thesis, 2018. http://hdl.handle.net/10362/54381.
This thesis addresses two types of interfaces that have recently become central areas of interest in generative research on second language (L2) acquisition: (i) linguistic interfaces, namely the syntax-discourse interface (our main research focus) and the lexicon-syntax interface in adult L2 acquisition, and (ii) an interdisciplinary interface, the interface between the domains of L2 acquisition and L2 teaching. The thesis aims to shed new light on four questions that continue to generate much debate in the field of L2 acquisition: (i) Are "purely" (lexico-)syntactic properties fully acquirable at the end state of L2 acquisition, as the Interface Hypothesis (IH) (Sorace & Filiaci, 2006, Sorace, 2011b) proposes? (ii) Are properties at the interface between syntax and discourse necessarily a locus of optionality at the end state of L2 acquisition, as the IH predicts? (iii) What are the roles of first language (L1) influence, input and processing factors in L2 acquisition at the syntax-discourse interface? (iv) Does explicit instruction help L2 speakers overcome persistent problems in the acquisition of syntactic and syntax-discourse properties? To investigate these questions, the thesis examines a linguistic phenomenon that is still under-researched in the field of L2 acquisition: subject-verb inversion (SVI) in L2 English. Three types of SVI are considered here: (i) "free" inversion (and its correlation with null subjects), (ii) locative inversion and (iii) there-constructions with verbs other than be. The first is ungrammatical in English for a strictly syntactic reason: this language fixes the negative value for the null subject parameter. The latter two types of SVI, in turn, are possible in English under certain (lexico-)syntactic and discourse conditions. 
The thesis comprises two experimental studies: (i) a study on the acquisition of the lexical, syntactic and discourse properties of SVI by advanced and near-native speakers of English whose L1 is French (a language similar to English in the relevant respects) or European Portuguese (a language different from English in the relevant respects), and (ii) a study on the impact of explicit grammar instruction on the acquisition of "strictly" syntactic and syntax-discourse properties of SVI by speakers of European Portuguese with an intermediate or advanced level of L2 English. In the first study, participants are tested by means of three types of tasks: untimed drag-and-drop tasks, syntactic priming tasks and speeded acceptability judgement tasks. Taken together, the results of these tasks confirm that, as predicted by the IH, the properties of SVI that are purely (lexico-)syntactic are unproblematic at the end state of L2 acquisition, but those that involve the syntax-discourse interface are a locus of permanent optionality, even when the L1 is similar to the L2. The results are, moreover, consistent with the IH's proposal that the optionality found at the syntax-discourse interface is caused (mainly) by processing inefficiencies associated with bilingualism. 
In addition to presenting new experimental evidence in favour of the IH, this study shows that the degree of optionality that L2 speakers exhibit at the syntax-discourse interface is moderated by the following variables, which have not been (sufficiently) considered in the literature on the IH: (i) the frequency of the construction in the target language (very rare construction → more optionality), (ii) the quantity and/or distance of the contextual information that the speaker needs to process (much contextual information in the inter-sentential context → more optionality), (iii) the level of L2 proficiency (lower proficiency → more optionality), and (iv) the (dis)similarity between the L1 and the L2 (L1 ≠ L2 → more optionality). The teaching intervention study comprises a pre-test and two post-tests following the intervention, and tests participants by means of speeded acceptability judgement tasks. This study shows that explicit grammar instruction can result in lasting gains for L2 learners, but its effectiveness is moderated by two factors: (i) the type of linguistic domain(s) in which the target property is situated and (ii) the learners' degree of developmental readiness to acquire the target property. Regarding factor (i), the results of this study indicate that the area that constitutes a locus of permanent optionality in L2 acquisition, the syntax-discourse interface, is much less permeable to instructional effects than "pure" syntax. Regarding factor (ii), the results suggest that explicit instruction facilitates L2 acquisition only when learners have reached a developmental stage at which it is already possible for them to acquire the target property. As these results are relevant not only to L2 acquisition theory but also to L2 teaching, the thesis includes an analysis of the relevance and potential implications of its findings for L2 grammar teaching.
Dorr, Bonnie J. "A Lexical Conceptual Approach to Generation for Machine Translation". 1988. http://hdl.handle.net/1721.1/6482.
Rao, Leela A. "Verbal fluency as a measure of lexico-semantic access and cognitive control in bilingual aphasia". Thesis, 2018. https://hdl.handle.net/2144/31113.
Schwanhäußer, Barbara, University of Western Sydney, College of Arts and MARCS Auditory Laboratories. "Lexical tone perception and production : the role of language and musical background". 2007. http://handle.uws.edu.au:8081/1959.7/31791.
Doctor of Philosophy (PhD)
Chang, Ren-Fen and 張仁芬. "Lexical Selection and Sentence Generation in an English-Chinese Machine Translation System: A Corpus-Based Approach". Thesis, 1994. http://ndltd.ncl.edu.tw/handle/48999564144185957888.
McKinney, Kellin Lee. "Lexical errors produced during category generation tasks by bilingual adults and bilingual typically developing and language-impaired seven to nine-year-old children". Thesis, 2009. http://hdl.handle.net/2152/ETD-UT-2009-12-562.
Bílka, Ondřej. "Pattern matching in compilers". Master's thesis, 2012. http://www.nusl.cz/ntk/nusl-305136.
Goláňová, Hana. "Nářeční slovník jihozápadního Vsetínska". Doctoral thesis, 2013. http://www.nusl.cz/ntk/nusl-322634.
Lambrey, Florie. "Implémentation des collocations pour la réalisation de texte multilingue". Thèse, 2016. http://hdl.handle.net/1866/18769.
Natural Language Generation (NLG) produces text in natural language from non-linguistic content. NLG aims at developing generators that are reusable across languages and applications. To that end, the architecture of these systems is modular: while the deep generation module determines the content of the message to be expressed, the text realization module maps the message into its most appropriate linguistic form. Multilingual text realization requires modeling the core linguistic phenomena found in language. Collocations are one of the core linguistic phenomena that remain problematic not only in NLG, but also in Natural Language Processing in general. The Meaning-Text theory analyses collocations as constraints on lexical selection. In other words, a collocation is made up of three constituents: (i) the base, (ii) the collocate, chosen according to (iii) a semantico-lexical relation. Some of these semantico-lexical relations are systematic and shared by many collocations. Lexical functions are a system for modeling these relations. For instance, collocations such as heavy rain or strong preference, which instantiate the same relation, intensity, can be described with the lexical function Magn: Magn(RAIN) = HEAVY, Magn(PREFERENCE) = STRONG, etc. There are hundreds of lexical functions. Our work presents a methodology for the implementation of collocations in a multilingual text realization engine, GÉCO, that relies on simple and complex syntagmatic standard lexical functions. The principal aspect of the methodology consists of grouping lexical functions that show similar behavior into generic patterns. As a result, 26 000 lexical functions have been implemented, which is considerable progress in the treatment of collocations in multilingual text realization.
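The notation Magn(RAIN) = HEAVY in this abstract can be read as a lookup from a (lexical function, base) pair to a collocate. The sketch below illustrates only that reading; it says nothing about how GÉCO is actually implemented. The table layout, the function names and the adjective-before-noun ordering rule (which suits English Magn collocations such as these) are assumptions introduced for the example.

```python
# Hypothetical lexicon: lexical function -> base lexeme -> collocate.
# The two entries are the examples given in the abstract.
LEXICAL_FUNCTIONS = {
    "Magn": {
        "RAIN": "HEAVY",
        "PREFERENCE": "STRONG",
    },
}


def apply_lexical_function(function, base):
    """Return the collocate selected by `function` for `base`, or None if unknown."""
    return LEXICAL_FUNCTIONS.get(function, {}).get(base)


def realize_collocation(function, base):
    """Render a collocation as text, assuming the collocate precedes the base."""
    collocate = apply_lexical_function(function, base)
    if collocate is None:
        # No entry: fall back to the bare base rather than guessing a collocate.
        return base.lower()
    return f"{collocate.lower()} {base.lower()}"


print(realize_collocation("Magn", "RAIN"))        # heavy rain
print(realize_collocation("Magn", "PREFERENCE"))  # strong preference
```

Keeping the relation (Magn) separate from the lexical entries is what lets one generic pattern serve many collocations: adding a new base only requires a new table row, not a new realization rule.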