Dissertations / Theses: 'Coreference'

1

Nicol, Janet Lee. "Coreference processing during sentence comprehension." Thesis, Massachusetts Institute of Technology, 1988. http://hdl.handle.net/1721.1/14421.


2

Yoon, Chulmin. "Essays on De Jure Coreference." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1595589314146257.


3

HERNANDEZ, ADRIEL GARCIA. "COREFERENCE RESOLUTION FOR THE ENGLISH LANGUAGE." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2017. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=30730@1.

Abstract:

PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
FUNDAÇÃO DE APOIO À PESQUISA DO ESTADO DO RIO DE JANEIRO
PROGRAMA DE EXCELENCIA ACADEMICA
BOLSA NOTA 10
One of the problems found in natural language processing systems is the difficulty of identifying textual elements that refer to the same entity; this phenomenon is called coreference. Solving this problem is an integral part of discourse comprehension, since it allows language users to connect the pieces of speech information concerning the same entity. Consequently, coreference resolution is a key task in natural language processing. Despite the large efforts of existing research, the current performance of coreference resolution systems has not yet reached a satisfactory level. In this work, we describe a structure learning system for unrestricted coreference resolution that explores two techniques: latent coreference trees and automatic entropy-guided feature induction. The latent tree modeling makes the learning problem computationally feasible, since it incorporates a relevant hidden structure. Additionally, using an automatic feature induction method, we can efficiently build enhanced non-linear models using linear model learning algorithms, namely the structured and sparse perceptron algorithm. We evaluate the system on the English portion of the CoNLL-2012 Shared Task closed track data set. The proposed system obtains 62.24 per cent on the competition's official score. This result is below the state-of-the-art performance for this task, which is 65.73 per cent. Nevertheless, our solution significantly reduces the time needed to obtain the clusters of a document: our system takes 0.35 seconds per document on the test set, while the state-of-the-art system takes 5 seconds per document.
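
The latent-tree idea mentioned in this abstract can be illustrated with a minimal sketch: each mention attaches either to an artificial root or to its best-scoring earlier mention, and the subtrees hanging off the root are read as coreference clusters. The function names, the mention strings and the hand-set `toy_score` table are all hypothetical; the thesis learns its scores with a structured perceptron over induced features, which is not reproduced here.

```python
from typing import Callable, Dict, List

ROOT = -1  # artificial root; a mention attached here starts a new entity


def decode_tree(mentions: List[str], score: Callable[[int, int], float]) -> List[int]:
    """For every mention, pick the best-scoring parent among ROOT and earlier mentions."""
    parents = []
    for j in range(len(mentions)):
        candidates = [ROOT] + list(range(j))
        parents.append(max(candidates, key=lambda i: score(i, j)))
    return parents


def clusters_from_tree(parents: List[int]) -> List[List[int]]:
    """Read clusters off the tree: each child of ROOT heads one cluster."""
    cluster_of: Dict[int, int] = {}
    clusters: List[List[int]] = []
    for j, p in enumerate(parents):
        if p == ROOT:
            cluster_of[j] = len(clusters)
            clusters.append([j])
        else:
            # Parents always precede their children, so cluster_of[p] is known.
            cluster_of[j] = cluster_of[p]
            clusters[cluster_of[j]].append(j)
    return clusters


def toy_score(i: int, j: int) -> float:
    """Hypothetical hand-set arc scores standing in for a learned model."""
    if i == ROOT:
        return 0.0
    table = {(0, 2): 1.0, (1, 3): 1.0}
    return table.get((i, j), -1.0)


mentions = ["Maria", "the company", "she", "it"]
parents = decode_tree(mentions, toy_score)
clusters = clusters_from_tree(parents)
print(parents)   # parent index per mention, ROOT shown as -1
print(clusters)  # mention indices grouped by entity
```

Because every mention chooses exactly one parent, decoding is a simple per-mention maximisation, which is one plausible reading of why the abstract describes the latent structure as making learning computationally feasible.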


4

WERNER, ENEIDA FIGUEIRA DE ALMEIDA. "REVISION IN WRITING AND COREFERENCE ISSUES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2018. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=36163@1.

Abstract:

PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE SUPORTE À PÓS-GRADUAÇÃO DE INSTS. DE ENSINO
The purpose of this doctoral thesis is to investigate the writing revision process and the process of establishing coreference, with respect to how they are monitored by groups with different degrees of writing experience. The research is part of the study of writing processing, focusing on the production process. With regard to writing research, it is theoretically anchored in the cognitive writing model of Flower and Hayes (1980) and in Hayes's revision model (1987). In the study of coreference, we consider the main theories that investigate the factors favouring accessibility in memory: Accessibility Theory (Ariel, 1990), Centering Theory (Grosz, Joshi and Weinstein, 1995) and the Informational Load Hypothesis (Almor, 1999). We relate the theoretical questions to cognitive data obtained by means of experimental methodology. The laboratory used was LAPAL, at PUC-Rio. The experiments were based on text production and revision tasks, and the keystroke logging tool Inputlog (http://www.inputlog.net/) was used to record and analyse the data. Participants were undergraduate and postgraduate students from a public and a private institution in Rio de Janeiro. In the first experiment, we analysed global data on writing processing and coreference processing, elicited with image stimuli from two comic strips without verbal material. Concerning the global behaviour of writing processing, we examined measures of the process and of the product (in terms of number of characters and words), as well as pauses and the types of revisions made. Regarding the measures of coreference processing, we examined the types of referential expressions selected to introduce and to resume discourse entities, the moment when coreferential elements were revised (immediate or delayed revision) and the nature of the change in terms of the specificity of the substituted term (more/less specific).
The second experiment aimed to investigate the factors that influence the choice of an anaphoric referential expression, based on the information contained in the antecedent. We conducted a revision task with four texts of the same narrative type. In each text we evaluated the types of anaphoric resumption as a function of the degree of activation of information in memory favoured by the accessibility of the antecedent. The independent variables were the syntactic function of the antecedent (more subject/less subject), the thematic role of the antecedent (more agent/less agent) and the distance between the antecedent and the anaphoric expression (same sentence/different sentence). Results from the first experiment showed differences in the types of revisions (immediate/delayed) and in the proportion of revisions made (deletions/insertions), indicating that the postgraduate group made qualitatively greater use of revision strategies and resources while monitoring their texts than the undergraduate group. In the second experiment, the statistical analysis conducted for each group separately revealed main effects of syntactic position (in both groups), distance (in both groups) and thematic role (in the postgraduate group). In addition, there were interaction effects between position and distance and between position, thematic role and distance (undergraduate group), and between position and distance (postgraduate group). The quality of the revisions differed: the postgraduate group made more delayed revisions and, in percentage terms, more revisions that changed text quality. As a whole, the experiments allowed us to identify differences between the experimental groups and suggest that schooling level plays an important role in writing and in the choices made in coreference processing.


5

Bodnari, Andreea. "Joint multilingual learning for coreference resolution." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/91126.

Abstract:

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 112-120).
Natural language is a pervasive human skill not yet fully achievable by automated computing systems. The main challenge is understanding how to computationally model both the depth and the breadth of natural languages. In this thesis, I present two probabilistic models that systematically model both the depth and the breadth of natural languages for two different linguistic tasks: syntactic parsing and joint learning of named entity recognition and coreference resolution. The syntactic parsing model outperforms current state-of-the-art models by discovering linguistic information shared across languages at the granular level of a sentence. The coreference resolution system is one of the first attempts at joint multilingual modeling of named entity recognition and coreference resolution with limited linguistic resources. It performs second best on three out of four languages when compared to state-of-the-art systems built with rich linguistic resources. I show that we can simultaneously model both the depth and the breadth of natural languages using the underlying linguistic structure shared across languages.
by Andreea Bodnari.
Ph. D.


6

Webster, Kellie. "Improved Coreference Resolution Using Cognitive Insights." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/15468.

Abstract:

Coreference resolution is the task of extracting referential expressions, or mentions, in text and clustering these by the entity or concept they refer to. The sustained research interest in the task reflects the richness of reference expression usage in natural language and the difficulty in encoding insights from linguistic and cognitive theories effectively. In this thesis, we design and implement LIMERIC, a state-of-the-art coreference resolution engine. LIMERIC naturally incorporates both non-local decoding and entity-level modelling to achieve the highly competitive benchmark performance of 64.22% and 59.99% on the CoNLL-2012 benchmark with a simple model and a baseline feature set. As well as strong performance, a key contribution of this work is a reconceptualisation of the coreference task. We draw an analogy between shift-reduce parsing and coreference resolution to develop an algorithm which naturally mimics cognitive models of human discourse processing. In our feature development work, we leverage insights from cognitive theories to improve our modelling. Each contribution achieves statistically significant improvements, and together they sum to gains of 1.65% and 1.66% on the CoNLL-2012 benchmark, yielding performance values of 65.76% and 61.27%. For each novel feature we propose, we contribute an accompanying analysis so as to better understand how cognitive theories apply to real language data. LIMERIC is at once a platform for exploring cognitive insights into coreference and a viable alternative to current systems. We are excited by the promise of incorporating our and further cognitive insights into more complex frameworks, since this has the potential to improve both the performance of computational models and our understanding of the mechanisms underpinning human reference resolution.


7

Corazza, Michele. "Coreference Resoultion basata su reti neurali deep." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14554/.

Abstract:

In recent years, the use of deep neural networks in natural language processing has led to significant results in widely varying tasks, from speech recognition to semantic analysis. The reason for these advances lies in today's computational capacity, which can support neural networks with many hidden layers (hence "deep"), and in innovative tools such as recurrent neural networks, convolutional neural networks and the ability to build word embeddings with word2vec or similar tools. Among the tasks that remain unsolved with neural networks, coreference resolution is of particular interest. The goal of this task is to resolve the coreferences in a text, that is, to associate mentions that refer to the same entity. The phenomenon is particularly interesting because it involves both semantic and syntactic aspects of language, which must be exploited to achieve good results. A further characteristic of coreference is its relation to the concept of "linguistic context": it is from the context surrounding a mention that one can infer which entity it refers to. This thesis presents a coreference solver based on deep neural networks that uses recurrent networks to address the problem. The proposal rests on the assumption that the network needs components capable of producing a representation of the mentions, so that these representations can be used to tackle coreference resolution.


8

Nilsson, Kristina. "Hybrid Methods for Coreference Resolution in Swedish." Doctoral thesis, Stockholm : Department of Linguistics, Stockholm University, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-38395.


9

Christiansen, Thomas Wulstan. "Coreference and noun phrase selection in Italian." Thesis, University of Salford, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365982.


10

Kunz, Jenny. "Neural Language Models with Explicit Coreference Decision." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-371827.

Abstract:

Coreference is an important and frequent concept in any form of discourse, and Coreference Resolution (CR) is a widely used task in Natural Language Understanding (NLU). In this thesis, we implement and explore two recent models that include the concept of coreference in Recurrent Neural Network (RNN)-based Language Models (LMs). Entity and reference decisions are modeled explicitly in these models using attention mechanisms. Both models learn to save the previously observed entities in a set and to decide whether the next token created by the LM is a mention of one of the entities in the set, an entity that has not been observed yet, or not an entity. After a theoretical analysis in which we compare the two LMs to each other and to a state-of-the-art Coreference Resolution system, we perform an extensive quantitative and qualitative analysis. For this purpose, we train the two models and a classical RNN-LM as the baseline model on the OntoNotes 5.0 corpus with coreference annotation. While we do not reach the baseline in the perplexity metric, we show that the models' relative performance on entity tokens has the potential to improve when the explicit entity modeling is included. We show that the most challenging point in the systems is the decision whether the next token is an entity token, while the decision about which entity the next token refers to performs comparatively well. Our analysis in the context of a text generation task shows that a widespread error source in the mention creation process is the confusion of tokens that refer to related but different entities in the real world, presumably a result of the context-based word representations in the models. Our re-implementation of the DeepMind model by Yang et al. 2016 performs notably better than the re-implementation of the EntityNLM model by Ji et al. 2017, with a perplexity of 107 compared to a perplexity of 131.
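
For readers unfamiliar with the perplexity metric used in this abstract, the following is a minimal sketch of the standard definition: the exponentiated average negative log-probability that a model assigns to the observed tokens, where lower is better. The probability values are hypothetical and are not taken from the thesis.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Exponentiated average negative log-probability of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical model that gives every observed token probability 1/4:
# it is exactly as uncertain as a uniform choice among four tokens.
uniform_probs = [0.25] * 8
print(perplexity(uniform_probs))
```

Under this definition, the reported perplexity of 107 corresponds to a model that is, on average, less uncertain about the next token than one with a perplexity of 131.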


11

Rolih, Gabi. "Applying Coreference Resolution for Usage in Dialog Systems." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-353730.

Abstract:

Using references in language is a major part of communication, and understanding them is not a challenge for humans. Recent years have seen increased usage of dialog systems that interact with humans in natural language to assist them in various tasks, but even the most sophisticated systems still struggle with understanding references. In this thesis, we adapt a coreference resolution system for usage in dialog systems and try to understand what is needed for an efficient understanding of references in dialog systems. We annotate a portion of logs from a customer service system and perform an analysis of the most common coreferring expressions appearing in this type of data. This analysis shows that most coreferring expressions are nominal and pronominal, and they usually appear within two sentences of each other. We implement Stanford's Multi-Pass Sieve with some adaptations and dialog-specific changes and integrate it into a dialog system framework. The preprocessing pipeline makes use of already existing NLP tools, while some new ones are added, such as a chunker, a head-finding algorithm and a NER-like system. To analyze both user input and output of the system, we deploy two separate coreference resolution systems that interact with each other. The system and its separate parts are evaluated using the five most common evaluation metrics. The system does not achieve state-of-the-art numbers, but because of its domain-specific nature that is expected. Some parts of the system do not have any effect on the performance, while the dialog-specific changes contribute to it greatly. An error analysis is conducted and reveals some problems with the implementation, but more importantly, it shows how the system could be further improved by using other types of knowledge and dialog-specific features.
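
The multi-pass sieve idea referred to in this abstract can be sketched as an ordered list of rule-based passes, applied from most to least precise, where a later pass only handles mentions that earlier passes left unresolved. This is a heavily simplified illustration: the two sieves, the pronoun list and the example utterances are hypothetical, and the real Stanford system merges clusters that share attributes rather than linking single mentions.

```python
from typing import Callable, List, Optional

# A sieve looks at one mention and the mentions before it, and either
# returns the index of an antecedent or None when it has no opinion.
Sieve = Callable[[str, List[str]], Optional[int]]

PRONOUNS = {"it", "they", "he", "she"}  # hypothetical, deliberately tiny


def exact_match(mention: str, earlier: List[str]) -> Optional[int]:
    """High-precision pass: link to the closest identical string."""
    for i in range(len(earlier) - 1, -1, -1):
        if earlier[i].lower() == mention.lower():
            return i
    return None


def nearest_for_pronoun(mention: str, earlier: List[str]) -> Optional[int]:
    """Lower-precision pass: link a pronoun to the closest preceding mention."""
    if mention.lower() in PRONOUNS and earlier:
        return len(earlier) - 1
    return None


def resolve(mentions: List[str], sieves: List[Sieve]) -> List[Optional[int]]:
    """Run the sieves in order; later sieves never override earlier decisions."""
    antecedent: List[Optional[int]] = [None] * len(mentions)
    for sieve in sieves:
        for j, mention in enumerate(mentions):
            if antecedent[j] is None:
                antecedent[j] = sieve(mention, mentions[:j])
    return antecedent


mentions = ["my router", "the cable", "My router", "it"]
links = resolve(mentions, [exact_match, nearest_for_pronoun])
print(links)  # antecedent index per mention, None where unresolved
```

Ordering the passes by precision means that confident decisions are fixed first and cannot be undone by weaker heuristics, which is the main design choice behind sieve-style resolvers.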


12

Patel, Chandankumar Johakhim. "A Performance Analysis Framework for Coreference Resolution Algorithms." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1471954403.


13

Shrimpton, Luke William. "Efficient techniques for streaming cross document coreference resolution." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28895.

Abstract:

Large text streams are commonplace: news organisations are constantly producing stories and people are constantly writing social media posts. These streams should be analysed in real time so that useful information can be extracted and acted upon instantly. When natural disasters occur, people want to be informed; when companies announce new products, financial institutions want to know; and when celebrities do things, their legions of fans want to feel involved. In all these examples people care about getting information in real time (low latency). These streams are massively varied, and people’s interests are typically classified by the entities they are interested in. Organising a stream by the entity being referred to would help people extract the information useful to them. This is a difficult task: fans of ‘Captain America’ films will not want to be incorrectly told that ‘Chris Evans’ (the main actor) was appointed to host ‘Top Gear’ when it was a different ‘Chris Evans’. People who use local idiosyncrasies, such as referring to their home county (‘Cornwall’) as ‘Kernow’ (the Cornish for ‘Cornwall’ that has entered the local lexicon), should not be forced to change their language when finding out information about their home. This thesis addresses a core problem for real-time entity-specific NLP: streaming cross-document coreference resolution (CDC), that is, how to automatically identify all the entities mentioned in a stream in real time. It addresses two significant problems for streaming CDC: there is no representative dataset, and existing systems consume more resources over time. A new technique to create datasets is introduced and applied to social media (Twitter) to create a large (6M mentions) and challenging new CDC dataset that contains a much more varied range of entities than typical newswire streams. Existing systems are not able to keep up with large data streams. This problem is addressed with a streaming CDC system that stores a constant-sized set of mentions. New techniques to maintain the sample are introduced that significantly outperform existing ones, maintaining 95% of the performance of a non-streaming system while using only 20% of the memory.
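
To make the notion of a constant-sized set of mentions concrete, here is a minimal sketch of classic reservoir sampling, a standard way to keep a fixed-size uniform sample of an unbounded stream. It is shown only as a baseline illustration: the abstract states that the thesis introduces its own sample-maintenance techniques, and those are not reproduced here. The function name and the integer "mentions" are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], capacity: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of at most `capacity` items from a stream."""
    rng = rng or random.Random()
    sample: List[T] = []
    for seen, item in enumerate(stream, start=1):
        if len(sample) < capacity:
            # The reservoir is not full yet: keep everything.
            sample.append(item)
        else:
            # Replace a stored item with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                sample[slot] = item
    return sample


# Hypothetical stream of 1000 mention ids, held in memory 50 at a time.
kept = reservoir_sample(range(1000), capacity=50, rng=random.Random(0))
print(len(kept))
```

Memory use stays bounded by `capacity` no matter how long the stream runs, which is the property the abstract contrasts with systems that consume more resources over time.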


14

Sapena, Masip Emili. "A constraint-based hypergraph partitioning approach to coreference resolution." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/83904.

Abstract:

Coreference resolution is a natural language processing task that consists of determining the expressions in a discourse that mention or refer to the same real-world entity. The task has a direct effect on text mining and on many natural language tasks that require discourse interpretation, such as summarization, question answering and machine translation. Resolving coreferences is essential for "understanding" a text or a discourse. The objectives of this thesis are focused on research in machine learning for coreference resolution. Specifically, the research objectives concern the following areas:
+ Classification models: The most common classification models in the state of the art are based on the independent classification of pairs of mentions. More recently, models that classify groups of mentions have appeared. One objective of the thesis is to incorporate the entity-mention model into the developed approach.
+ Problem representation: There is not yet a definitive representation of the problem. This thesis presents a hypergraph representation.
+ Resolution algorithms: Depending on the problem representation and the classification model, resolution algorithms can be very diverse. One objective of this thesis is to find a resolution algorithm able to use the classification models in the hypergraph representation.
+ Knowledge representation: Managing knowledge from several sources requires a symbolic and expressive representation of that knowledge. This thesis proposes the use of constraints.
+ Incorporation of world knowledge: Some coreferences cannot be resolved with linguistic information alone. Common sense and world knowledge are often needed. This thesis proposes a method to extract world knowledge from Wikipedia and incorporate it into the resolution system.
The main contributions of this thesis are (i) a new approach to coreference resolution based on constraint satisfaction, using a hypergraph to represent the problem and solving it by relaxation labeling; and (ii) research towards improving coreference resolution performance using world knowledge extracted from Wikipedia. The developed approach can combine the mention-pair and entity-mention classification models, the latter being more expressive than pair-based ones, and it overcomes weaknesses of previous state-of-the-art approaches such as contradictions between independent classifications, classifications without context and lack of information when evaluating pairs. Furthermore, the approach allows new information to be incorporated by adding constraints, and research has been done on using world knowledge to improve performance. RelaxCor, the implementation of the approach, achieved results comparable to the best in the state of the art and participated in two international competitions, SemEval-2010 and CoNLL-2011. RelaxCor achieved second position in CoNLL-2011.


15

Shyu, Eric. "Latent tree structure learning for cross-document coreference resolution." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/91867.

Abstract:

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 77-79).
Cross Document Coreference Resolution (CDCR) is the problem of learning which mentions, coming from several different documents, correspond to the same entity. This thesis approaches the CDCR problem by first turning it into a structure learning problem. A latent tree structure, in which leaves correspond to observed mentions and internal nodes correspond to latent sub-entities, is learned. A greedy clustering heuristic can then be used to select subtrees from the learned tree structure as entities. As with other structure learning problems, it is prudent to invoke Occam's razor and perform regularization to obtain the simplest hypothesis. When the state space consists of tree structures, we can impose a bias on the possible structure. Different aspects of tree structure (e.g. number of edges, depth of the leaves) can be penalized in these models to improve their generalization. This thesis draws upon these ideas to provide a new model for CDCR. To learn parameters, we implement a parameter estimation algorithm based on existing stochastic gradient descent algorithms and show how to further tune regularization parameters. The latent tree structure is then learned using MCMC inference. We show how structural regularization plays a critical role in the inference procedure. Finally, we empirically show that our model outperforms previous work without using a sophisticated set of features.
by Eric Shyu.
M. Eng.

16

Martschat, Sebastian [Verfasser], and Michael [Akademischer Betreuer] Strube. "Structured Representations for Coreference Resolution / Sebastian Martschat ; Betreuer: Michael Strube." Heidelberg : Universitätsbibliothek Heidelberg, 2017. http://d-nb.info/1178009653/34.

17

Cai, Jie [Verfasser], and Michael [Akademischer Betreuer] Strube. "Coreference Resolution via Hypergraph Partitioning / Jie Cai ; Betreuer: Michael Strube." Heidelberg : Universitätsbibliothek Heidelberg, 2013. http://d-nb.info/1179924339/34.

18

He, Tian Ye. "Coreference resolution on entities and events for hospital discharge summaries." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/45977.

Abstract:

Includes bibliographical references (p. 76-80).
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
The wealth of medical information contained in electronic medical records (EMRs), together with Natural Language Processing (NLP) technologies that can automatically extract information from them, has opened the door to automatic patient-care quality monitoring and medical-assist question answering systems. This thesis studies coreference resolution, an information extraction (IE) subtask that links specific mentions to the entity they refer to. Coreference resolution enables us to find changes in the state of entities and makes it possible to answer questions regarding the information thus obtained. We perform coreference resolution on a specific type of EMR, the hospital discharge summary. We treat coreference resolution as a binary classification problem. Our approach yields insights into the critical features for coreference resolution for entities that fall into five medical semantic categories that commonly appear in discharge summaries.
by Tian Ye He.
M.Eng.
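
The binary-classification framing in the abstract above is commonly realized as a mention-pair model: a classifier labels each pair of mentions as coreferent or not, and the positive pairs are then merged into entity clusters. The following is a minimal sketch of that merging step only, under the assumption that pairwise decisions are already available. The function name `cluster_mentions`, its input format, and the example mentions are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_mentions(
    mentions: List[str],
    pairwise_decisions: Iterable[Tuple[str, str, bool]],
) -> List[List[str]]:
    """Merge mentions into entity clusters from pairwise yes/no decisions.

    Assumes every mention identifier in `mentions` is unique (hypothetical
    input format; a real system would use mention IDs, not surface strings).
    """
    # Union-find forest: each mention starts as its own cluster root.
    parent: Dict[str, str] = {m: m for m in mentions}

    def find(m: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b, coreferent in pairwise_decisions:
        if coreferent:
            # A positive decision merges the two clusters (transitive closure).
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


if __name__ == "__main__":
    # Made-up mentions and decisions, for illustration only.
    example_mentions = ["chest pain", "the pain", "aspirin", "it"]
    example_decisions = [
        ("chest pain", "the pain", True),
        ("aspirin", "the pain", False),
        ("aspirin", "it", True),
    ]
    print(cluster_mentions(example_mentions, example_decisions))
```

Taking the transitive closure of positive pairs is the simplest way to turn independent pairwise decisions into clusters; it can propagate a single wrong positive across an entire cluster, which is one reason other entries in this list explore structured alternatives.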

19

Moosavi, Nafise Sadat [Verfasser], and Michael [Akademischer Betreuer] Strube. "Robustness in Coreference Resolution / Nafise Sadat Moosavi ; Betreuer: Michael Strube." Heidelberg : Universitätsbibliothek Heidelberg, 2020. http://d-nb.info/1205210539/34.

20

Kobdani, Hamidreza [Verfasser], and Hinrich [Akademischer Betreuer] Schütze. "A modular framework for coreference resolution / Hamidreza Kobdani. Betreuer: Hinrich Schütze." Stuttgart : Universitätsbibliothek der Universität Stuttgart, 2012. http://d-nb.info/1021923303/34.

21

Teixeira, Elisângela Nogueira. "Syntactic and semantic preferences on coreference processing: Evidence from eye movements." Universidade Federal do Ceará, 2013. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=9870.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Conselho Nacional de Desenvolvimento Científico e Tecnológico
The main objective of this dissertation is to contribute to psycholinguistic studies that seek to test experimentally theoretical conjectures about anaphoric processing. Building on Accessibility Theory (ARIEL, 1991, 2001), Centering Theory (GROSZ; JOSHI; WEINSTEIN, 1995), studies on the typicality of the antecedent term (GARROD; SANFORD, 1977; VAN GOMPEL; LIVERSEDGE; PEARSON, 2004), the Informational Load Hypothesis (ALMOR, 1999), and the Position of Antecedent Hypothesis (CARMINATI, 2002), we hypothesize that, in complex sentences formed by coordination or subordination and containing at most two clauses, the salience of the syntactic subject position is the main factor in anaphora resolution in Portuguese. Using on-line (eye tracking) and off-line experimental methods, we sought evidence for this hypothesis in a set of four studies: (i) a comprehension experiment with coordinated complex sentences, in which the position of the antecedent and the type of semantic relation between antecedent and anaphor were manipulated; (ii) a comprehension experiment with subordinated complex sentences, in which the form of the coreferential expression (overt or null pronoun) and the position of the coreference (anaphoric or cataphoric) were manipulated; (iii) a production survey of complex sentences using overt or null pronouns as coreferential expressions; and (iv) an analysis of eye movements during the reading of authentic, non-manipulated texts in Brazilian Portuguese, aimed at finding fixation patterns and establishing a baseline for reading flow. The on-line studies were run on a 120 Hz eye tracker, which recorded participants' eye movements every 8 ms while they read the stimuli. The dependent eye-movement variables analyzed were: (i) the number of fixations; (ii) the duration of the first fixation; (iii) the mean fixation duration; and (iv) the total fixation time. The joint analysis of the results suggests that the resolution of coreferential anaphora in the complex sentences studied is a function of the syntactic prominence of the subject position. Moreover, the informational load of anaphoric expressions with semantic content seems to increase the processing cost when the antecedent is highly accessible.
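
As a small illustration of the measurements named in the abstract above, the sketch below computes the sampling interval of an eye tracker from its rate (1000 / 120 is roughly 8.33 ms, consistent with the 8 ms figure quoted) and the four dependent variables from a list of fixation durations. The function names and the sample durations are hypothetical; the thesis's actual analysis pipeline is not described in this record.

```python
from statistics import mean
from typing import Dict, List


def sampling_interval_ms(rate_hz: float) -> float:
    """Time between consecutive samples, in milliseconds."""
    return 1000.0 / rate_hz


def fixation_measures(durations_ms: List[float]) -> Dict[str, float]:
    """Compute four eye-movement measures for one region of text.

    `durations_ms` lists the fixation durations in the order they occurred
    (hypothetical input format).
    """
    if not durations_ms:
        raise ValueError("at least one fixation is required")
    return {
        "n_fixations": len(durations_ms),          # (i) number of fixations
        "first_fixation_ms": durations_ms[0],      # (ii) first fixation duration
        "mean_fixation_ms": mean(durations_ms),    # (iii) mean fixation duration
        "total_fixation_ms": sum(durations_ms),    # (iv) total fixation time
    }


if __name__ == "__main__":
    print(round(sampling_interval_ms(120), 2))
    # Made-up durations, for illustration only.
    print(fixation_measures([200, 250, 150]))
```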

22

Tomadaki, Eleftheria. "Cross-document coreference between different types of collateral texts for films." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/844096/.

Abstract:

Recent systems merge information from texts describing video content for video annotation by employing cross-document coreference techniques, mostly realised between the same text genres or in texts including restricted sets of events. We introduce a new and challenging scenario: film and the variety of collateral text genres narrating its content, including unrestricted sets of events. In particular, cross-document coreference between plot summaries and audio description is challenging, as these two texts differ significantly. The resulting cross-referencing can potentially enrich video annotation. We address the questions of how plot summaries and audio description refer to events depicted in films, whether the same events are expressed by lexical regularities in both texts, and how solutions to the cross-document coreference task can be extended to deal with different text genres and unconstrained sets of events. This thesis introduces a new research domain for information extraction and cross-document coreference; reports on a corpus-based analysis of the language used in plot summaries and audio description, focusing on how events are expressed; proposes and evaluates solutions to the cross-document coreference task for an unconstrained set of events in different text types; and provides two data sets for information extraction research. We make three claims. First, plot summaries and audio description use lexical regularities, such as frequent open class words occurring more frequently than in general language, to describe film content. Second, these two texts use similar terms in referring to entities, but different terms in referring to events, i.e. different frequent verbs. Frequent plot summary events are referred to by very few lexical regularities in audio description. Third, the task of cross-document coreference between plot summary and audio description can be automated, achieving at least 50% Precision and 33% Recall, by matching nouns, functional roles and some verbs, and taking into account the temporal aspect of events. The Recall may be improved mostly by resolving all references to entities, while the Precision may be increased by treating a restricted set of events.
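
The abstract above reports Precision and Recall separately, whereas other entries in this list report a single F1 score. For rough comparison, the two can be combined as their harmonic mean, F1 = 2PR / (P + R). The sketch below applies that standard formula to the stated lower bounds; the resulting figure (about 0.40) is derived here and is not a result reported in the thesis. The function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Lower bounds quoted in the abstract: 50% Precision, 33% Recall.
    print(round(f1_score(0.50, 0.33), 3))
```

Because F1 increases with both precision and recall, the value obtained from the two lower bounds is itself a lower bound on the corresponding F1.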

23

Lassalle, Emmanuel. "Structured learning with latent trees : a joint approach to coreference resolution." Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCC273.

Abstract:

This thesis explores ways to define automated coreference resolution systems using structured machine learning techniques. We design supervised models that learn to build coreference clusters from raw text: our main objective is to obtain models able to process documents globally, in a structured fashion, to ensure coherent outputs. Our models are trained and evaluated on the English part of the CoNLL-2012 Shared Task annotated corpus with standard metrics. We carry out detailed comparisons of different settings so as to refine our models and design a complete end-to-end coreference resolver. Specifically, we first carry out preliminary work on improving the way features are employed by linear models for classification: we extend existing work on separating different types of mention pairs to define more accurate classifiers of coreference links. We then define various structured models based on latent trees to learn to build clusters globally, and not only from the predictions of a mention pair classifier. We study different latent representations (various shapes and sparsity) and show empirically that the best suited structure is a restricted class of trees related to the best-first rule for selecting coreference links. We further improve this latent representation by integrating anaphoricity modelling jointly with coreference, designing a global (structured at the document level) and joint model that outperforms existing models in gold-mention evaluation. We finally design a complete end-to-end resolver and evaluate the improvement obtained by our new models on detected mentions, a more realistic setting for coreference resolution.

24

Rösiger, Ina [Verfasser], and Jonas [Akademischer Betreuer] Kuhn. "Computational modelling of coreference and bridging resolution / Ina Rösiger ; Betreuer: Jonas Kuhn." Stuttgart : Universitätsbibliothek der Universität Stuttgart, 2019. http://d-nb.info/1184277826/34.

25

Simeonov, Dimitar N. "The use of coreference resolution for understanding manipulation commands for the PR2 Robot." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/77077.

Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 81-84).
Natural language interaction can enable us to interface with robots such as the Personal Robot 2 (PR2) without the need for special training or equipment. Programming such a robot to follow commands is challenging because natural language has a complex structure and semantics, a model for which needs to be based on linguistic knowledge or learned from examples. In this thesis we first enable the PR2 robot to follow manipulation commands expressed in natural language by applying the Generalized Grounding Graph (G3). We model the PR2's actions and their trajectories in the physical environment, define the state-action space and learn a grounding model from an annotated corpus of robot actions aligned with commands. We achieved lower overall performance than previous implementations of G3 had reported. After that, we present an approach for using the linguistic technique of coreference resolution to improve the robot's ability to understand commands consisting of multiple clauses. We constrain the groundings for coreferent phrases to be identical by merging their nodes in the grounding graph. We show that using coreference information increases the robot's ability to infer the right action sequence. This brings the robotic capabilities of modeling and understanding natural language closer to our theoretical understanding of discourse.
by Dimitar N. Simeonov.
M.Eng.

26

Jaffe, Evan. "The Role of Coreference Resolution in Memory- and Expectation-based Models of Human Sentence Processing." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1619104248552177.

27

Zhekova, Desislava [Verfasser], Sandra [Akademischer Betreuer] Kübler, and John A. [Akademischer Betreuer] Bateman. "Towards Multilingual Coreference Resolution / Desislava Zhekova. Gutachter: John Bateman ; Sandra Kübler. Betreuer: Sandra Kübler." Bremen : Staats- und Universitätsbibliothek Bremen, 2013. http://d-nb.info/1072078791/34.

28

Griest, Kenneth Campbell. "An analysis of features used to train entity mention detection and coreference resolution classifiers." Connect to online resource, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1447653.

29

Tourille, Julien. "Extracting Clinical Event Timelines : Temporal Information Extraction and Coreference Resolution in Electronic Health Records." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS603/document.

Abstract:

Important information for public health is contained within Electronic Health Records (EHRs). The vast majority of clinical data available in these records takes the form of narratives written in natural language. Although free text is convenient for describing complex medical concepts, it is difficult to use for medical decision support, clinical research or statistical analysis. Among all the clinical aspects that are of interest in these records, the patient timeline is one of the most important. Being able to retrieve clinical timelines automatically would allow for a better understanding of clinical phenomena such as disease progression and the longitudinal effects of medications. It would also make it possible to improve medical question answering and clinical outcome prediction systems. Moreover, access to the clinical timeline is needed to evaluate the quality of the healthcare pathway by comparing it to clinical guidelines, and to highlight the steps of the pathway where specific care should be provided. In this thesis, we focus on building such timelines by addressing two related natural language processing topics: temporal information extraction and clinical event coreference resolution. For temporal information extraction, we present a generic feature-based approach for temporal relation extraction that can be applied to documents written in English and in French. We then devise a neural approach for temporal information extraction that includes categorical features. The second part of the thesis addresses coreference resolution: we present a neural entity-based approach for coreference resolution in clinical narratives. We perform an empirical study to evaluate how categorical features and neural network components such as attention mechanisms and token character-level representations influence the performance of our coreference resolution approach.

30

Grishina, Yulia [Verfasser], Manfred [Akademischer Betreuer] Stede, Manfred [Gutachter] Stede, and Heike [Gutachter] Zinsmeister. "Assessing the applicability of annotation projection methods for coreference relations / Yulia Grishina ; Gutachter: Manfred Stede, Heike Zinsmeister ; Betreuer: Manfred Stede." Potsdam : Universität Potsdam, 2019. http://nbn-resolving.de/urn:nbn:de:kobv:517-opus4-425378.

31

Grishina, Yulia [Verfasser], Manfred [Akademischer Betreuer] Stede, Manfred [Gutachter] Stede, and Heike [Gutachter] Zinsmeister. "Assessing the applicability of annotation projection methods for coreference relations / Yulia Grishina ; Gutachter: Manfred Stede, Heike Zinsmeister ; Betreuer: Manfred Stede." Potsdam : Universität Potsdam, 2019. http://d-nb.info/1218404442/34.

32

Lenas, Erik. "Prerequisites for Extracting Entity Relations from Swedish Texts." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281275.

Abstract:

Natural language processing (NLP) is a vibrant area of research with many practical applications today, such as sentiment analysis, text labeling, question answering, machine translation and automatic text summarization. At the moment, research is mainly focused on the English language, although many other languages are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically on relation extraction, that is, extracting relations between named entities in a text. The aim is to use machine learning techniques to build a Swedish language processing pipeline with part-of-speech tagging, dependency parsing, named entity recognition and coreference resolution, to serve as a base for later relation extraction from Swedish archival texts. The obvious difficulty lies in the scarcity of large annotated Swedish datasets. For example, no large enough Swedish dataset for coreference resolution exists today. An important part of this work, therefore, is to create a Swedish coreference solver using distantly supervised machine learning: an English coreference solver is applied to an unannotated bilingual English-Swedish corpus, a word aligner is used to transfer the machine-annotated English dataset to Swedish, and a Swedish model is then trained on the resulting dataset. Using AllenNLP's end-to-end coreference resolution model, both for creating the Swedish dataset and for training the Swedish model, this work achieves an F1-score of 0.5. For named entity recognition this work uses the Swedish BERT models released by the Royal Library of Sweden in February 2020 and achieves an overall F1-score of 0.95. To put all of these NLP models within a single language processing pipeline, spaCy is used as a unifying framework.
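
The dataset-creation step described in the abstract above (annotate the English side automatically, then carry the annotations over to Swedish through word alignments) can be sketched generically as follows. This is a minimal illustration of annotation projection under stated assumptions, not the thesis's implementation: the span format, the alignment dictionary, and the function names `project_span` and `project_clusters` are all hypothetical, and no AllenNLP or spaCy calls are shown.

```python
from typing import Dict, List, Optional, Tuple

# A mention is a token span with inclusive (start, end) indices.
Span = Tuple[int, int]


def project_span(span: Span, alignment: Dict[int, List[int]]) -> Optional[Span]:
    """Map a source-language span to the target language.

    `alignment` maps each source token index to the target token indices it
    is aligned with (assumed output of some word aligner).
    """
    start, end = span
    targets = [t for s in range(start, end + 1) for t in alignment.get(s, [])]
    if not targets:
        return None  # no aligned token: the mention cannot be projected
    # Use the smallest target span covering every aligned token.
    return (min(targets), max(targets))


def project_clusters(
    clusters: List[List[Span]], alignment: Dict[int, List[int]]
) -> List[List[Span]]:
    """Project coreference clusters, dropping clusters left with < 2 mentions."""
    projected: List[List[Span]] = []
    for cluster in clusters:
        spans = [p for p in (project_span(s, alignment) for s in cluster) if p is not None]
        if len(spans) >= 2:
            projected.append(spans)
    return projected


if __name__ == "__main__":
    # Made-up alignment and clusters, for illustration only.
    alignment = {0: [0], 1: [1], 2: [2], 3: [3], 5: [5], 6: [6]}
    english_clusters = [[(0, 0), (2, 2), (5, 5)], [(2, 3), (9, 9)]]
    print(project_clusters(english_clusters, alignment))
```

Dropping unaligned mentions and singleton clusters keeps the projected data conservative at the cost of coverage; noise from alignment errors and from the source-side resolver is one plausible reason for the gap between the reported coreference and named entity recognition scores, though the abstract does not say so.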

33

Ritz, Julia. "Discourse-givenness of noun phrases : theoretical and computational models." Phd thesis, Universität Potsdam, 2013. http://opus.kobv.de/ubp/volltexte/2014/7081/.

Abstract:

This thesis gives formal definitions of discourse-givenness, coreference and reference, and reports on experiments with computational models of discourse-givenness of noun phrases for English and German. Definitions are based on Bach's (1987) work on reference, Kibble and van Deemter's (2000) work on coreference, and Kamp and Reyle's Discourse Representation Theory (1993). For the experiments, the following corpora with coreference annotation were used: MUC-7, OntoNotes and ARRAU for English, and TüBa-D/Z for German. The classification algorithms used are J48 decision trees, the rule-based learner Ripper, and linear support vector machines. New features are suggested that represent the noun phrase's specificity as well as its context, and they lead to a significant improvement in classification quality.

34

Goodsell, Thea. "Mental files." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:7d7a1146-f770-4951-81a2-2b5dc42d2ecc.

Abstract:

It is often supposed that we can make progress understanding singular thought about objects by claiming that thinkers use ‘mental files’. However, the proposal is rarely subject to sustained critical evaluation. This thesis aims to clarify and critique the claim that thinkers use mental files. In my introductory first chapter, I motivate my subsequent discussion by introducing the claim that thinkers deploy modes of presentation in their thought about objects, and lay out some of my assumptions and terminology. In the second chapter, I introduce mental files, responding to the somewhat fragmented files literature by setting out a core account of files, and outlining different ways of implementing the claim that thinkers use mental files. I highlight pressing questions about the synchronic and diachronic individuation conditions for files. In chapters three and four, I explore whether ‘de jure coreference’ can be used to give synchronic individuation conditions on mental files. I explore existing characterisations of de jure coreference before presenting my own, but conclude that de jure coreference does not give a useful account of the synchronic individuation conditions on files. In chapter five, I consider the proposal that thinkers must sometimes trade on the coreference of their mental representations, and argue that we can give synchronic individuation conditions on files in terms of trading on coreference. In chapter six, I bring together the account of files developed so far, compare it to the most developed theory of mental files published to date, and defend my account from the objection that it is circular. In chapter seven, I explore routes for giving diachronic individuation conditions on mental files. In my concluding chapter, I distinguish the core account of files from the idea that the file metaphor should be taken seriously. 
I suggest that my investigation of the consequences of the core account has shown that the file metaphor is unhelpful, and I outline reasons to exercise caution when using ‘files’ terminology.

35

Batista-Navarro, Riza Theresa Bautista. "Information extraction from pharmaceutical literature." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/information-extraction-from-pharmaceutical-literature(3f8322b6-8b8d-44eb-a8cd-899026b267b9).html.

Abstract:

With the constantly growing amount of biomedical literature, methods for automatically distilling information from unstructured data, collectively known as information extraction, have become indispensable. Whilst most biomedical information extraction efforts in the last decade have focussed on the identification of gene products and interactions between them, the biomedical text mining community has recently extended their scope to capture associations between biomedical and chemical entities with the aim of supporting applications in drug discovery. This thesis is the first comprehensive study focussing on information extraction from pharmaceutical chemistry literature. In this research, we describe our work on (1) recognising names of chemical compounds and drugs, facilitated by the incorporation of domain knowledge; (2) exploring different coreference resolution paradigms in order to recognise co-referring expressions given a full-text article; and (3) defining drug-target interactions as events and distilling them from pharmaceutical chemistry literature using event extraction methods.

36

Raghavan, Preethi. "MEDICAL EVENT TIMELINE GENERATION FROM CLINICAL NARRATIVES." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397651496.

37

Konstantinova, Natalia. "Knowledge acquisition from user reviews for interactive question answering." Thesis, University of Wolverhampton, 2013. http://hdl.handle.net/2436/297401.

Abstract:

Nowadays, the effective management of information is extremely important for all spheres of our lives and applications such as search engines and question answering systems help users to find the information that they need. However, even when assisted by these various applications, people sometimes struggle to find what they want. For example, when choosing a product customers can be confused by the need to consider many features before they can reach a decision. Interactive question answering (IQA) systems can help customers in this process, by answering questions about products and initiating a dialogue with the customers when their needs are not clearly defined. The focus of this thesis is how to design an interactive question answering system that will assist users in choosing a product they are looking for, in an optimal way, when a large number of similar products are available. Such an IQA system will be based on selecting a set of characteristics (also referred to as product features in this thesis), that describe the relevant product, and narrowing the search space. We believe that the order in which these characteristics are presented in terms of these IQA sessions is of high importance. Therefore, they need to be ranked in order to have a dialogue which selects the product in an efficient manner. The research question investigated in this thesis is whether product characteristics mentioned in user reviews are important for a person who is likely to purchase a product and can therefore be used when designing an IQA system. We focus our attention on products such as mobile phones; however, the proposed techniques can be adapted for other types of products if the data is available. Methods from natural language processing (NLP) fields such as coreference resolution, relation extraction and opinion mining are combined to produce various rankings of phone features. 
The research presented in this thesis employs two corpora which contain texts related to mobile phones specifically collected for this thesis: a corpus of Wikipedia articles about mobile phones and a corpus of mobile phone reviews published on the Epinions.com website. Parts of these corpora were manually annotated with coreference relations, mobile phone features and relations between mentions of the phone and its features. The annotation is used to develop a coreference resolution module as well as a machine learning-based relation extractor. Rule-based methods for identification of coreference chains describing the phone are designed and thoroughly evaluated against the annotated gold standard. Machine learning is used to find links between mentions of the phone (identified by coreference resolution) and phone features: it determines whether a phone feature belongs to the phone mentioned in the same sentence or not. In order to find the best rankings, this thesis investigates several settings. One of the hypotheses tested here is that the relatively low results of the proposed baseline are caused by noise introduced by sentences which are not directly related to the phone and phone feature. To test this hypothesis, only sentences which contained mentions of the mobile phone and a phone feature linked to it were processed to produce rankings of the phone features. Selection of the relevant sentences is based on the results of coreference resolution and relation extraction. Another hypothesis is that opinionated sentences are a good source for ranking the phone features. In order to investigate this, a sentiment classification system is also employed to distinguish between features mentioned in positive and negative contexts. The detailed evaluation and error analysis of the proposed methods form an important part of this research and support the reliability of the results provided in this thesis.

38

Castaño, André Casado. "Populando ontologias através de informações em HTML - o caso do currículo lattes." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-12082008-130204/.

Abstract:

The Lattes Platform is currently the main database of Brazilian researchers' curricula vitae. It stores, in a standardized form, professional and academic data, bibliographic production and other information about these researchers. Several types of consolidated reports can be generated from a database of Lattes curricula. The existing Lattes Platform tools are unable to detect some of the problems that arise when generating consolidated reports, such as duplicate citations or bibliographic productions classified differently by each author, which lead to an incorrect total number of publications. As a result, the generated reports have to be reviewed by the researchers, and the flaws in this process are the main motivation for this project. In this work, we use Lattes Platform curricula as the information source for populating an ontology, which is then used mainly as a database to be queried for report generation. We analyze the whole process of extracting information from HTML files and post-processing it so that it is inserted correctly into the ontology, according to its semantics. With the ontology correctly populated, we also show some queries that can be run against it, and we analyze the methods and approaches used throughout the process, discussing their strengths and weaknesses in order to detail the difficulties involved in the automatic population (instantiation) of an ontology.

39

Lima, Juciane Nóbrega. "Paralelismo e foco estrutural no processamento da correferência de pronomes e de nomes repetidos." Universidade Federal da Paraíba, 2014. http://tede.biblioteca.ufpb.br:8080/handle/tede/8418.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
The present study investigates intersentential coreference processing, observing how the coreference of pronouns and repeated names is processed in relation to the focus of their antecedents. Our initial hypothesis was that repeated names are more costly to process than pronouns, regardless of the salience of the antecedent; that is, there would be a Repeated Name Penalty, as postulated by the Informational Load Hypothesis of Almor (1999, 2000). We ran two online self-paced reading experiments using the Psyscope program. The dependent variable in both experiments was the reading time of the critical segment (repeated name or pronoun), and the independent variables were the type of resumption (pronoun or repeated name) and the position of the antecedent (focused or unfocused). The two experiments differed in that, in the first, all experimental sentences contained pronouns and repeated names in the same position and syntactic function as their antecedents, that is, in parallel, whereas in the second no condition had the antecedent and the resumption in parallel. Twenty-eight UFPB students took part in each experiment. The results of the first experiment show shorter reading times for pronouns than for repeated names, regardless of whether the antecedent was focused. Structural focus showed no significant effect in any of the experimental conditions. A possible explanation is that the effect of structural parallelism overrode the effect of focus, and this is what the results of the second experiment showed: with the resumption and the antecedent not in parallel, the effect of structural focus was significant, with reading being faster when the antecedent was focused than when it was not. The Repeated Name Penalty was also confirmed in this second experiment.

40

Kaumanns, Franz David. "Assessment and analysis of the applicability of recurrent neural networks to natural language understanding with a focus on the problem of coreference resolution." München: Universitätsbibliothek der Ludwig-Maximilians-Universität, 2016. Supervisor: Hinrich Schütze. http://d-nb.info/1121507999/34.

41

Silva, Jefferson Fontinele da. "Resolução de correferência em múltiplos documentos utilizando aprendizado não supervisionado." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19072011-144521/.

Abstract:

One of the problems found in Natural Language Processing (NLP) systems is the difficulty of identifying textual elements that refer to the same entity. This phenomenon, in which a set of textual elements refers to a single entity, is called coreference. Coreference resolution systems can improve the performance of various NLP applications, such as automatic summarization, information extraction and question answering. Recently, NLP research has explored the possibility of identifying coreferent elements across multiple documents. In this context, this work focuses on the development of an unsupervised learning method for coreference resolution in multiple documents, using Portuguese as the target language. To date, no system with this purpose is known for Portuguese. The results of the experiments carried out with the system suggest that the developed method is superior to methods based on string matching.
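For illustration only, the kind of string-matching baseline that the abstract above compares against can be sketched as follows. This is a minimal sketch and not the thesis's implementation: the function names `normalize` and `string_match_chains`, the accent-stripping normalization and the sample mentions are all assumptions introduced here.

```python
from collections import defaultdict
import unicodedata


def normalize(mention: str) -> str:
    """Lowercase, strip accents and collapse whitespace (an assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", mention)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def string_match_chains(mentions):
    """Group (doc_id, mention) pairs whose normalized strings are identical."""
    chains = defaultdict(list)
    for doc_id, mention in mentions:
        chains[normalize(mention)].append((doc_id, mention))
    return list(chains.values())


# Hypothetical mentions drawn from three documents.
mentions = [
    ("doc1", "São Paulo"),
    ("doc2", "Sao  Paulo"),
    ("doc2", "a cidade"),
    ("doc3", "são paulo"),
]
chains = string_match_chains(mentions)
```

In this toy input the descriptive mention "a cidade" ("the city") ends up in a chain of its own, which is the kind of link that exact string matching cannot recover.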

42

Huang, Yin Jou. "Event Centric Approaches in Natural Language Processing." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/265210.

43

Fonseca, Evandro Brasil. "Resolução de correferência nominal usando semântica em língua portuguesa." Pontifícia Universidade Católica do Rio Grande do Sul, 2018. http://tede2.pucrs.br/tede2/handle/tede/8169.

Abstract:

Coreference resolution is a challenging task for Natural Language Processing, given the linguistic knowledge it requires and the sophistication of the language processing techniques involved. Even though it is a demanding task, a motivating factor in the study of this phenomenon is its usefulness: several Natural Language Processing tasks may benefit from its results, such as named entity recognition, relation extraction between named entities, summarization and sentiment analysis, among others. Coreference resolution is a process that consists of identifying terms and expressions that refer to the same entity. For example, in the passage "France is resisting. The country is one of the first in the ranking..." we can say that [the country] is coreferent with [France]. By grouping these referential terms, we form groups of coreferent mentions, more commonly known as coreference chains. This thesis proposes a process for coreference resolution between noun phrases for Portuguese, focusing on the use of semantic knowledge. Our approach is based on syntactic-semantic linguistic rules; that is, we combine different levels of linguistic processing, using semantic relations as support, in order to infer referential relations between mentions. Models based on linguistic rules have been applied efficiently to other languages, such as English, Spanish and Galician. These models prove more efficient than machine learning approaches for less-resourced languages, since the lack of sample-rich corpora may impair training. The proposed approach is the first model for Portuguese coreference resolution that uses semantic knowledge, and we consider this the main contribution of this thesis.

44

Adamček, Adam. "Metody extrakce informací." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234967.

Abstract:

The goal of information extraction is to retrieve relational data from texts written in natural human language. The applications of information obtained in this way are wide-ranging, from text summarization, through ontology creation, to question answering by QA systems. This work describes the design and implementation of a system, running on a computer cluster, which transforms a dump of Wikipedia articles into a set of extracted information that is stored in a distributed RDF database and can be queried through a purpose-built user interface.

45

Correia, Débora Vasconcelos. "Relações entre memória procedimental e linguagem em pessoas que gaguejam: um estudo com base no processamento da correferência anafórica em português brasileiro." Universidade Federal da Paraíba, 2014. http://tede.biblioteca.ufpb.br:8080/handle/tede/6426.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This dissertation aims to explain how coreference is processed by people who stutter (PWS), reflecting on a possible association between stuttering and difficulties in procedural memory, based on the relationship between Alm's Dual Premotor Model (2005) and Ullman's Declarative/Procedural Model (2001). We propose a hypothesis about the connection between procedural memory dysfunction and linguistic processing in PWS, which was investigated through the ASRT (Alternating Serial Reaction Time) test of procedural memory and two self-paced reading experiments on inter- and intrasentential coreference. In the ASRT test (experiment 1), performed to measure the participants' degree of implicit learning, the findings suggested a tendency for the two groups (PWS and fluent speakers, FF) to behave differently. PWS showed an ascending curve, with a positive Spearman's coefficient for the variable cycle, expressing an increase in reaction time as the number of cycles (stimuli) increased, which we interpreted as a possible difficulty of PWS in the implicit learning of motor sequences. FF showed a descending curve, confirmed by a negative Spearman's coefficient for the variable cycle, demonstrating that procedural learning occurred more quickly in this group; that is, the reaction time of FF decreased as the number of cycles increased. Given these indications that PWS have difficulties in procedural memory, which according to our hypothesis could interfere with the processing of grammatical aspects, we turned to the investigation of linguistic processing.
Experiment 2, on intersentential coreference, investigated the processing of the lexical pronoun (PR) and the repeated name (NR) in object position by FF and PWS. The results showed no difference in this type of processing between FF and PWS, since both groups showed similar patterns in the mean reading time of the critical segment. However, there was a significant effect of the variable type of resumption, showing that PR are processed faster than NR, as previously found by Leitão (2005). Thus, to investigate how grammar functions in PWS and to test more decisively the hypothesis defended in this dissertation, we turned to the analysis of coreference at the intrasentential level, in order to isolate the grammatical aspect and eliminate possible interference from pragmatic and contextual factors. The results pointed to the absence of a main effect of the variable group; however, we found a marginally significant interaction between the variables group and type of sentence. This interaction can be explained by the fact that the groups react differently to the conditions, showing inverse behavior: whereas FF are faster in the grammatical condition and slower in the ungrammatical condition, PWS show the opposite pattern. This corroborates our hypothesis that PWS would have difficulty perceiving the violation of a grammatical principle. This possibility, supported by the statistical evidence projected for our results with a larger sample, points our research toward rejecting the null hypothesis.

46

Wetzel, Dominikus Emanuel. "Entity-based coherence in statistical machine translation : a modelling and evaluation perspective." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31522.

Abstract:

Natural language documents exhibit coherence and cohesion by means of interrelated structures both within and across sentences. Sentences do not stand in isolation from each other, and only a coherent structure makes them understandable and natural-sounding to humans. In Statistical Machine Translation (SMT), little research exists on translating a document from a source language into a coherent document in the target language. The dominant paradigm is still one that considers sentences independently of each other. There is a need both for a deeper understanding of how to handle specific discourse phenomena and for automatic evaluation of how well these phenomena are handled in SMT. In this thesis we explore how to treat sentences as dependent on each other by focussing on the problem of pronoun translation as an instance of a discourse-related non-local phenomenon. We direct our attention to pronoun translation in the form of cross-lingual pronoun prediction (CLPP) and develop a model to tackle this problem. We obtain state-of-the-art results that show the benefit of having access to the antecedent of a pronoun when predicting the right translation of that pronoun. Experiments also showed that features from the target side are more informative than features from the source side, confirming linguistic knowledge that referential pronouns need to agree in gender and number with their target-side antecedent. We show our approach to be applicable across the two language pairs English-French and English-German. The experimental setting for CLPP is artificially restricted, both to enable automatic evaluation and to provide a controlled environment. This limitation does not yet allow us to test the full potential of CLPP systems within a more realistic setting that is closer to a full SMT scenario. We provide an annotation scheme, a tool and a corpus that enable evaluation of pronoun prediction in a more realistic setting.
The annotated corpus consists of parallel documents translated by a state-of-the-art neural machine translation (NMT) system, where the appropriate target-side pronouns have been chosen by annotators. With this corpus, we expose a weakness of our current CLPP systems: they are outperformed by a state-of-the-art NMT system in this more realistic context. This corpus provides a basis for future CLPP shared tasks and allows the research community to further understand and test their methods. The lack of appropriate evaluation metrics that explicitly capture non-local phenomena is one of the main reasons why handling non-local phenomena has not yet been widely adopted in SMT. To overcome this obstacle and evaluate the coherence of translated documents, we define a bilingual model of entity-based coherence, inspired by work on monolingual coherence modelling, and frame it as a learning-to-rank problem. We first evaluate this model on a corpus where we artificially introduce coherence errors based on typical errors CLPP systems make. This allows us to assess the quality of the model in a controlled environment with automatically provided gold coherence rankings. Results show that this model can distinguish with high accuracy between a human-authored translation and one with coherence errors, that it can also distinguish between document pairs from two corpora with different degrees of coherence errors, and that the learnt model can be applied successfully when the distribution of errors in the test set differs from that of the training data, showing its potential to generalize. To test our bilingual model of coherence as a discourse-aware SMT evaluation metric, we apply it to more realistic data. We use it to evaluate a state-of-the-art NMT system against post-editing systems with pronouns corrected by our CLPP systems.
For verifying our metric, we reuse our annotated parallel corpus and consider the pronoun annotations as proxy for human document-level coherence judgements. Experiments show far lower accuracy in ranking translations according to their entity-based coherence than on the artificial corpus, suggesting that the metric has difficulties generalizing to a more realistic setting. Analysis reveals that the system translations in our test corpus do not differ in their pronoun translations in almost half of the document pairs. To circumvent this data sparsity issue, and to remove the need for parameter learning, we define a score-based SMT evaluation metric which directly uses features from our bilingual coherence model.

47

Shankaranarayanan, S. "Detection of Coreferences in Automatic Specifications Analysis." Thesis, Virginia Tech, 1994. http://hdl.handle.net/10919/42360.

Abstract:

Specifications on digital hardware systems typically contain descriptions and requirements expressed in natural language and diagrams of various types. The objective of the research reported here is the automatic detection of common references ("coreferences") to objects in natural language specification statements in order to permit automatic integration of requirements. This thesis describes a prototype system for detecting coreferences. First, the natural language statements are translated into conceptual graphs (semantic nets). Then, these graphs are scanned by a rule-based system to determine whether each concept that is encountered is the definition of a new concept or a reference to a previously defined concept. Tests performed on the system developed indicate a high percentage rate of correct classifications.
Master of Science
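As a rough illustration of the scanning step described in the abstract above, the decision between "definition of a new concept" and "reference to a previously defined concept" might be sketched as below. This is a toy sketch under stated assumptions, not the thesis's rule base: the single determiner-based rule, the `(determiner, concept_type)` representation and the function name `classify_concepts` are all hypothetical.

```python
def classify_concepts(concepts):
    """Label each (determiner, concept_type) pair as 'definition' or 'reference'.

    Assumed rule: a definite mention ("the") of a concept type that has
    already been seen is a reference; anything else defines a new concept.
    """
    seen = set()
    labels = []
    for determiner, concept_type in concepts:
        if determiner == "the" and concept_type in seen:
            labels.append("reference")
        else:
            labels.append("definition")
            seen.add(concept_type)
    return labels


# Hypothetical concepts in the order they appear in a specification.
statements = [
    ("a", "counter"),
    ("the", "counter"),
    ("a", "register"),
    ("the", "clock"),
]
labels = classify_concepts(statements)
```

A real system would need far richer rules over the conceptual graphs; the sketch only shows the shape of a single left-to-right pass that keeps track of previously defined concepts.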

48

Gonçalves, Patrícia Nunes. "CorrefSum: revisão da coesão referencial em sumários extrativos." Universidade do Vale do Rio do Sinos, 2008. http://www.repositorio.jesuita.org.br/handle/UNISINOS/2264.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
With the growth of the Internet, we increasingly face the problem of information overload. In this context, automatic text summarization has become a prominent research area. Summarization is the process of identifying the most important information in a text in order to produce a shortened version of it. Extractive summarizers choose the most relevant sentences in a text and regroup them to form the summary. Often, the juxtaposition of the selected sentences violates the referential cohesion needed to interpret the text. This work therefore focuses on the analysis and recovery of referential cohesion in extractive summaries. The goal is to develop a system that maintains the referential cohesion of such summaries, using the coreference chains present in the source text as its source of information. Experiments were carried out with two summarizers, GistSumm and SuPor-2. Evaluation was done in two ways, automatically and subjectively. The results indicate that this is a promising area of work, and ways of advancing this research are discussed.

49

BOURGEOIS, ROBERT. "Iceo. Intension, coreferences et objets dans la federation de formalismes de specification." Paris 6, 1990. http://www.theses.fr/1990PA066425.

Abstract:

In this thesis we present the design and implementation of an interactive knowledge representation system, written in Smalltalk-80, that has the characteristics required to acquire and exploit information during the specification phase of a system. This knowledge representation system, called Iceo, provides a host framework for: building editors (graphical and textual); assembling a knowledge base for analysing and understanding descriptions given in different formalisms; semantically linking the various descriptions of the same system; and interfacing with tools for symbolic interpretation, validation, and animation. Iceo includes representation mechanisms for handling the problems of perception, qualification, and coreference that arise when interpreting descriptions expressed in different languages, mechanisms that existing knowledge representation systems more or less lack or ignore.

50

Versley, Yannick [Verfasser], and Erhard [Akademischer Betreuer] Hinrichs. "Resolving Coreferent Bridging in German Newspaper Text / Yannick Versley ; Betreuer: Erhard Hinrichs." Tübingen : Universitätsbibliothek Tübingen, 2010. http://d-nb.info/1161803114/34.
