Dissertations / Theses on the topic 'Corpus linguistics'
Consult the top 50 dissertations / theses for your research on the topic 'Corpus linguistics.'
Atwell, Eric Steven. "Corpus linguistics and language learning : bootstrapping linguistic knowledge and resources from text." Thesis, University of Leeds, 2008. http://etheses.whiterose.ac.uk/7504/.
Harvey, Kevin. "Adolescent health communication: a corpus linguistics approach." Thesis, University of Nottingham, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.491000.
Wiesner, Susan L. "Framing dance writing : a corpus linguistics approach." Thesis, University of Surrey, 2007. http://epubs.surrey.ac.uk/974/.
Cheung, Mei Ling Lisa. "Merging corpus linguistics and collaborative knowledge construction." Thesis, University of Birmingham, 2009. http://etheses.bham.ac.uk//id/eprint/464/.
Doyle, Paul G. "Replicating corpus linguistics : a corpus-driven investigation of lexical networks in text." Thesis, Lancaster University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418685.
Tagg, Caroline. "A corpus linguistics study of SMS text messaging." Thesis, University of Birmingham, 2009. http://etheses.bham.ac.uk//id/eprint/253/.
Alruwaili, Awatif. "Integrating corpus linguistics in second language vocabulary acquisition." Thesis, University of Nottingham, 2018. http://eprints.nottingham.ac.uk/51589/.
Korte, Matthew. "Corpus Methods in Interlanguage Analysis." University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1218835515.
Full textRodrigues, Agnes dos Santos Scaramuzzi. "Posicionamento e linguística forense: uma análise mediada pela Linguística de Corpus." Pontifícia Universidade Católica de São Paulo, 2016. https://tede2.pucsp.br/handle/handle/18899.
Made available in DSpace on 2016-08-18T13:23:04Z (GMT). Previous issue date: 2016-06-09
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The overall purpose of this research was to investigate the verbal language collected and compiled in an electronic corpus drawn from a criminal lawsuit (Special Part, Crimes Against the Person, Chapter I, Crimes Against Life) tried at the Ministro Mário Guimarães Forum judicial complex in the city of São Paulo in 2011. The issue at stake was homicidal domestic violence. We chose a murder case in which a male defendant killed his former spouse, the mother of his child, at the victim's residence. The object of study was the set of hearings held at two separate stages: the Preliminary Hearing and the Trial. Our assumption is that, in a criminal proceeding, linguistic actors fall into three distinct groups according to their function in the proceedings: (a) to charge, (b) to defend and (c) to judge. Because they have no access to the acts themselves, only to representations of them, the actors use language, chiefly verbal language, to report their knowledge of the occurrence. In doing so, they position themselves according to their beliefs about the occurrence and imbue their accounts with the impact that the events have had on their lives. Thus, there are linguistic differences in the speech of the actors, according to their function in the proceedings, which can be revealed by investigating stance. Revealing them can elucidate the prosecution's and the defense's versions of, for example, the profile of the victim and of the defendant. Our analysis of these differences is innovative, as we did not find any study of stance in court hearings in Portuguese; we therefore seek to fill this gap. The specific objectives were: (A) to reveal the uses of stance by making their categories explicit in two parts of speech, (a) adjectives and (b) adverbs ending in -mente (English -ly); and (B) to find out whether the uses of stance differ according to the role the linguistic actor plays in the proceedings: (a) to charge, (b) to defend and (c) to judge.
The research questions were: 1. What is the evidence of the uses of stance and its categories in two parts of speech: (a) adjectives and (b) adverbs ending in -mente? 2. Are there differences in the uses of stance in relation to the role the linguistic actor plays in the proceedings: (a) to charge, (b) to defend and (c) to judge? The theoretical background comprises: (A) Applied Linguistics; (B) Corpus Linguistics, which extracts evidence of verbal language use from an electronically analysed corpus; (C) the analysis of stance in light of Biber and Finegan (1988), Biber et al. (1999) and Biber (2006a, 2006b), where stance is defined as the expression of feelings, attitudes and judgments that the actor makes explicit about the message; and (D) Forensic Linguistics, which investigates the language used in the courts. The methodology included fieldwork, use of the electronic tool PALAVRAS and qualitative analysis. The results indicated, for the first question, that uses of stance were found in both parts of speech and, for the second, that the uses of stance differ according to function in the proceedings. Given these answers, we conclude that: (a) it is important to investigate the linguistic differences of each function in the proceedings in order to understand the language used in each of them; and (b) identifying stance in a forensic corpus may be useful for assessing the quality of the information conveyed, for example, by a witness who imbues his or her speech with personal feelings, attitudes and degree of knowledge about the fact on trial. We hope to have contributed to new Corpus Linguistics studies focused on the uses of stance in forensic discourse, building on the methodology developed in this research. We also hope to have contributed to the development of Forensic Linguistics in Brazil by offering our methodology and results, as we adopted the required rigor in our practices.
Our final considerations discuss the limitations, developments, future research and proposals for educational applications.
O objetivo geral desta pesquisa foi o de investigar a linguagem verbal coletada e compilada em corpus eletrônico oriundo de um Processo Penal - Parte Especial - Dos Crimes Contra a Pessoa, capítulo I, Dos Crimes Contra a Vida, julgado no Complexo Judiciário Fórum Ministro Mário Guimarães, na capital de São Paulo, em 2011. A relevante problemática foi a violência doméstica homicida. Optamos por um processo de homicídio perpetrado por réu do sexo masculino contra sua antiga cônjuge e mãe de seu filho cujo palco do crime fora a residência da vítima. O objeto de estudo foi o conjunto das oitivas processuais que aconteceram em dois momentos: Audiência Preliminar e Julgamento. Nossa pressuposição é que em um Processo Penal, os atores linguísticos compõem três grupos distintos de acordo com sua função processual: (a) acusar, (b) defender e (c) julgar. Porque não se têm acesso aos atos em si, apenas às representações deles, os atores optam por usar a linguagem, em especial, a verbal ao explicitar seu conhecimento sobre a ocorrência. Ao fazê-lo eles se posicionam de modo diferente a partir de suas crenças sobre a ocorrência e impregnam em sua fala o impacto que os fatos tiveram sobre suas vidas. Diante disso, há diferenças linguísticas na fala dos atores de acordo com sua função processual que podem ser reveladas pela investigação do posicionamento. Revelá-las pode elucidar as versões da acusação e da defesa sobre, por exemplo, o perfil da vítima e do réu. Nossa análise dessas diferenças é inovadora, já que não encontramos nenhum estudo do posicionamento em oitivas processuais no Português, assim, buscamos preencher essa lacuna. Os objetivos específicos foram: (A) Revelar os usos de posicionamento explicitando suas categorias em duas classes gramaticais; (a) adjetivos e (b) advérbios terminados em mente; e (B) Descobrir se há diferenças de usos de posicionamento de acordo com a função processual que o ator linguístico exercer: (a) acusar, (b) defender e (c) julgar. 
As questões de pesquisa foram: 1- Quais são as evidências de usos de posicionamento e suas categorias nas classes gramaticais dos: (a) adjetivos e (b) advérbios terminados em mente? e 2- Há diferenças de usos de posicionamento em relação à função processual que o ator linguístico exercer: (a) acusar, (b) defender e (c) julgar? Adotamos a seguinte fundamentação teórica: (A) Linguística Aplicada; (B) Linguística de Corpus, extraindo evidências de uso da linguagem verbal por meio de corpus analisado eletronicamente; (C) Análise do posicionamento à luz de Biber e Finegan (1988); Biber et al., (1999) e Biber (2006a e 2006b) definido como uma expressão de sentimento, atitudes e julgamentos que o ator explicita sobre sua mensagem; e (D) Linguística Forense que investiga a linguagem que acontece nos fóruns. A metodologia incluiu pesquisa de campo, uso da ferramenta eletrônica PALAVRAS e análise qualitativa. Os resultados indicaram: para a primeira pergunta, que foram descobertos usos de posicionamento nas duas classes gramaticais e, para a segunda, que há diferenças de usos de posicionamento em relação às funções processuais. Diante dessas respostas, concluímos que: (a) é importante investigar as diferenças linguísticas de cada função processual a fim de entendermos qual é a linguagem usada em cada uma dessas funções; e (b) identificar o posicionamento em corpus forense pode ser útil ao avaliar a qualidade da informação transmitida, por exemplo, por uma testemunha que impregna sua fala com seus sentimentos, atitudes e o grau de conhecimento frente ao fato que se julga. Esperamos ter contribuído com novos estudos da Linguística de Corpus focados nos usos do posicionamento no discurso forense a partir da metodologia desenvolvida nesta pesquisa. Almejamos também, ter contribuído com o desenvolvimento da Linguística Forense no Brasil ofertando nossa metodologia e resultados, já que adotamos em nossas práticas o rigor exigido. 
Nossas considerações finais discutem: as limitações, desdobramentos, pesquisas futuras e, ainda, propostas de aplicações pedagógicas
Rizomilioti, Vassiliki. "Epistemic modality in academic writing : a corpus-linguistic approach." Thesis, University of Birmingham, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.288688.
Martins, Francimary Macêdo. "Compilação, anotação e análise linguístico-computacional de um corpus de textos literários dos séculos XIX e XX: corpus Coelho Neto." Universidade Federal do Ceará, 2014. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=15313.
Esta tese é a compilação, anotação morfossintática e análise linguístico-computacional de um corpus de textos literários dos séc. XIX e XX: o Corpus Coelho Netto (CCN), contendo textos dos romances A Conquista e Turbilhão e contos do livro Sertão. O trabalho está na interface da Linguística de Corpus e da Linguística Computacional (BERBER SARDINHA, 2000, 2003, 2004, 2005, 2009; BERBER SARDINHA; ALMEIDA, 2008; OLIVEIRA, 2009; BIDERMAN, 1998, 2001; ALUÍSIO; ALMEIDA, 2006; SHEPHERD, 2012; MACENERY E WILSON, 2001; LEECH, 2004; ALVES; TAGNIN, 2012; ALENCAR, 2009, 2010a, 2010b, 2011a, 2011b, 2013a, 2013b). O CCN contém 53.080 (cinquenta e três mil e oitenta) tokens (pontuação e palavras). A compilação consiste nas etapas de seleção, coleta de textos e manipulação; nesta são realizadas a limpeza, edição e atualização dos textos (ALUÍSIO; ALMEIDA, 2006), para depois ser submetido à anotação morfossintática e análise linguístico-computacional, com o objetivo de obter dados que comprovem ou não o uso "excessivo" de adjetivos, de verbos e de advérbios em -mente, demonstrando a diversidade lexical nos textos de Coelho Netto, constatando se o que a crítica modernista dizia a respeito do escritor era procedente. A anotação morfossintática foi realizada pelo etiquetador automático Aelius, modelo AeliusHunPos, um software livre em Python que utiliza a biblioteca Natural Language Toolkit - NLTK (BIRD; KLEIN; LOPER, 2009), no pré-processamento de textos, na construção de etiquetador morfossintático e na anotação de corpora com auxílio de revisão humana (ALENCAR, 2010a, 2013a, 2013b), e que foi treinado no Corpus Histórico do Português Tycho Brahe (CHPTB). A compilação e anotação do CCN envolve outras ações como a reavaliação da acurácia desse etiquetador em textos literários.
Os resultados da pesquisa revelaram que: o AeliusHunpos ao anotar os textos do CCN demonstrou maior acurácia que em outros textos já anotados, de 97,9%; que o modelo AeliusHunPos mostrou um desempenho muito além ao anotar os corpora que com o modelo AeliusMaxEnt; e que, após a seleção e correção manual dos 10% dos corpora anotados e gerados arquivos padrão gold, sugerimos um melhoramento dos aproximados 3% de erros cometidos pelo etiquetador, visando o aumento de sua acurácia. Quanto às análises realizadas com os dados obtidos no CCN constatamos que: a diversidade lexical, especificamente quanto a verbos, adjetivos e advérbios em -mente, declarada como exagerada pela crítica à Coelho Netto não procede, pois seus textos são ricos, mas quando comparados aos textos de Aluísio Azevedo e Camilo Castelo Branco, o Corpus de Comparação, apresentam riqueza vocabular similar ao CCN, como expostos nos resultados.
This thesis presents the compilation, morphosyntactic annotation and computational-linguistic analysis of a corpus of literary texts from the 19th and 20th centuries: the Corpus Coelho Netto (CCN), containing texts from the novels A Conquista and Turbilhão and short stories from the book Sertão. The work lies at the interface of Corpus Linguistics and Computational Linguistics (BERBER SARDINHA, 2000, 2003, 2004, 2005, 2009; BERBER SARDINHA; ALMEIDA, 2008; OLIVEIRA, 2009; BIDERMAN, 1998, 2001; ALUÍSIO; ALMEIDA, 2006; SHEPHERD, 2012; MACENERY AND WILSON, 2001; LEECH, 2004; ALVES; TAGNIN, 2012; ALENCAR, 2009, 2010a, 2010b, 2011a, 2011b, 2013a, 2013b). The CCN contains 53,080 (fifty-three thousand and eighty) tokens (punctuation and words). The compilation consists of the steps of selection, collection of texts and handling; the last of these comprises the cleaning, editing and updating of the texts (ALUÍSIO; ALMEIDA, 2006). The corpus is then submitted to morphosyntactic annotation and computational-linguistic analysis, with the goal of obtaining data that confirm or refute the "excessive" use of adjectives, verbs and adverbs in -mente, demonstrating the lexical diversity in Coelho Netto's texts and establishing whether what the modernist critics said about the writer was well founded. The annotation was performed by the automatic tagger Aelius, model AeliusHunPos, free software in Python that uses the Natural Language Toolkit (NLTK) library (BIRD; KLEIN; LOPER, 2009) for the pre-processing of texts, the construction of morphosyntactic taggers and the annotation of corpora with the help of human review (ALENCAR, 2010a, 2013a, 2013b), and that was trained on the Tycho Brahe Historical Corpus of Portuguese (CHPTB). The compilation and annotation of the CCN involve other actions, such as re-evaluating the accuracy of this tagger on literary texts.
The results indicated that: AeliusHunPos, when annotating the CCN texts, achieved higher accuracy (97.9%) than on other texts already annotated; the AeliusHunPos model performed far better in annotating the corpora than the AeliusMaxEnt model; and, after the selection and manual correction of 10% of the annotated corpora and the generation of gold-standard files, we suggest improvements addressing the approximately 3% of errors made by the tagger, in order to increase its accuracy. Regarding the analyses performed on the CCN data, we found that the claim by critics of Coelho Netto that his lexical diversity is exaggerated, specifically in verbs, adjectives and adverbs in -mente, is unfounded: his texts are rich, but the texts of Aluísio Azevedo and Camilo Castelo Branco, which form the Comparison Corpus, show a vocabulary richness similar to that of the CCN, as set out in the results.
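As a rough illustration of how a tagger's accuracy can be checked against manually corrected gold-standard files, as the abstract above describes, the following Python sketch compares predicted tags with gold tags token by token. It does not use Aelius or NLTK; the function name `tagging_accuracy`, the toy sentence and the tag labels are hypothetical and are not taken from the thesis.

```python
def tagging_accuracy(predicted, gold):
    """Return the share of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions where the tagger's tag matches the gold-standard tag.
    correct = sum(
        1
        for (_, predicted_tag), (_, gold_tag) in zip(predicted, gold)
        if predicted_tag == gold_tag
    )
    return correct / len(gold)


# Hypothetical toy data: (token, tag) pairs; the tag labels are invented.
gold = [("Ela", "PRO"), ("falou", "VB"), ("calmamente", "ADV"), (".", "PUNCT")]
predicted = [("Ela", "PRO"), ("falou", "VB"), ("calmamente", "ADJ"), (".", "PUNCT")]

accuracy = tagging_accuracy(predicted, gold)
print(f"accuracy = {accuracy:.2%}")
```

Under this measure the error rate is simply one minus the accuracy, so an error rate of roughly 3% corresponds to an accuracy of roughly 97%.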
Tang, Haijiang. "Building phrase based language model from large corpus /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20TANG.
Includes bibliographical references (leaves 74-79). Also available in electronic version. Access restricted to campus users.
Bridle, Marcus. "Error correction through corpus consultation in EAP writing : an analysis of corpus use in a pre-sessional context." Thesis, University of Huddersfield, 2015. http://eprints.hud.ac.uk/id/eprint/24848/.
Gonçalves, Marcos Antônio. "As formações x-inho nas modalidades oral e escrita: um estudo contrastivo baseado na lingüística de corpus." Universidade do Estado do Rio de Janeiro, 2006. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=59.
Items ending in -inho are described in the majority of grammars of Portuguese as conveying two notions, namely affect and dimension. However, the same grammars do not seem to take into account the extralinguistic or contextual factors in play when speakers opt for a word ending in -inho. The aim of the present work is thus to investigate the productivity of such items in two electronic corpora: an oral one, further subdivided into two sub-corpora containing narratives and descriptions, and a second one compiled exclusively from the various sections of a widely read quality newspaper. The dissertation quantifies the instances of items ending in -inho in each of the corpora. Next, each of these occurrences is analysed and classified according to the notion it conveys (dimension, positive affect, negative affect, intensification). Lastly, the results of both frequency and dispersion counts are contrasted for each of the corpora. The methodology of our analyses is centred on the area known as Corpus Linguistics, which provides a basis for the data to be compiled and interpreted.
Caldwell, Joshua Marrinor. "Iconic Semantics in Phonology: A Corpus Study of Japanese Mimetics." BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2368.
Thomas, Penelope Leith. "Facebook in the Australian News: a corpus linguistic approach." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/18747.
Souza, Adílio Junior de. "Lexicalização e neologismo: análise funcional em corpus digital." Universidade Federal da Paraíba, 2015. http://tede.biblioteca.ufpb.br:8080/handle/tede/8403.
Made available in DSpace on 2016-07-19T13:59:09Z (GMT). Previous issue date: 2015-12-04
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
This dissertation shows how the emergence of neologisms in a language, through lexicalization, can contribute to the enrichment and updating of that language's lexicon. To this end, it sought to: (i) set out the main concepts of lexicon, neologism and lexicalization, based on Usage-Based Linguistics (UBL); (ii) present 13 lexical items selected from the digital corpus; and (iii) discuss the relevance of lexicalization to the formation of new words, in order to understand how this affects or changes the multi-system. The corpus used was that of the AC/DC Project: corpo Corpus Brasileiro, which contains about one billion words employed in the most varied contexts of use. For the theoretical grounding of the dissertation, several scholars were consulted, among whom we highlight: Martelotta (2011), Gonçalves (2011), Contiero and Ferraz (2014), Correia and Almeida (2012), Carvalho (2009a), Biderman (1978; 1981), Câmara Jr. (2011), Pontes-Ribeiro (2007), Castilho (2003a; 2003b; 2008), Cunha (2011), Mendes and Seabra (2006), Ferraz (2006; 2007) and Fortunato (2008). The methodology consisted of three stages: (a) collection of samples of lexical items from the corpus, (b) extraction of these samples and their compilation in tables and (c) analysis of the collected data. The results revealed that some of the 13 lexicalized words/neologisms possibly emerged to fill a gap in the linguistic signs of the multi-system, others acquired new meanings when employed in new contexts of use, and still others are in the process of disappearing. Frequency of use was decisive for the change in meaning.
Esta dissertação aponta como o surgimento dos neologismos em uma língua, pela lexicalização, pode contribuir para o enriquecimento e atualização do léxico desta mesma língua. Deste modo, buscou-se: (i) expor os principais conceitos sobre léxico, neologismo e lexicalização, com base na Linguística Centrada no Uso (LCU), (ii) apresentar 13 itens lexicais selecionados a partir do corpus digital e (iii) discutir a relevância da lexicalização para a formação de novas palavras, para entender como isso afeta/altera o multissistema. O corpus utilizado foi o Projeto AC/DC: corpo Corpus Brasileiro, que contém cerca de um bilhão de palavras empregadas nos mais variados contextos de uso. Para a fundamentação da dissertação, alguns estudiosos foram consultados, entre os quais se destacam: Martelotta (2011), Gonçalves (2011), Contiero e Ferraz (2014), Correia e Almeida (2012), Carvalho (2009a), Biderman (1978; 1981), Câmara Jr. (2011), Pontes-Ribeiro (2007), Castilho (2003a; 2003b; 2008), Cunha (2011), Mendes e Seabra (2006), Ferraz (2006; 2007) e Fortunato (2008). A metodologia consistiu em três etapas: a) coleta de amostras de itens lexicais no corpus, b) extração dessas amostras e compilação em tabelas e c) análise dos dados coletados. Os resultados revelaram que alguns dos 13 neologismos/palavras lexicalizadas, possivelmente, surgiram para preencher um vazio de signos linguísticos no multissistema, outros adquiriram novos sentidos ao serem empregados em novos contextos de uso e outros tantos estão em processo de desaparecimento. A frequência de uso foi determinante para a mudança no sentido.
Trklja, Aleksandar. "A corpus linguistics study of translation correspondences in English and German." Thesis, University of Birmingham, 2014. http://etheses.bham.ac.uk//id/eprint/4785/.
Costa, Danilo Duarte. "Linking adverbials in applied linguistics research articles: a corpus-based study." Universidade Federal de Minas Gerais, 2015. http://hdl.handle.net/1843/MGSS-9VKN7F.
This study investigates the use of linking adverbials (Biber et al., 1999) in applied linguistics research articles written in English by Brazilians, in comparison with those written by native speakers of English. Two comparable corpora were compiled for this study, namely CRAB (Corpus of Research Articles written by Brazilians) and CRAN (Corpus of Research Articles written by Natives), each with more than 300,000 words. The compilation of the corpora followed rigorous methodological procedures based on Biber (1993) and McEnery et al. (2006). The data, after being submitted to the Log-Likelihood statistical test, were examined with the software AntConc 3.4.2 for a qualitative analysis. Seven semantic categories of linking adverbials were investigated in order to find similarities and differences in the use of these linguistic elements in the two corpora. The results show significant differences in the use of linking adverbials in the academic writing of Brazilians compared with that of native speakers. These differences concern both frequency of use (overuse and underuse of some forms) and the way in which these elements are employed in texts. In addition, some linking adverbials were found to be misused at times in the texts written by the Brazilian professionals.
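The Log-Likelihood test named in the abstract above is commonly computed, in corpus comparison, from the frequency of an item in each corpus and the size of each corpus. The Python sketch below follows that widely used formulation; it is not the study's own procedure or data. The function name `log_likelihood` and all counts are hypothetical, and the corpus sizes only loosely echo the figure of more than 300,000 words given in the abstract.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood statistic for one item observed in two corpora."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were equally likely in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts for a single linking adverbial in two corpora.
ll_value = log_likelihood(freq_a=150, size_a=300_000, freq_b=100, size_b=300_000)
print(f"LL = {ll_value:.2f}")
```

With one degree of freedom, values above 3.84 are conventionally read as significant at p < 0.05; the sign of the difference (overuse or underuse) has to be read from the relative frequencies, since the statistic itself is always non-negative.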
Vogel, Ralf, and Marco Zugck. "Counting Markedness : a corpus investigation on German free relative constructions." Universität Potsdam, 2003. http://opus.kobv.de/ubp/volltexte/2009/3247/.
Zinsmeister, Heike, and Eva Smolka. "Corpus-based evidence for approximating semantic transparency of complex verbs." Universität Potsdam, 2012. http://opus.kobv.de/ubp/volltexte/2012/6235/.
Noseda, Valentina. "Corpora paralleli e linguistica contrastiva: ampliamento e applicazioni del corpus italiano - russo nel Nacional'nyj Korpus Russkogo Jazyka." Doctoral thesis, Università Cattolica del Sacro Cuore, 2017. http://hdl.handle.net/10280/24613.
Corpus Linguistics, which exploits electronic annotated corpora in the study of languages, is a widespread and consolidated approach. In particular, parallel corpora, where texts in one language are aligned with their translation in a second language, are an extremely useful tool in contrastive analysis. The lack of good parallel corpora for the languages of our interest, Russian and Italian, led us to work on improving the Italian-Russian parallel corpus available as a pilot corpus in the Russian National Corpus. This work therefore had a twofold aim: practical and theoretical. On the one hand, after studying the essential issues in designing a high-quality corpus, we established all the criteria for expanding the corpus and increased the number of texts, allowing the Italian-Russian parallel corpus, which counted 700,000 words, to reach more than 4 million words. As a result, it is now possible to conduct scientifically valid research based on this corpus. On the other hand, three corpus-based analyses were proposed in order to highlight the potential of the corpus: the study of prefixed Russian memory verbs and their translation into Italian; the comparison between the Italian analytic causative "fare + infinitive" and Russian causative verbs; and the comparative analysis of fifteen Italian versions of The Overcoat by N. Gogol'. These analyses first of all made it possible to advance some methodological remarks with a view to further enlarging and improving the Italian-Russian parallel corpus. Secondly, the corpus-based approach proved useful in deepening the study of these topics from a theoretical point of view.
Lúcio, Denise Delegá. "A variação entre textos argumentativos e o material didático de inglês: aplicações da análise multidimensional e do Corpus Internacional de Aprendizes de Inglês (ICLE)." Pontifícia Universidade Católica de São Paulo, 2013. https://tede2.pucsp.br/handle/handle/13640.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This thesis examines how argumentative texts produced by learners of English vary and, on the basis of this knowledge, suggests procedures for developing activities for English teaching materials. The research draws on the theoretical framework of Corpus Linguistics, Learner Corpus Linguistics and Multidimensional Analysis. Our study corpora were the International Corpus of Learner English (ICLE), the Brazilian International Corpus of Learner English (BrICLE) and the Louvain Corpus of Native English Essays (LOCNESS). In the first phase of the research, we examined how variation in learners' essays was distributed along the dimensions of English variation proposed by Biber (1988). In the second phase, we identified the dimensions of variation specific to learners' essays, which resulted in four dimensions: dimension 1, literate writing versus narrative-like and oral-like writing; dimension 2, description-driven writing versus action-driven writing; dimension 3, writing focused on thought and report; and dimension 4, qualifying writing. In the third phase, we drew on the linguistic characteristics observed in the dimension of literate writing versus narrative-like and oral-like writing to find content for teaching activities about variation in texts. In addition to the suggested activities, we present the procedures needed to use the results of research such as this to produce language teaching materials.
Esta tese tem por objetivo verificar o modo como textos argumentativos produzidos por alunos de inglês variam e, a partir desse conhecimento, sugerir procedimentos para o desenvolvimento de atividades para material didático de inglês. A pesquisa recorre ao arcabouço teórico da Linguística de Corpus, Linguística de Corpus de Aprendiz e Análise Multidimensional. Nossos corpora de estudo foram o International Corpus of Learner English (ICLE), o Brazilian International Corpus of Learner English (BrICLE) e o Louvain Corpus of Native English Essays (LOCNESS). Na primeira fase desta pesquisa, verificamos o modo como a variação nas redações de aprendizes se distribuía nas dimensões de variação do inglês propostas por Biber (1988). Na segunda fase, identificamos as dimensões de variação específicas nas redações de aprendizes, o que resultou em 4 dimensões de variação: dimensão 1 escrita letrada versus escrita narrativizada e oralizada; dimensão 2 escrita com foco na descrição versus escrita com foco no agir; dimensão 3 escrita com foco no pensamento e no relato; e dimensão 4 escrita qualificativa. Na terceira fase, partimos das características linguísticas observadas na dimensão escrita letrada versus escrita narrativizada e oralizada para encontrar conteúdos para as atividades didáticas sobre a variação em textos. Além das atividades sugeridas, apresentamos os procedimentos necessários para utilizar resultados de pesquisas como esta para a produção de materiais didáticos para ensino de línguas
Abe, Mariko. "Syntactic variation across proficiency levels in Japanese EFL learner speech." Diss., Temple University Libraries, 2015. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/350754.
Ed.D.
Overall patterns of language use variation across oral proficiency levels of 1,243 Japanese EFL learners and 20 native speakers of English using the linguistic features set from Biber (1988) were investigated in this study. The approach combined learner corpora, language processing techniques, visual inspection of descriptive statistics, and multivariate statistical analysis to identify characteristics of learner language use. The largest spoken learner corpus in Japan, the National Institute of Information and Communications Technology Japanese Learner English (NICT JLE) Corpus was used for the analysis. It consists of over one million running words of L2 spoken English with oral proficiency level information. The level of the material in the corpus is approximately equal to a Test of English for International Communication (TOEIC) range of 356 to 921. It also includes data gathered from 20 native speakers who performed identical speaking tasks as the learners. The 58 linguistic features (e.g., grammatical features) were taken from the original list of 67 linguistic features in Biber (1988) to explore the variation of learner language. The following research questions were addressed. First, what linguistic features characterize different oral proficiency levels? Second, to what degree do the language features appearing in the spoken production of high proficiency learners match those of native speakers who perform the same task? Third, is the oral production of Japanese EFL learners rich enough to display the full range of features used by Biber? Grammatical features alone would not be enough to comprehensively distinguish oral proficiency levels, but the results of the study show that various types of grammatical features can be used to describe differences in the levels. 
First, patterns of frequency change across the oral proficiency levels (rising, falling, or a combination of rising, falling, and plateauing) were shown by linguistic features from a wide range of categories: (a) part of speech (noun, pronoun it, first person pronoun, demonstrative pronoun, indefinite pronoun, possibility modal, adverb, causative adverb), (b) stance markers (emphatic, hedge, amplifier), (c) reduced forms (contraction, stranded preposition), (d) specialized verb class (private verb), (e) complementation (infinitive), (f) coordination (phrasal coordination), (g) passive (agentless passive), and (h) possibly tense and aspect markers (past tense, perfect aspect). In addition, there is a noticeable gap between native and non-native speakers of English. There are six items that native speakers of English use more frequently than the most advanced learners (perfect aspect, place adverb, pronoun it, stranded preposition, synthetic negation, emphatic) and five items that native speakers use less frequently (past tense, first person pronoun, infinitive, possibility modal, analytic negation). Other linguistic features are used with similar frequency across the levels. What is clear is that the speaking tasks and the time allowed provided ample opportunity for most of Biber's features to be used across the levels. The results of this study show that various linguistic features can be used to distinguish different oral proficiency levels, and to distinguish the oral language use of native and non-native speakers of English.
Temple University--Theses
White, Sara LuAnne. "Applying Corpus-Assisted Critical Discourse Analysis to an Unrestricted Corpus: A Case Study in Indonesian and Malay Newspapers." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/6478.
Mansouri, Aous. "Stative and Stativizing Constructions in Arabic News Reports: A Corpus-Based Study." Thesis, University of Colorado at Boulder, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10108811.
This dissertation uses a corpus of tokens retrieved from broadcast news stories and print news articles to examine the array of constructions used to encode stative predications in Modern Standard Arabic. A state is defined as a situation that includes its reference time, whether that time is encoding time or another time of orientation. A range of stativity diagnostics are implemented. The constructions analyzed include both those that select for the class of states and those that yield various stative construals of otherwise dynamic predications. The constructions examined range from inflectional constructions to verb-headed phrasal patterns to verbless predicates; a lexicalist implementation of Construction Grammar, Sign-Based Construction Grammar, provides a uniform format for representing the constructions as feature-structure descriptions. The constructions include: the p(refix)-stem verb, an inflectional construction exhibiting considerable semantic and syntactic flexibility; participles, including both the Active Participle, which typically yields a progressive reading and sometimes a perfect reading, and the Passive Participle, which yields a perfect reading; non-verbal predicates, which denote various stative relations, including existence, property attribution, possession and deontic modality; and phrasal constructions headed by the auxiliary kāna, which are used to convey past states, irrealis states and resultant states, while serving as a copula in syntactic contexts requiring a copula. A final case study underlines the formal and semantic heterogeneity of the class of Arabic stativizers by examining an emergent idiomatic pattern, the yatimmu construction, which has either a progressive function or a perfect function, depending primarily on subordination. The dissertation shows that in Arabic news narratives, users deploy distinct stative constructions in distinct contexts to convey whatever state is relevant in the context.
It demonstrates that constructions convey both tense-based notions (like state ongoing at encoding time) and aspectual notions (state ongoing at the time of another event invoked by the text). In addition, it demonstrates that aspectual constructions are not ‘merely’ aspectual, but instead have constraints relating to argument structure, valency and subordination.
Kirsten, Johanita. "Laaste spore van Nederlands in Afrikaanse werkwoorde / J. Kirsten." Thesis, North-West University, 2013. http://hdl.handle.net/10394/10193.
Thesis (M.A. (Afrikaans and Dutch))--North-West University, Vaal Triangle Campus, 2013
Jones, Warwick Alfred. "A corpus-linguistic approach to foreign/second language learning: an experimental study of a new pedagogic model for integrating linguistic knowledge with corpus technology." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46053372.
Barron, Andrew T. "Exposing Deep-rooted Anger: A Metaphor Pattern Analysis of Mixed Anger Metaphors." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84170/.
Almujaiwel, Sultan Nasser. "Contrastive lexicology and comparable English-Arabic corpora-based analysis of vague and mistranslated Arabic equivalence : the case of the modern English-Arabic dictionary of al-Mawrid." Thesis, University of Exeter, 2012. http://hdl.handle.net/10871/13141.
Botley, Simon Philip. "Corpora and discourse anaphora : using corpus evidence to test theoretical claims." Thesis, Lancaster University, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.322510.
Deignan, Alice. "A corpus-based study of some linguistic features of metaphor." Thesis, University of Birmingham, 1998. http://etheses.bham.ac.uk//id/eprint/831/.
Luzorio, Camilla Canella Moraes. "Gramaticalização e Preposições Complexas do Português: um estudo baseado em corpus." Universidade do Estado do Rio de Janeiro, 2008. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=578.
The present dissertation introduces a study which applies the theory of Grammaticalization to a digital diachronic corpus, with a view to mapping some of the changes which have taken place in certain structures of Portuguese, the so-called prepositional phrases. The objectives of the research were threefold. First, the study aimed at investigating the complex prepositions em face de, em face a, face a, em vista de, em frente de, em frente a and frente a, in order to understand their syntactic and semantic development and, in turn, to evaluate whether they are undergoing a process of grammaticalization. Secondly, the study sought to examine texts from a variety of historical periods, so as to map a possible trajectory taken by the aforementioned forms between the 14th and the 20th centuries. Thirdly, the study intended to verify whether the items frente a and face a may be considered reductions of em frente a and em face a, respectively. The theoretical framework for the study has been taken from Grammaticalization, a theory which explains phenomena which affect linguistic items. The process of grammaticalization consists in one item, lexical or grammatical, becoming more grammatical. The triggering factor in this case is said to be the frequency of use. Corpus Linguistics has provided a methodology for the compilation, extraction and treatment of the textual data in this dissertation. Similarly to Hoffman (2005), the investigation here was based on electronic corpora. The study corpus was the Corpus do Português, which consists of texts in Portuguese written between the 14th and the 20th centuries, available at http://www.corpusdoportugues.org/. The study suggests that the complex prepositions analysed have become increasingly grammaticalised, because they have acquired additional abstract meanings. It has also been observed that, in many ways, these abstract meanings coexist as layers.
However, there seems to be a tendency for one form to become the preferred way of expressing each of these new meanings.
Karageorgou, Ioanna. "Fitness Discourse on Instagram: A Corpus Linguistic Analysis." Thesis, Malmö universitet, Fakulteten för kultur och samhälle (KS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-21671.
Yoon, Hyunsook. "An investigation of students' experiences with corpus technology in second language academic writing." 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1109806353.
Document formatted into pages; contains 307 p. Includes bibliographical references. Abstract available online via OhioLINK's ETD Center; full text release delayed at author's request until 2006 March 7.
Dornelas, Aline Bisotti. "Construções de movimento fictivo em Português do Brasil: cognição e corpus." Universidade Federal de Juiz de Fora (UFJF), 2014. https://repositorio.ufjf.br/jspui/handle/ufjf/4638.
Made available in DSpace on 2017-05-22. Previous issue date: 2014-02-06.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
O presente estudo tem como objetivo descrever e analisar Construções de Movimento Fictivo do Português do Brasil (CMF), do tipo “A estrada vai até a praça...” e “A veia percorre toda a extensão do braço...”. Tais construções utilizam um verbo de movimento associado a um tema estático. Como base teórica, utilizamos pressupostos da Linguística Cognitiva (TALMY, 2000; LANGACKER, 1987, 1999, 2008; FAUCONNIER, 1997; FAUCONNIER; TURNER, 2002) e dos Modelos de Gramática Baseados no Uso (LANGACKER, 1987, 1999, 2008; GOLDBERG, 1995, 2006; GOLDBERG; JACKENDOFF, 2004). Como aporte metodológico, elegemos instrumentos da Linguística de Corpus(SARDINHA, 2004; SILVA, 2008), que forneceram condições para a formação de um corpus específicos das CMF, com 536 ocorrências. A análise subsequente revelou dois padrões formais mais produtivos: (1)[XSNEYVM(ZSP)] (...o cabelo(SNE)ia(VM)até o pé(SP)) e (2) [XSNE YVM ZSN] (A artéria vertebral(SNE) (...) percorre(VM)o restante da coluna(SN)). O padrão (1), com variações, apresentou 34 tipos e 372 ocorrências; o padrão (2), com variações, 16 tipos e 164 ocorrências. Postula–se que a motivação cognitiva das CMF advémdo processo de mesclagem conceptual entreum domínio de experiência de movimento e outrorelacionado visualmente à extensão, o que promove um escaneamento visual da extensão. Essa motivação faz com que, no polo semântico–pragmático, as CMF evoquem uma matriz dominial caracterizadora de espaço físico, focalizando domínios conceptuais de área, dimensão, localização, formato, posição e direção. Pragmaticamente, possuem função descritiva, possibilitando a reconstrução mental da cena estática em questão. 
Quanto ao ambiente discursivo, as CMF se encontram em maior número nos gêneros ficção e acadêmico e estão relacionadas a tópicos conversacionais como anatomia, turismo, geografia, urbanismo, construção, vestuário e explicação de rotas, que têm como centro a descrição de trajetórias ou outros objetos que são conceptualizados como trajetórias.Assim, nossa análise coloca as CMF como mais um nódulo na rede de construções do PB e procura contribuir com a descrição de nova rede – a rede construcional do movimento. A análise das CMF traz à tona a atuação da mesclagem conceptual na formação de novas construções. Atesta, ainda, a relevância da abordagem da linguagem corporificada proposta pela Linguística Cognitiva e a visão da língua como inventário de construções moldadas pelo uso discursivo.
The present work aims to describe and analyze the Fictive Motion Constructions (FMC) of Brazilian Portuguese, such as “A estrada vai até a praça…” and “A veia percorre toda a extensão do braço…”. These constructions use a motion verb with a static theme. As a theoretical basis, we use the constructs of Cognitive Linguistics (TALMY, 2000; LANGACKER, 1987, 1999, 2008; FAUCONNIER, 1997; FAUCONNIER; TURNER, 2002) and Usage-based Models of Grammar (LANGACKER, 1987, 1999, 2008; GOLDBERG, 1995, 2006; GOLDBERG; JACKENDOFF, 2004). For methodology, we chose instruments from Corpus Linguistics (SARDINHA, 2004; SILVA, 2008), which made it possible to build a specific corpus containing 536 occurrences of FMC. The analysis led us to two main formal patterns: (1) [XNPS YVM (ZPP)] (...o cabelo(NPS) ia(VM) até o pé(PP)) and (2) [XNPS YVM ZNP] (A artéria vertebral(NPS) (...) percorre(VM) o restante da coluna(NP)). The first pattern and its variations presented 34 types and 372 occurrences; the second, with its variations, 16 types and 164 occurrences. We postulate that the cognitive motivation of FMC comes from a process of conceptual blending that integrates a domain of motion experience with a visual domain related to extension, which promotes a visual scanning of that extension. This motivation leads FMC to evoke, at the semantic-pragmatic pole, a domain matrix characterizing physical space, which focuses on the conceptual domains of area, dimension, location, shape, position and direction. Pragmatically, FMC have a descriptive function, making possible the mental reconstruction of the static scene in question. As for the discursive environment, FMC are most numerous in the fiction and academic genres. They are related to conversational topics such as anatomy, tourism, geography, urbanism, construction, clothing and route explanations, whose central subject is the description of trajectories or of other objects conceptualized as trajectories.
Therefore, our analysis places FMC as one more node in the construction network of Brazilian Portuguese and seeks to contribute to the description of a new network, the motion construction network. The analysis of FMC brings out the role of conceptual blending in the formation of new constructions. It also attests to the relevance of the embodied-language approach proposed by Cognitive Linguistics and of the view of language as an inventory of constructions shaped by use in discourse.
Celebi, Hatice. "Extracting And Analyzing Impoliteness In Corpora: A Study Based On The British National Corpus And The Spoken Turkish Corpus." Phd thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615309/index.pdf.
extraction and analysis. Within the CDL framework, the theory or model of impoliteness behind the analysis will be shaped by the findings gathered from the extraction of impoliteness. At the extraction level, dialogues that include a conflict or an offending event will be selected from the spoken texts in both the BNC and the databases of the STC. In order to select such dialogues, various methods will be applied. First, spoken texts will be scanned through an initial word query, a collocation query, a query for question sentences and tags, a query for imperatives, and possible queries that allow searching for prosodic nuances as well as interruptions and overlaps, to the extent that the corpora and the focus of the study allow. Second, metapragmatic comments, conventionalized impoliteness formulae, cues for non-conventionalized implicational impoliteness, conversational patterns, and other cues such as semantic prosody coming into play in the co-text and context will be taken into consideration. Once the selection is completed, the insights gathered from the extracted instances of impoliteness will be applied to analyze the data. Impoliteness in both languages will be examined with regard to how impoliteness is triggered, how the progression of impolite exchanges takes place, and how those instances of impoliteness are resolved. Other considerations such as context-determined impoliteness, intentionality of the speaker, and perception of the hearer will be discussed.
Teixeira, Rosana de Barros Silva e. "Termos de (onco)mastologia: uma abordagem mediada por corpus." Pontifícia Universidade Católica de São Paulo, 2011. https://tede2.pucsp.br/handle/handle/13496.
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Situated in the research field of Applied Linguistics, an area that articulates multiple domains of knowledge, this research combines the theoretical and methodological bases of linguistic-communicative Terminology (Communicative Theory of Terminology, CTT) and Corpus Linguistics in pursuit of two goals. The first is to compile a monolingual glossary, bearing the same title as this research, for science journalists, since it falls to these professionals to make the hermetic language of science intelligible to lay readers. This initiative is motivated by the fact that breast cancer is the cancer that causes the most deaths among women in Brazil; each year, about 22% of new cases are diagnosed, according to the Ministry of Health. In order to start from language in use, Corpus Linguistics was chosen as the way into this specialized language through empirical observation of data, that is, from an in vivo perspective, based on a corpus of 563,482 words, as counted by WordSmith Tools 3.0. Considering some of the software available for processing text corpora, I set as a second goal to check the accuracy of four of these tools (Corpógrafo 4.0, WordSmith Tools 3.0, e-Termos and ZExtractor) with respect to their rate of true-positive term candidates. According to the data, Corpógrafo 4.0 leads this ranking, with 27.56% accuracy, followed by ZExtractor (26.05%), WordSmith Tools 3.0 (21.77%) and e-Termos (14.44%). To make the examination of candidates feasible, a methodology was developed using Microsoft Office Excel 2007 to filter the candidates common to all tools and those exclusive to each. This cut of the data, besides supporting the results, showed that the methodology is a possibly viable resource for optimizing the extraction of terminology sets from lists processed by two or more programs, since all of the tools showed limitations.
In this way, 237 terms obtained from unigrams were listed, of which 104 were chosen to head the glossary entries because of their conceptual relevance.
Circunscrita ao campo de investigação da Linguística Aplicada, área articuladora de múltiplos domínios do saber, esta pesquisa, ao agregar pressupostos teórico-metodológicos da Terminologia de base linguístico-comunicacional (Teoria Comunicativa da Terminologia TCT) e da Linguística de Corpus, procurou atingir dois objetivos: o primeiro deles visa à confecção de um glossário monolíngue, cujo título é homônimo ao desta pesquisa, para jornalistas científicos, uma vez que cabe a esses profissionais a tarefa de transformar em inteligível, para o público leigo, a linguagem hermética da ciência. Essa iniciativa baseia-se no fato de ser o câncer de mama o que mais provoca mortes entre as mulheres no Brasil a cada ano, cerca de 22% de novos casos são constatados, segundo o Ministério da Saúde. A fim de partir da língua em uso, a Linguística de Corpus foi escolhida para aceder a essa linguagem de especialidade por meio da observação empírica dos dados, ou seja, numa perspectiva in vivo, a partir de um corpus de 563.482 palavras, segundo o programa WordSmith Tools 3.0. Para tanto, tendo em vista alguns dos programas computacionais disponíveis para processamento de corpus textual, estabeleci, como segundo objetivo, a verificação da acuidade de quatro dessas ferramentas (Corpógrafo 4.0, WordSmith Tools 3.0, e-Termos e ZExtractor) no que tange ao índice de acerto de termos, propriamente, isto é, almejei saber qual delas era mais eficiente na extração de candidatos verdadeiro-positivos. Conforme indicam os dados, o Corpógrafo 4.0 lidera esse ranking, com 27,56% de acerto, seguido, respectivamente, pelo ZExtractor (26,05%), WordSmith Tools 3.0 (21,77%) e e-Termos (14,44%). 
Com vistas a tornar factível o exame dos candidatos, posto que o total de dados obtidos com as listas geradas pelos programas abrangia milhares de palavras (mais de 10 mil), foi desenvolvida uma metodologia com o auxílio do Microsoft Office Excel 2007 para filtragem dos candidatos comuns entre todas as ferramentas e exclusivos de cada uma. Esse recorte nos dados, além de oferecer subsídios para obtenção dos resultados, propiciou o reconhecimento dessa metodologia como um recurso possivelmente viável, no sentido de otimizar a extração de conjuntos terminológicos a partir de listas processadas por dois ou mais programas, já que, como apontou a análise dos resultados, todos mostraram limitações. Dessa forma, 237 termos, obtidos por meio de unigramas (uma lexia), foram elencados, dentre os quais 104 foram eleitos para encabeçar os verbetes que integram o glossário devido à relevância conceitual que demonstraram comportar
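The filtering step described in the abstract above (candidates common to all tools versus exclusive to each, followed by a hit rate per tool) can be sketched with plain set operations. This is only an illustration: the thesis reports using Microsoft Office Excel 2007, not Python, and the tool labels, candidate words, validated term list, and the reading of "accuracy" as the share of a tool's candidates that are validated terms are all assumptions made for the example.

```python
def split_candidates(lists):
    """Return (candidates common to all tools, candidates exclusive to each tool)."""
    sets = {tool: set(words) for tool, words in lists.items()}
    common = set.intersection(*sets.values())
    exclusive = {}
    for tool, words in sets.items():
        # Union of every other tool's candidates
        others = set().union(*(w for t, w in sets.items() if t != tool))
        exclusive[tool] = words - others
    return common, exclusive


def hit_rate(candidates, validated_terms):
    """Percentage of a tool's candidates that are validated terms (assumed metric)."""
    cands = set(candidates)
    if not cands:
        return 0.0
    return 100 * len(cands & validated_terms) / len(cands)


# Hypothetical candidate lists; "tool_a" etc. are placeholders, not the real tools.
lists = {
    "tool_a": ["mastectomia", "carcinoma", "paciente", "ano"],
    "tool_b": ["mastectomia", "carcinoma", "mama"],
    "tool_c": ["mastectomia", "biópsia", "ano", "carcinoma"],
}
# Hypothetical list of terms confirmed by a human validator.
validated = {"mastectomia", "carcinoma", "biópsia", "mama"}

common, exclusive = split_candidates(lists)
rates = {tool: hit_rate(words, validated) for tool, words in lists.items()}
```

Separating the shared candidates from the exclusive ones means each shared candidate only needs to be validated once, which is the kind of saving the abstract attributes to combining lists from two or more programs.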
Haertel, Robbie A. "MayanWiki : an online, consensus-based linguistic corpus of the Mayan hieroglyphs." Diss., 2007. http://contentdm.lib.byu.edu/ETD/image/etd2212.pdf.
Crapo, Robert Nishan. "Pun Strategies Across Joke Schemata: A Corpus-Based Study." BYU ScholarsArchive, 2018. https://scholarsarchive.byu.edu/etd/6739.
Hnin, Tun San San. "Discourse marking in Burmese and English : a corpus-based approach." Thesis, University of Nottingham, 2006. http://eprints.nottingham.ac.uk/11963/.
Plappert, Gary Lee. "Phraseology and epistemology in scientific writing : a corpus-driven approach." Thesis, University of Birmingham, 2012. http://etheses.bham.ac.uk//id/eprint/3884/.
He, Yuan William. "A corpus-assisted study on modal verbs in consecutive interpreting." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3953519.
Silveira, Gustavo Estef Lino da. "Análise de quadrigramas na escrita em inglês como língua estrangeira: um estudo baseado em corpus." Universidade do Estado do Rio de Janeiro, 2014. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=6869.
This study seeks to trace the profile of the lexico-grammatical choices of a group of learner writers in the city of Rio de Janeiro between 2009 and 2012. To this end, it analyses the learners' production of 4-grams (that is, blocks of four lexical items used with relative frequency by a number of learners) in written compositions produced as part of their final assessment. Specifically, the research aimed to determine whether the 4-grams produced by the learners had been taught previously in their composition lessons or whether they belonged to some other category, namely 4-grams already internalized as part of their language use or erroneous 4-grams used frequently and extensively by the subjects investigated. Thus, compositions written by learners at the same proficiency level were collected at various branches of a private English school in the city of Rio de Janeiro. Subsequently, these compositions were typed and tagged in order to compile a digital corpus easily identified in terms of text type and genre, learner profile, branch and area of the city of Rio de Janeiro. The study makes use of the precepts and methods of Corpus Linguistics, an area of Linguistics that collects large quantities of texts and extracts data from them with the help of a computer program in order to map the use, frequency, distribution and range of a given linguistic or discursive phenomenon. The results demonstrate that the learners studied made little use of the 4-grams that had been taught to them and, collectively, preferred other n-grams that had not been taught in the specific lessons of the level. The study has also shown that when the textual genre is part of their personal lives, the learners seem to make use of more previously taught 4-grams. This suggests that genre may influence the choice of correct lexico-grammatical items.
The study creates a research space for understanding the importance of lexico-grammatical chunks in L2 writing as a means of ensuring fluency and accuracy in the target language. In addition, it suggests that more opportunities for practice should be offered to learners so that they become aware of the use of such chunks.
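The notion of a 4-gram used in the abstract above (a block of four items that recurs with some frequency and across a number of learners) can be illustrated with a short sketch. It is not the procedure used in the thesis, which relied on a corpus-analysis program over a tagged corpus; the whitespace tokenization, the thresholds `min_freq` and `min_range`, and the sample essays are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def four_grams(tokens):
    """Return every contiguous sequence of four tokens."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def recurrent_four_grams(essays, min_freq=2, min_range=2):
    """Map each 4-gram to (frequency, range), keeping those above both thresholds.

    Frequency is the total number of occurrences; range is the number of
    different writers who used the 4-gram at least once.
    """
    freq = Counter()
    writers = defaultdict(set)
    for writer_id, text in essays.items():
        tokens = text.lower().split()  # naive tokenization, for illustration only
        for gram in four_grams(tokens):
            freq[gram] += 1
            writers[gram].add(writer_id)
    return {
        gram: (freq[gram], len(writers[gram]))
        for gram in freq
        if freq[gram] >= min_freq and len(writers[gram]) >= min_range
    }


# Hypothetical learner texts keyed by a made-up writer identifier.
essays = {
    "learner_a": "on the other hand I think that it is good",
    "learner_b": "On the other hand we believe that it is bad",
    "learner_c": "I think that it is a nice place",
}

result = recurrent_four_grams(essays)
```

Tracking range alongside frequency keeps a single prolific writer from dominating the list, which matters when the aim is to describe choices shared by a group of learners.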
Lüdeling, Anke. "Heterogeneity and standardization in data, use, and annotation : a diachronic corpus of German." Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2006/864/.
Such highly heterogeneous texts must be standardized to allow for comparative research without (too much) loss of information.
Li, Lu. "Copular and complex-transitive constructions in modern written English : a corpus-based study." Thesis, Lancaster University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334660.
Stewart, Miranda Mary. "Personal reference and politeness strategies in French and Spanish : a corpus-based approach." Thesis, Heriot-Watt University, 1992. http://hdl.handle.net/10399/1508.
Terry, Devon K. "Linguistics of Russian Media During the 2016 US Election: A Corpus-Based Study." BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/9154.
Tolle, Kristin M. "Domain-independent semantic concept extraction using corpus linguistics, statistics and artificial intelligence techniques." Diss., The University of Arizona, 2003. http://hdl.handle.net/10150/280502.