Dissertations / Theses on the topic 'Corpus linguistics'

Consult the top 50 dissertations / theses for your research on the topic 'Corpus linguistics.'

1

Atwell, Eric Steven. "Corpus linguistics and language learning : bootstrapping linguistic knowledge and resources from text." Thesis, University of Leeds, 2008. http://etheses.whiterose.ac.uk/7504/.

Abstract:
This submission for the award of the degree of PhD by published work must: “make a contribution to knowledge in a coherent and related subject area; demonstrate originality and independent critical ability; satisfy the examiners that it is of sufficient merit to qualify for the award of the degree of PhD.” It includes a selection of my work as a Lecturer (and later, Senior Lecturer) at Leeds University, from 1984 to the present. The overall theme of my research has been bootstrapping linguistic knowledge and resources from text. A persistent strand of interest has been unsupervised and semi-supervised machine learning of linguistic knowledge from textual sources; the attraction of this approach is that I could start with English, but go on to apply analogous techniques to other languages, in particular Arabic. This theme covers a broad range of research over more than 20 years at Leeds University which I have divided into 8 sub-topics: A: Constituent-Likelihood statistical modelling of English grammar; B: Machine Learning of grammatical patterns from a corpus; C: Detecting grammatical errors in English text; D: Evaluation of English grammatical annotation models; E: Machine Learning of semantic language models; F: Applications in English language teaching; G: Arabic corpus linguistics; H: Applications in Computing teaching and research. The first section builds on my early years as a lecturer at Leeds University, when my research was essentially a progression from my previous work at Lancaster University on the LOB Corpus Part-of-Speech Tagging project (which resulted in the Tagged LOB Corpus, a resource for Corpus Linguistics research still in use today); I investigated a range of ideas for extending and/or applying techniques related to Part-of-Speech tagging in Corpus Linguistics. 
The second section covers a range of co-authored papers representing grant-funded research projects in Corpus Linguistics; in this mode of research, I had to come up with the original ideas and guide the project, but much of the detailed implementation was down to research assistant staff. Another highly productive mode of research has been supervision of research students, leading to further jointly-authored research papers. I helped formulate the research plans, and guided and advised the students; as with research-grant projects, the detailed implementation of the research has been down to the research students. The third section includes a few of the most significant of these jointly-authored Corpus Linguistics research papers. A “standard” PhD generally includes a survey of the field to put the work in context; so as a fourth section, I include some survey papers aimed at introducing new developments in corpus linguistics to a wider audience.
2

Harvey, Kevin. "Adolescent health communication: a corpus linguistics approach." Thesis, University of Nottingham, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.491000.

Abstract:
This study reports on a corpus analysis of a one-million-word collection of adolescent health emails submitted to an online health forum, the Teenage Health Freak, a UK-based website which provides evidence-based health advice and information for young people. The corpus approach to linguistic analysis integrates both quantitative and qualitative techniques, affording a reliable means of identifying trends and patterns of communication. By examining the common ways in which adolescents construct their health concerns to professionals online, this study aims to describe commonalities in young people's accounts of health, specifically sexual and mental health, thereby giving voice to an age group whose subjective experiences of health and illness have often been overlooked in favour of older generations.
3

Wiesner, Susan L. "Framing dance writing : a corpus linguistics approach." Thesis, University of Surrey, 2007. http://epubs.surrey.ac.uk/974/.

4

Cheung, Mei Ling Lisa. "Merging corpus linguistics and collaborative knowledge construction." Thesis, University of Birmingham, 2009. http://etheses.bham.ac.uk//id/eprint/464/.

Abstract:
This study relates corpus-driven discourse analysis to the concept of collaborative knowledge construction. It demonstrates that the traditional synchronic perspective of meaning in corpus linguistics needs to be complemented by a diachronic dimension. The fundamental assumption underlying this work is that knowledge is understood not within the traditional epistemological framework but from a radical social epistemological perspective, and that incremental knowledge about an object of the discourse corresponds to continual change of meaning of the lexical item that stands for it. This stance is based on the assumption of the discourse as a self-referential system that uses paraphrase as a key device to construct new knowledge. Knowledge is thus seen as the result of collaboration between the members of a discourse community. The thesis presents, in great detail, case studies of asynchronous computer-mediated communication that allow a comprehensive categorisation of a wide range of paraphrase types. It also investigates overt and covert signs of intertextuality linking a new paraphrase to previous contributions. The study then discusses ways in which these new insights concerning the process of collaborative knowledge construction can have an impact on teaching methodologies.
5

Doyle, Paul G. "Replicating corpus linguistics : a corpus-driven investigation of lexical networks in text." Thesis, Lancaster University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418685.

6

Tagg, Caroline. "A corpus linguistics study of SMS text messaging." Thesis, University of Birmingham, 2009. http://etheses.bham.ac.uk//id/eprint/253/.

Abstract:
This thesis reports a study using a corpus of text messages in English (CorTxt) to explore linguistic features which define texting as a language variety. It focuses on how the language of texting, Txt, is shaped by texters actively fulfilling interpersonal goals. The thesis starts with an overview of the literature on texting, which indicates the need for thorough linguistic investigation of Txt based on a large dataset. It then places texting within the tradition of research into the speech-writing continuum, which highlights limitations of focusing on mode at the expense of other user-variables. The thesis also argues the need for inductive investigation alongside the quantitative corpus-based frameworks that dominate the field. A number of studies are then reported which explore the unconventional nature of Txt. Firstly, drawing on the argument that respelling constitutes a meaning-making resource, spelling variants are retrieved using word-frequency lists and categorised according to form and function. Secondly, identification of everyday creativity in CorTxt challenges studies focusing solely on spelling as a creative resource, and suggests that creativity plays an important role in texting because of, rather than despite, physical constraints. Thirdly, word frequency analysis suggests that the distinct order of the most frequent words in CorTxt can be explained with reference to the frequent phrases in which they occur. Finally, application of a spoken grammar model reveals similarities and differences between spoken and texted interaction. The distinct strands of investigation highlight, on the one hand, the extent to which texting differs from speech and, on the other, the role of user agency, awareness and choice in shaping Txt. The argument is made that this can be explained through performativity and, in particular, the observation that texters perform brevity, speech-like informality and group deviance in construing identities through Txt.
7

Alruwaili, Awatif. "Integrating corpus linguistics in second language vocabulary acquisition." Thesis, University of Nottingham, 2018. http://eprints.nottingham.ac.uk/51589/.

Abstract:
Corpus linguistics has been used for over three decades in language teaching but not until now has it become a mainstream approach to language learning in the classroom. Thus, this thesis explores how the use of corpora can be successfully integrated into the English Foreign Language classroom, specifically in the Saudi classroom context. The integration is explored through two studies. Study One addresses the learners’ actual use of corpora in the classroom for learning general verbs patterns. General verbs patterns are selected through a multi-level approach which consists of a corpus-based approach as a first level, a phraseological approach as a second level and a pedagogical approach as a third level. The study relies on data collected from 51 participants who were at the intermediate level studying general English in the foundation year. The study ran for five weeks and included three training sessions, in which the learners were trained in how to use the corpus resource and how to read and analyse concordance lines and two testing sessions. The participants were tracked via software tracker in both training and testing sessions. The data were collected through tracking logs, activity sheets, reflective forms and interviews. The findings of Study One show that the intermediate-level learners were able to use the corpus resource in the same way as they had been trained, which indicates that the training was successful. The learners were also able to identify general verbs patterns through the use of concordance lines. Most participants had a positive attitude towards the use of corpora in the classroom besides identifying a few difficulties related to the use of corpora. Study Two investigates teachers’ attitudes towards the use of corpora in the classroom which included 56 in-service teachers who attended a training course on the uses of corpora in the classroom. 
The data collected included questionnaires (pre-course and post-course questionnaires) and interviews. The findings show that the questionnaires had a good reliability value and the teachers’ attitudes were moderately positive towards the use of corpora in the classroom. In addition, Study Two finds that there are some factors that seem to influence teachers’ attitudes, such as the training course, the level of computer literacy and the teachers’ perceptions of their role and learners’ roles within the communicative approach. The interviews constitute an in-depth investigation of teachers’ views about the use of corpora in the classroom by listing possible factors that facilitate or hinder the implementation of corpora in everyday teaching practice. Through the discussion of these findings from Study One and Study Two, a full integration of corpus linguistics into the Saudi classroom is possible taking into consideration the hindrances. These difficulties can be overcome through the offered proposal for implementing the use of corpora in the classroom.
8

Korte, Matthew. "Corpus Methods in Interlanguage Analysis." University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1218835515.

9

Rodrigues, Agnes dos Santos Scaramuzzi. "Posicionamento e linguística forense: uma análise mediada pela Linguística de Corpus." Pontifícia Universidade Católica de São Paulo, 2016. https://tede2.pucsp.br/handle/handle/18899.

Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The overall purpose of this research was to investigate the verbal language collected and compiled in an electronic corpus, extracted from a Criminal Lawsuit - Special Part - Crimes Against the Person, Chapter I, Crimes Against Life, tried at the Legal Complex Minister Mario Guimarães Forum in the capital of São Paulo, in 2011. The relevant issue was homicidal domestic violence. We chose a murder case in which a male defendant killed his former spouse, the mother of his child, at the victim's residence. The object of study was the set of hearings that were held in two separate instances: Preliminary Hearing and Trial. Our assumption is that in a criminal proceeding, language actors fall into three distinct groups, according to their function in the proceedings: (a) charge, (b) defend and (c) judge. Because they have no access to the acts per se, only to their representations, the actors use language, largely verbal, to report their knowledge of the occurrence. In doing so, they position themselves in accordance with their view of the occurrence and imbue their accounts with the impact that the events have had on their lives. Thus, there are linguistic differences in the speech of actors, according to their function in the proceedings, which can be revealed by the investigation of stance. Revealing them can elucidate the prosecution's and the defense's versions of, for example, the profile of the victim and the defendant. Our analysis of these differences is innovative, as we did not find any study of stance at hearings in Portuguese. As such, we seek to bridge this gap. The specific objectives were: (A) to reveal the different uses of stance by explaining their categories in two parts of speech: (a) adjectives and (b) adverbs ending in "-mente" (English "-ly"); and (B) to find out if there are differences in the uses of stance according to the role the linguistic actor plays in the proceedings: (a) charge, (b) defend and (c) judge.
The research questions were: 1. What is the evidence of the use of stance and its categories in the two parts of speech: (a) adjectives and (b) adverbs ending in "-mente" (English "-ly")? 2. Are there differences in the uses of stance in relation to the role the linguistic actor plays in the proceedings: (a) charge, (b) defend and (c) judge? The theoretical background is based on: (A) Applied Linguistics; (B) Corpus Linguistics, extracting evidence of the use of verbal language from a corpus analysed electronically; (C) analysis of stance in light of Biber and Finegan (1988), Biber et al. (1999) and Biber (2006a and 2006b), defined as an expression of the feelings, attitudes and judgments that the actor conveys about their message; and (D) Forensic Linguistics, which investigates the language used in the courthouses. The methodology included fieldwork, use of the electronic tool PALAVRAS and qualitative analysis. The results indicated the following: for the first question, that uses of stance were found in both parts of speech, and for the second question, that there are differences in the uses of stance related to the functions in the proceedings. Given these responses, we conclude that: (a) it is important to investigate the linguistic characteristics of each function in the proceedings in order to understand the language used in each of these functions; and (b) identifying stance in a forensic corpus may be useful for assessing the quality of the information relayed, for example, by a witness who imbues their speech with personal feelings, attitudes and level of knowledge about the fact on trial. We hope to have contributed to the development of new Corpus Linguistics studies focused on the uses of stance in forensic speech, building on the research methodology developed herein. We also hope to have contributed to the development of Forensic Linguistics in Brazil by offering our methodology and results, as we adopted the required rigor in our practices.
Our final considerations discuss the limitations, developments, future research and proposals for educational applications.
10

Rizomilioti, Vassiliki. "Epistemic modality in academic writing : a corpus-linguistic approach." Thesis, University of Birmingham, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.288688.

11

Martins, Francimary Macêdo. "Compilação, anotação e análise linguístico-computacional de um corpus de textos literários dos séculos XIX e XX: corpus Coelho Neto." Universidade Federal do Ceará, 2014. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=15313.

Abstract:
This thesis presents the compilation, morphosyntactic annotation and linguistic-computational analysis of a corpus of literary texts of the 19th and 20th centuries: the Corpus Coelho Netto (CCN), containing texts from the novels A Conquista and Turbilhão and short stories from the book Sertão. The work lies at the interface of Corpus Linguistics and Computational Linguistics (BERBER SARDINHA, 2000, 2003, 2004, 2005, 2009; BERBER SARDINHA; ALMEIDA, 2008; OLIVEIRA, 2009; BIDERMAN, 1998, 2001; ALUÍSIO; ALMEIDA, 2006; SHEPHERD, 2012; MACENERY AND WILSON, 2001; LEECH, 2004; ALVES; TAGNIN, 2012; ALENCAR, 2009, 2010a, 2010b, 2011a, 2011b, 2013a, 2013b). The CCN contains 53,080 (fifty-three thousand and eighty) tokens (punctuation and words). The compilation consists of the steps of selection, collection of texts and handling; in the last of these, the texts are cleaned, edited and updated (ALUÍSIO; ALMEIDA, 2006). The corpus is then submitted to morphosyntactic annotation and linguistic-computational analysis, with the goal of obtaining data that confirm or refute the "excessive" use of adjectives, verbs and adverbs ending in "-mente", demonstrating the lexical diversity in Coelho Netto's texts and establishing whether what the modernist critics said about the writer was justified. The annotation was performed by the automatic tagger Aelius (AeliusHunPos model), free software in Python that uses the Natural Language Toolkit (NLTK) library (BIRD; KLEIN; LOPER, 2009) for the pre-processing of texts, the construction of morphosyntactic taggers and the annotation of corpora with the help of human review (ALENCAR, 2010a, 2013a, 2013b), and which was trained on the Corpus Histórico do Português Tycho Brahe (CHPTB). The compilation and annotation of the CCN involves other actions, such as re-evaluating the accuracy of this tagger on literary texts.
The results of the research revealed that: when annotating the CCN texts, AeliusHunPos achieved higher accuracy (97.9%) than on other texts already annotated; the AeliusHunPos model performed far better in annotating the corpora than the AeliusMaxEnt model; and, after selecting and manually correcting 10% of the annotated corpora and generating gold-standard files, we suggest improvements targeting the approximately 3% of errors made by the tagger, in order to increase its accuracy. Regarding the analyses performed on the CCN data, we found that the claim of exaggerated lexical diversity (specifically in verbs, adjectives and adverbs ending in "-mente") made by critics of Coelho Netto is unfounded: his texts are rich, but the texts of Aluísio Azevedo and Camilo Castelo Branco, which form the comparison corpus, show a vocabulary richness similar to that of the CCN, as presented in the results.
12

Tang, Haijiang. "Building phrase based language model from large corpus." Thesis, Hong Kong University of Science and Technology, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20TANG.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 74-79).
13

Bridle, Marcus. "Error correction through corpus consultation in EAP writing : an analysis of corpus use in a pre-sessional context." Thesis, University of Huddersfield, 2015. http://eprints.hud.ac.uk/id/eprint/24848/.

Abstract:
This study investigates the effect of corpus consultation on the accuracy of learner written error revisions. It examines the conditions which cause a learner to consult the corpus in correcting errors and whether these revisions are more effective than those made using other correction methods. Claims have been made for the potential usefulness of corpora in encouraging a better understanding of language through inductive learning (Johns, 1991; Benson, 2001; Watson Todd, 2003). The opportunity for learners to interact with the authentic language used to compile corpora has also been cited as a possible benefit (Thurstun and Candlin, 1998). However, theoretical advantages of using corpus data have not always translated into actual benefits in real learning contexts. Learners frequently encounter difficulties in dealing with the volume of information available to them in concordances and can reject corpus use because it adds to their learning load (Yoon and Hirvela, 2004; Frankenberg Garcia, 2005; Lee and Swales, 2006). This has meant that practical employment of corpus data has sometimes been difficult to implement. In this experiment, learners on a six-week pre-sessional English for Academic Purposes (EAP) course were shown how to use the BYU (Brigham Young University) website to access the BNC (British National Corpus) to address written errors. Through a draft/feedback/revision process using meta-linguistic error coding, the frequency, context and effectiveness of the corpus being used as a reference tool were measured. Use of the corpus was found to be limited to a small range of error types which largely involved queries of a pragmatic nature. In these contexts, the corpus was found to be a potentially more effective correction tool than dictionary reference or recourse to previous knowledge and it may have a beneficial effect in encouraging top-down processing skills.
However, its frequency of use over the course was low and accounted for only a small proportion of accurate error revisions as a whole. Learner response to the corpus corroborated the negative perception already noted in previous studies. These findings prompt recommendations for further investigation into effective mediation of corpus data within the classroom and continued technological developments in order to make corpus data more accessible to non-specialists.
14

Gonçalves, Marcos Antônio. "As formações x-inho nas modalidades oral e escrita: um estudo contrastivo baseado na lingüística de corpus." Universidade do Estado do Rio de Janeiro, 2006. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=59.

Abstract:
Items ending in -inho are described in the majority of grammars of Portuguese as conveying two notions, namely affect and dimension. However, the same grammars do not seem to take into account the extralinguistic and contextual factors that surround speakers when they opt for a word ending in -inho. The aim of the present work is thus to investigate the productivity of such items in two electronic corpora: an oral one, further subdivided into two sub-corpora containing narratives and descriptions respectively, and a written one compiled exclusively from the various sections of a widely read quality newspaper. The dissertation quantifies the instances of items ending in -inho in each of the corpora. Next, each of these occurrences is analysed and classified to check which notion (dimension, positive affect, negative affect, intensification, etc.) it conveys. Lastly, the results of both frequency and dispersion counts for each notion are contrasted across the corpora. The methodology of our analysis is centered on the area known as Corpus Linguistics, which provides a basis for the data to be analysed and interpreted.
15

Caldwell, Joshua Marrinor. "Iconic Semantics in Phonology: A Corpus Study of Japanese Mimetics." BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2368.

Abstract:
Recent research on Japanese mimetics examines which part of speech the mimetic occurs as. An individual mimetic can appear as a noun, an adjective, an adverb, or a verb (Tsujimura & Deguchi 2007, 340). It is assumed by many scholars that mimetic words essentially function as adverbs (Inose 2007, 98). Few data-based studies exist that quantify the relative frequency of mimetic words in different word categories. Akita (2009) and Caldwell (2009a) have performed small scale or preliminary studies of this aspect of Japanese mimetics. The use of mimetics in other grammatical function categories has been attributed to the polysemous nature of Japanese mimetics (Key 1997). The common explanation is that the flexibility of mimetics is probably due to their iconicity (Sugiyama 2005, 307; Akita 2009; among others). Yet the definition of "iconicity" is often incomplete or cursory in nature. Newmeyer, Nuckolls, Kohn, and Key all accept or suggest the philosophies of C.S. Peirce as a possible explanation or source for understanding the iconicity of mimetic words. The purpose of this thesis is twofold: first, examine the prominent semantic theories regarding Japanese mimetics and show how the philosophies of Peirce can add clarity; second, examine overall occurrence of 1700+ mimetics per parts of speech using the data from the Kotonoha (http://www.kotonoha.gr.jp) and JpWaC (http://corpus.leeds.ac.uk/) Corpora. Peirce identified three distinct icon types: icons of abstract quality (1-1-1), icons of physical instantiation (1-1-2), and icons of abstract relation (1-1-3). These three types correspond to three distinct types of mimetic word: phonomimes (abstract sound qualities), typically predicate modifiers, phenomimes (physical actions), more often nouns or noun modifiers, and psychomimes, (relational), more often verbs or parts of verbs. 
Corpus data validates the observation that mimetics are usually functioning as predicate modifiers, but also supports Akita's hypothesis that psychomimes are incorporated into verbs more readily than other mimetics, which in turn is explained by the Peircean analysis.
16

Thomas, Penelope Leith. "Facebook in the Australian News: a corpus linguistic approach." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/18747.

Abstract:
This thesis analyses the reporting about Facebook in the Australian newsprint media over time, from 2004 to 2013. Based on linguistic analysis of news values, it investigates how traditional news organisations have presented Facebook as ‘newsworthy’. It makes use of a 104,514 word specialised corpus built specifically for the investigation called the ‘Facebook News Corpus’ (FNC), which consists of Australian news texts that appeared around three main events in the company’s history: 1) the launch of Facebook in Australia on 4 February 2004; 2) the listing of Facebook Inc. on Nasdaq on 18 May 2012; and 3) the introduction of Graph Search on 15 January 2013. The FNC is used to examine how news values are construed around a central topic, representing the first attempt to use corpus linguistics to evaluate news about Facebook. The thesis applies an iterative sequence of corpus linguistic techniques, drawing on quantitative and qualitative methods and analytical frameworks, especially Bednarek and Caple’s (2014) discursive news values analysis (DNVA). The study identifies important news values, clusters of co-occurring news values, and how they are constructed through language. It also provides empirical evidence for shifts in news discourse about Facebook over the three time periods that are investigated. Given the rise of Facebook as a primary news source for its more than two billion users, this information will be useful for future research on the role of social networking sites and their relationship with traditional news organisations.
17

Souza, Adílio Junior de. "Lexicalização e neologismo: análise funcional em corpus digital." Universidade Federal da Paraíba, 2015. http://tede.biblioteca.ufpb.br:8080/handle/tede/8403.

Abstract:
Previous issue date: 2015-12-04
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
This dissertation shows how the emergence of neologisms in a language, through lexicalization, can contribute to the enrichment and updating of that language's lexicon. To this end, it sought to: (i) set out the main concepts of lexicon, neologism and lexicalization, based on Usage-Based Linguistics (UBL); (ii) present 13 lexical items selected from the digital corpus; and (iii) discuss the relevance of lexicalization for the formation of new words, in order to understand how this affects or changes the multi-system. The corpus used was that of the Project AC/DC: corpo Corpus Brasileiro, which contains about one billion words used in a wide variety of contexts. The theoretical grounding draws on several scholars, notably: Martelotta (2011), Gonçalves (2011), Contiero and Ferraz (2014), Correia and Almeida (2012), Carvalho (2009a), Biderman (1978; 1981), Câmara Jr. (2011), Pontes-Ribeiro (2007), Castilho (2003a; 2003b; 2008), Cunha (2011), Mendes and Seabra (2006), Ferraz (2006; 2007) and Fortunato (2008). The methodology consisted of three stages: a) selection of samples of lexical items from the corpus, b) extraction of these samples and their compilation in tables, and c) analysis of the collected data. The results revealed that some of the 13 lexicalized words/neologisms possibly emerged to fill a gap in linguistic signs in the multi-system, others acquired new meanings when used in new contexts, and many others are in the process of disappearing. Frequency of use was decisive in the change of meaning.
Esta dissertação aponta como o surgimento dos neologismos em uma língua, pela lexicalização, pode contribuir para o enriquecimento e atualização do léxico desta mesma língua. Deste modo, buscou-se: (i) expor os principais conceitos sobre léxico, neologismo e lexicalização, com base na Linguística Centrada no Uso (LCU), (ii) apresentar 13 itens lexicais selecionados a partir do corpus digital e (iii) discutir a relevância da lexicalização para a formação de novas palavras, para entender como isso afeta/altera o multissistema. O corpus utilizado foi o Projeto AC/DC: corpo Corpus Brasileiro, que contém cerca de um bilhão de palavras empregadas nos mais variados contextos de uso. Para a fundamentação da dissertação, alguns estudiosos foram consultados, entre os quais se destacam: Martelotta (2011), Gonçalves (2011), Contiero e Ferraz (2014), Correia e Almeida (2012), Carvalho (2009a), Biderman (1978; 1981), Câmara Jr. (2011), Pontes-Ribeiro (2007), Castilho (2003a; 2003b; 2008), Cunha (2011), Mendes e Seabra (2006), Ferraz (2006; 2007) e Fortunato (2008). A metodologia consistiu em três etapas: a) coleta de amostras de itens lexicais no corpus, b) extração dessas amostras e compilação em tabelas e c) análise dos dados coletados. Os resultados revelaram que alguns dos 13 neologismos/palavras lexicalizadas, possivelmente, surgiram para preencher um vazio de signos linguísticos no multissistema, outros adquiriram novos sentidos ao serem empregados em novos contextos de uso e outros tantos estão em processo de desaparecimento. A frequência de uso foi determinante para a mudança no sentido.
18

Trklja, Aleksandar. "A corpus linguistics study of translation correspondences in English and German." Thesis, University of Birmingham, 2014. http://etheses.bham.ac.uk//id/eprint/4785/.

Abstract:
This thesis aims at developing an analytical model for differentiation of translation correspondences and for grouping lexical items according to their semantic similarities. The model combines the language in use theory of meaning with the distributional corpus linguistics method. The identification of translation correspondences derives from the exploration of the occurrence of lexical items in the parallel corpus. The classification of translation correspondences into groups is based on the substitution principle, whereas the distinguishing features used to differentiate between lexical items emerge as a result of the study of local contexts in which these lexical items occur. The distinguishing features are analysed with the help of various statistical measurements. The results obtained indicate that the proposed model has advantages over the traditional approaches that rely on the referential theory of meaning. In addition to contributing to lexicology the model also has its applications in practical lexicography and in language teaching.
19

Costa, Danilo Duarte. "Linking adverbials in applied linguistics research articles: a corpus-based study." Universidade Federal de Minas Gerais, 2015. http://hdl.handle.net/1843/MGSS-9VKN7F.

Abstract:
This study investigates the use of linking adverbials (Biber et al., 1999) in research articles written by Brazilian English L2 applied linguists in comparison to those written by English L1 professionals in the same field. Two comparable corpora were compiled for this study, namely CRAB (Corpus of Research Articles written by Brazilians) and CRAN (Corpus of Research Articles written by Natives), both containing more than 300,000 tokens. The corpus compilation process followed strict methodological procedures based on Biber (1993) and McEnery et al. (2006). The data, after undergoing the Log-Likelihood statistical test, were analysed using the software AntConc 3.4.2 for a qualitative examination. Seven different semantic categories of linking adverbials were investigated so as to find similarities and differences in the use of those linguistic elements between the two corpora. The results show that there are significant differences in the use of linking adverbials in Brazilian academic writing in comparison to that of native speakers. These differences concern both frequency of use (over- and underuse of some forms) and the way in which the linking adverbials are employed in texts. In addition, we found that some adverbials are, at times, misused by Brazilian writers.
Este estudo se propõe a investigar o uso de linking adverbials (Biber et al., 1999) em artigos científicos de linguística aplica escritos em inglês por brasileiros, em comparação com aqueles escritos por falantes nativos de inglês. Dois corpora comparáveis foram compilados para este estudo, a saber, CRAB (Corpus of Research Articles written by Brazilians) e CRAN (Corpus of Research Articles written by Natives), ambos com mais de 300.000 palavras. O processo de compilação dos corpora seguiu rigorosos procedimentos metodológicos embasados em Biber (1993) e McEnery et al. (2006). Os dados, depois de submetidos ao teste estatístico Log-Likelihood, foram analisados utilizando o software AntConc 3.4.2 para uma análise qualitativa. Sete diferentes categorias semânticas dos linking adverbials foram investigados de forma encontrar semelhanças e diferenças na utilização desses elementos linguísticos nos dois corpora. Os resultados mostram que existem diferenças significativas no uso de linking adverbials na escrita acadêmica dos brasileiros em comparação à dos falantes nativos. Essas diferenças dizem respeito tanto à frequência de uso (sobre e sub-uso de algumas formas), quanto à maneira pela qual tais elementos são empregados em textos. Além disso, foi observado que existem linking adverbials, por vezes, mal utilizados nos textos escritos pelos profissionais brasileiros.
20

Vogel, Ralf, and Marco Zugck. "Counting Markedness : a corpus investigation on German free relative constructions." Universität Potsdam, 2003. http://opus.kobv.de/ubp/volltexte/2009/3247/.

Abstract:
This paper reports the results of a corpus investigation on case conflicts in German argument free relative constructions. We investigate how corpus frequencies reflect the relative markedness of free relative and correlative constructions, the relative markedness of different case conflict configurations, and the relative markedness of different conflict resolution strategies. Section 1 introduces the conception of markedness as used in Optimality Theory. Section 2 introduces the facts about German free relative clauses, and section 3 presents the results of the corpus study. By and large, markedness and frequency go hand in hand. However, configurations at the highest end of the markedness scale rarely show up in corpus data, and for the configuration at the lowest end we found an unexpected outcome: the more marked structure is preferred.
21

Zinsmeister, Heike, and Eva Smolka. "Corpus-based evidence for approximating semantic transparency of complex verbs." Universität Potsdam, 2012. http://opus.kobv.de/ubp/volltexte/2012/6235/.

22

NOSEDA, VALENTINA. "CORPORA PARALLELI E LINGUISTICA CONTRASTIVA: AMPLIAMENTO E APPLICAZIONI DEL CORPUS ITALIANO - RUSSO NEL NACIONAL'NYJ KORPUS RUSSKOGO JAZYKA." Doctoral thesis, Università Cattolica del Sacro Cuore, 2017. http://hdl.handle.net/10280/24613.

Abstract:
La Linguistica dei corpora - che fa uso di corpora elettronici annotati per lo studio delle lingue - è un approccio ormai diffuso e consolidato. I corpora paralleli, in particolare, in cui i testi in una lingua A sono allineati con la traduzione in lingua B, sono uno strumento molto utile nell’analisi contrastiva. La mancata disponibilità di corpora paralleli di qualità per le lingue di nostro interesse - russo e italiano - ci ha portati a volere ampliare e migliorare il corpus parallelo italiano-russo presente come corpus pilota nel Nacional’nyj Korpus Russkogo Jazyka (Corpus Nazionale della Lingua Russa). Il presente lavoro ha avuto pertanto uno scopo applicativo e uno teorico. Da un lato, dopo aver studiato le questioni imprescindibili per la progettazione di un corpus di qualità, sono stati stabiliti i criteri per l’ampliamento e inseriti nuovi testi, consentendo così al corpus parallelo di passare da 700.000 a più di 4 milioni di parole, entità che consente ora di condurre ricerche scientificamente valide. In seguito, sono state proposte tre analisi corpus-based così da mettere in luce le potenzialità del corpus ampliato: lo studio dei verbi prefissali di memoria russi e la loro resa in italiano; il confronto tra il causativo analitico italiano “fare + infinito” e il causativo russo; l’analisi comparata di quindici versioni italiane de Il Cappotto di N. Gogol’. Le tre analisi hanno consentito di avanzare innanzitutto osservazioni di carattere metodologico in vista di un ulteriore ampliamento e miglioramento del corpus parallelo italiano-russo. In secondo luogo, la prospettiva corpus-based si è dimostrata utile per approfondire lo studio di questi temi dal punto di vista teorico.
Corpus Linguistics, which exploits annotated electronic corpora in the study of languages, is a widespread and well-established approach. In particular, parallel corpora, where texts in one language are aligned with their translation in a second language, are an extremely useful tool in contrastive analysis. The lack of good parallel corpora for the languages of our interest, Russian and Italian, led us to expand and improve the Italian-Russian parallel corpus available as a pilot corpus in the Russian National Corpus. This work therefore had a twofold aim: practical and theoretical. On the one hand, after studying the essential issues in designing a high-quality corpus, the criteria for expanding the corpus were established and new texts were added, allowing the Italian-Russian parallel corpus to grow from 700,000 words to more than 4 million words. As a result, it is now possible to conduct scientifically valid research based on this corpus. On the other hand, three corpus-based analyses were proposed in order to highlight the potential of the expanded corpus: the study of prefixed Russian memory verbs and their translation into Italian; the comparison between the Italian analytic causative "fare + infinitive" and Russian causative verbs; and the comparative analysis of fifteen Italian versions of The Overcoat by N. Gogol'. These analyses first of all made it possible to offer some methodological remarks with a view to further enlarging and improving the Italian-Russian parallel corpus. Secondly, the corpus-based approach proved useful in deepening the study of these topics from a theoretical point of view.
24

Lúcio, Denise Delegá. "A variação entre textos argumentativos e o material didático de inglês: aplicações da análise multidimensional e do Corpus Internacional de Aprendizes de Inglês (ICLE)." Pontifícia Universidade Católica de São Paulo, 2013. https://tede2.pucsp.br/handle/handle/13640.

Abstract:
Previous issue date: 2013-10-17
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This thesis aims to examine how argumentative texts produced by English learners vary and, on the basis of this knowledge, to suggest procedures for developing activities for English teaching material. The research draws on the theoretical framework of Corpus Linguistics, Learner Corpus Linguistics, and Multidimensional Analysis. Our study corpora were the International Corpus of Learner English (ICLE), the Brazilian International Corpus of Learner English (BrICLE), and the Louvain Corpus of Native English Essays (LOCNESS). In the first phase of this research, we examined how variation in learners' essays was distributed along the dimensions of English variation proposed by Biber (1988). In the second phase, we identified the dimensions of variation specific to learners' essays, which resulted in 4 dimensions of variation: dimension 1, literate writing versus narrative-like and oral-like writing; dimension 2, description-driven writing versus action-driven writing; dimension 3, writing focused on thought and report; and dimension 4, qualifying writing. In the third phase, we started from the linguistic characteristics observed in the dimension literate writing versus narrative-like and oral-like writing to find contents for the teaching activities about variation in texts. In addition to the suggested activities, we present the procedures needed to use results from research like this for producing language teaching materials.
Esta tese tem por objetivo verificar o modo como textos argumentativos produzidos por alunos de inglês variam e, a partir desse conhecimento, sugerir procedimentos para o desenvolvimento de atividades para material didático de inglês. A pesquisa recorre ao arcabouço teórico da Linguística de Corpus, Linguística de Corpus de Aprendiz e Análise Multidimensional. Nossos corpora de estudo foram o International Corpus of Learner English (ICLE), o Brazilian International Corpus of Learner English (BrICLE) e o Louvain Corpus of Native English Essays (LOCNESS). Na primeira fase desta pesquisa, verificamos o modo como a variação nas redações de aprendizes se distribuía nas dimensões de variação do inglês propostas por Biber (1988). Na segunda fase, identificamos as dimensões de variação específicas nas redações de aprendizes, o que resultou em 4 dimensões de variação: dimensão 1 escrita letrada versus escrita narrativizada e oralizada; dimensão 2 escrita com foco na descrição versus escrita com foco no agir; dimensão 3 escrita com foco no pensamento e no relato; e dimensão 4 escrita qualificativa. Na terceira fase, partimos das características linguísticas observadas na dimensão escrita letrada versus escrita narrativizada e oralizada para encontrar conteúdos para as atividades didáticas sobre a variação em textos. Além das atividades sugeridas, apresentamos os procedimentos necessários para utilizar resultados de pesquisas como esta para a produção de materiais didáticos para ensino de línguas
25

Abe, Mariko. "Syntactic variation across proficiency levels in Japanese EFL learner speech." Diss., Temple University Libraries, 2015. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/350754.

Abstract:
Teaching & Learning
Ed.D.
Overall patterns of language use variation across oral proficiency levels of 1,243 Japanese EFL learners and 20 native speakers of English using the linguistic features set from Biber (1988) were investigated in this study. The approach combined learner corpora, language processing techniques, visual inspection of descriptive statistics, and multivariate statistical analysis to identify characteristics of learner language use. The largest spoken learner corpus in Japan, the National Institute of Information and Communications Technology Japanese Learner English (NICT JLE) Corpus was used for the analysis. It consists of over one million running words of L2 spoken English with oral proficiency level information. The level of the material in the corpus is approximately equal to a Test of English for International Communication (TOEIC) range of 356 to 921. It also includes data gathered from 20 native speakers who performed identical speaking tasks as the learners. The 58 linguistic features (e.g., grammatical features) were taken from the original list of 67 linguistic features in Biber (1988) to explore the variation of learner language. The following research questions were addressed. First, what linguistic features characterize different oral proficiency levels? Second, to what degree do the language features appearing in the spoken production of high proficiency learners match those of native speakers who perform the same task? Third, is the oral production of Japanese EFL learners rich enough to display the full range of features used by Biber? Grammatical features alone would not be enough to comprehensively distinguish oral proficiency levels, but the results of the study show that various types of grammatical features can be used to describe differences in the levels. 
First, frequency change patterns (i.e., rising, falling, or a combination of rising, falling, and plateauing) across the oral proficiency levels were shown through linguistic features from a wide range of categories: (a) part-of-speech (noun, pronoun it, first person pronoun, demonstrative pronoun, indefinite pronoun, possibility modal, adverb, causative adverb), (b) stance markers (emphatic, hedge, amplifier), (c) reduced forms (contraction, stranded preposition), (d) specialized verb class (private verb), complementation (infinitive), (e) coordination (phrasal coordination), (f) passive (agentless passive), and (g) possibly tense and aspect markers (past tense, perfect aspect). In addition, there is a noticeable gap between native and non-native speakers of English. There are six items that native speakers of English use more frequently than the most advanced learners (perfect aspect, place adverb, pronoun it, stranded preposition, synthetic negation, emphatic) and five items that native speakers use less frequently (past tense, first person pronoun, infinitive, possibility modal, analytic negation). Other linguistic features are used with similar frequency across the levels. What is clear is that the speaking tasks and the time allowed provided ample opportunity for most of Biber’s features to be used across the levels. The results of this study show that various linguistic features can be used to distinguish different oral proficiency levels, and to distinguish the oral language use of native and non-native speakers of English.
Temple University--Theses
26

White, Sara LuAnne. "Applying Corpus-Assisted Critical Discourse Analysis to an Unrestricted Corpus: A Case Study in Indonesian and Malay Newspapers." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/6478.

Abstract:
In 2008, Baker et al. proposed a nine-step method that combines quantitative corpus linguistics with qualitative critical discourse analysis. To date this cycle has only been used to analyze a single language with a restricted corpus. Can this method, originally designed for this narrow focus, be applied cross-culturally to an unrestricted corpus? There are two over-arching goals for this paper, one linguistic and one methodological. The first goal is to learn about language ideologies in Indonesian and Malay newspapers; the second goal is to evaluate the efficacy of a mixed-methods corpus-driven approach to discourse analysis using the methods proposed by Baker et al. Our research will be based on the cross-cultural analysis of two 4-million-word corpora of newspaper articles; one Indonesian and one Malay. Malaysia and Indonesia are home to two peoples, living side by side and sharing a common language background, but reacting to the Islamic fundamentalist movement in different ways. Applying Baker et al.'s cycle, we will use keyword analysis, collocation, concordance lines, and qualitative analysis in this study. Whereas Baker employed a corpus restricted to articles about refugees, asylum seekers, immigrants, and migrants, our corpus encompasses articles on any topic; whereas their study focused solely on English, ours will compare Indonesian and Malay. To build a "useful methodological synergy" between qualitative and quantitative analysis (Baker, et al., 2008), this corpus-driven study will consider how Islam and related terms are being represented by government, historical, and religious sources. The results of this study will help us discern how these two countries are reacting to the fundamentalist movement. This study will also help evaluate the applicability of Baker et al.'s proposed methods to other types of sociolinguistic research and bring to light any modifications that could be made.
27

Mansouri, Aous. "Stative and Stativizing Constructions in Arabic News Reports: A Corpus-Based Study." Thesis, University of Colorado at Boulder, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10108811.

Abstract:

This dissertation uses a corpus of tokens retrieved from broadcast news stories and print news articles to examine the array of constructions used to encode stative predications in Modern Standard Arabic. A state is defined as a situation that includes its reference time, whether that time is encoding time or another time of orientation. A range of stativity diagnostics are implemented. The constructions analyzed include both those that select for the class of states and those that yield various stative construals of otherwise dynamic predications. The constructions examined range from inflectional constructions to verb-headed phrasal patterns to verbless predicates; a lexicalist implementation of Construction Grammar, Sign Based Construction Grammar, provides a uniform format for representing the constructions as feature-structure descriptions. The constructions include: the p(refix)-stem verb, an inflectional construction exhibiting considerable semantic and syntactic flexibility; participles, including both the Active Participle, which typically yields a progressive reading and sometimes a perfect reading, and the Passive Participle, which yields a perfect reading; non-verbal predicates, which denote various stative relations, including existence, property attribution, possession and deontic modality; and phrasal constructions headed by the auxiliary kāna, which are used to convey past states, irrealis states and resultant states, while serving as a copula in syntactic contexts requiring a copula. A final case study underlines the formal and semantic heterogeneity of the class of Arabic stativizers by examining an emergent idiomatic pattern, the yatimmu construction, which has either a progressive function or a perfect function, depending primarily on subordination. The dissertation shows that in Arabic news narratives, users deploy distinct stative constructions in distinct contexts to convey whatever state is relevant in the context. 
It demonstrates that constructions convey both tense-based notions (like state ongoing at encoding time) and aspectual notions (state ongoing at the time of another event invoked by the text). In addition, it demonstrates that aspectual constructions are not ‘merely’ aspectual, but instead have constraints relating to argument structure, valency and subordination.

28

Kirsten, Johanita. "Laaste spore van Nederlands in Afrikaanse werkwoorde." Thesis, North-West University, 2013. http://hdl.handle.net/10394/10193.

Abstract:
Diachronic studies of Afrikaans used to focus on the origin and early development of Afrikaans from Dutch. During the twentieth century, the philological school, with a tradition of researching all Cape-Dutch coloured texts in detail, was established through the work of J. du P. Scholtz and his students. Through their analyses, they estimated the stabilisation of Afrikaans as early as the end of the eighteenth century (for example Raidt, 1991:145; Ponelis, 1994:229). In the past few decades, however, this estimation has begun to receive criticism from other scholars, including Roberge (1994:159) and Deumert (2004:20). With the help of a corpus, Deumert (2004) has shown that there is substantial variation in Afrikaans letters as late as the early twentieth century, and this study expands on her work by researching the variation in published writing. This is done by focusing on verbs, as there is significant change from the Dutch verbal system to the Afrikaans verbal system. This study uses corpus linguistic research methods, and researches Dutch-Afrikaans variation in verbs in published Afrikaans texts, compiled in three corpora. The main corpus was compiled from all the Afrikaans writings of Totius (J.D. du Toit) in the publication Het Kerkblad from 1916 to 1922. Two control corpora are also used: the first was compiled from excerpts from published Afrikaans books for the same period, and the second was compiled from excerpts from Afrikaans periodicals for the same period. In order to compensate for the shortcomings of corpus data alone, normative works on Afrikaans from the relevant period are also taken into account: the study shows which recommendations these works made about the relevant constructions, and how the corpus data correlate with these recommendations. Variation in six verbal constructions is analysed in this study: 1. End consonant t/n (for example gaat/gaan): the old (more Dutch) word forms are scarcely used in the corpora, while the modern Afrikaans word forms are almost fully established. 2. End consonant g (for example seg/sê): the old word forms are also scarcely used in the corpora, while the modern word forms take the lead. 3. Stem vowel (for example breng/bring): the old word forms are more frequent at the beginning of the period, followed by some uncertainty, with the modern word forms taking over by the end of the period. 4. Preterite (specifically had/gehad and werd/geword): there is great instability throughout, worsened by a distinction in use between main verbs and auxiliary verbs made by some authors. 5. Past participle (for example gedaan/gedoen): there is significant instability at the beginning of the period, but the modern word forms are used more frequently by the end of the period. 6. Perfect tense auxiliary verb (is/het): the old form is still used in the corpora, but the modern form is more frequent from the beginning, and becomes even more frequent towards the end. These data show that there was still significant variation in Afrikaans under Dutch influence as late as the early twentieth century, and the correlation between the different corpora implies that the written language might have been much closer to the spoken language than had been previously assumed. This is further confirmed by the amount of attention this variation gets in the normative works from that period.
Thesis (M.A. (Afrikaans and Dutch))--North-West University, Vaal Triangle Campus, 2013
29

Jones, Warwick Alfred. "A corpus-linguistic approach to foreign/second language learning: an experimental study of a new pedagogic model for integrating linguistic knowledge with corpus technology." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46053372.

30

Barron, Andrew T. "Exposing Deep-rooted Anger: A Metaphor Pattern Analysis of Mixed Anger Metaphors." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84170/.

Abstract:
This project seeks to serve two purposes: first, to investigate various semantic and grammatical aspects of mixed conceptual metaphors in reference to anger; and second, to explore the potential of a corpus-based, TARGET DOMAIN-oriented method termed metaphor pattern analysis for the study of mixed metaphor. This research shows that mixed metaphors do not pattern in a manner consistent with statements made within conceptual metaphor theory. These metaphors prove highly dynamic in their combinability and resist resonance between the SOURCE DOMAINS used. Also shown is the viability of metaphor pattern analysis as a methodology for mixed metaphor research.
31

Almujaiwel, Sultan Nasser. "Contrastive lexicology and comparable English-Arabic corpora-based analysis of vague and mistranslated Arabic equivalence : the case of the modern English-Arabic dictionary of al-Mawrid." Thesis, University of Exeter, 2012. http://hdl.handle.net/10871/13141.

Abstract:
The main concern in this research is to reveal the existence of shortcomings in the representation of meaning in the equivalents provided in a given context of the bilingual English-Arabic dictionary of al-Mawrid (Baʿalbaki 2005), and to disclose the contributions made in Contrastive Lexicology, Bilingual Lexicography, Translation Theory, Corpus Linguistics and Contrastive Linguistics, in an attempt to come up with a more suitable framework, based on bilingual lexicology and corpora-based approaches, for the analysis of equivalence in English-Arabic by means of computerized corpora, especially by what is known as comparable corpora. This research is divided into 6 Chapters. The introduction, Chapter 1, provides the statement of the research problem, the rationale, the objectives and the questions of the study. Chapter 2 discusses three issues: (i) the terms used to refer to the word; (ii) the semantic analysis and relations of the word; and (iii) the disciplines of bilingual lexicography, translation studies and contrastive linguistics, and their respective contributions to the central notion of equivalence in the bilingual dictionary. The discussion about the last issue will pave the way for using comparable corpora in the investigation of selected entries and their equivalents in the given context. It will also show how useful and effective such an approach is in criticising existing Arabic equivalents in al-Mawrid (2005). Chapter 3 is a review of the bilingual English-Arabic dictionary of al-Mawrid in terms of its purpose and the representation of meanings and entries. It also includes an overview of previous reviews. The aim is to provide and develop a new critical framework of al-Mawrid by a new multi-approach to equivalence in the English-Arabic dictionary, as given in Chapter 4: this is mainly based on comparable English-Arabic corpora, and the criteria for making two individual corpora comparable rather than parallel. 
Chapters 5 and 6 are dedicated to the analysis of equivalents which are found to be either vague (see Chapter 5) or a mistranslation (see Chapter 6) in a given context.
32

Botley, Simon Philip. "Corpora and discourse anaphora : using corpus evidence to test theoretical claims." Thesis, Lancaster University, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.322510.

33

Deignan, Alice. "A corpus-based study of some linguistic features of metaphor." Thesis, University of Birmingham, 1998. http://etheses.bham.ac.uk//id/eprint/831/.

Abstract:
Recent studies of metaphor have stressed both its importance to thinking and its pervasiveness in language. A number of researchers now claim that metaphorical transfer often connects semantic domains at the level of thought. This has implications for formal features of individual linguistic metaphors and for the lexical relations holding between them. The linguistic data used by metaphor researchers has largely been either intuitively derived or taken from small hand-sorted collections of texts. As yet, there have been few attempts to systematically examine metaphorical linguistic expressions in non-literary corpus data. In this thesis I use corpus data to examine a number of polysemous lexemes and I attempt to establish whether their metaphorical meanings, the lexical relations holding between these meanings, and aspects of their collocational and syntactic behaviour can be accounted for by a theory of metaphor as conceptual mapping. The investigation comprises a number of studies of non-innovative metaphorical expressions and their literal counterparts. I conclude that the contemporary theory of conceptual metaphorical mapping accounts for some features of linguistic metaphor but that it does not completely explain the data.
34

Luzorio, Camilla Canella Moraes. "Gramaticalização e Preposições Complexas do Português: um estudo baseado em corpus." Universidade do Estado do Rio de Janeiro, 2008. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=578.

Abstract:
This dissertation applies grammaticalization theory to an electronic diachronic corpus in order to account for changes in structures of Portuguese usually called complex prepositions. The study had three objectives: 1) to investigate the complex prepositions em face de, em face a, face a, em vista de, em frente de, em frente a and frente a, in order to understand how they function syntactically and semantically and to verify whether they are undergoing grammaticalization; 2) to examine texts from different historical periods so as to trace the possible trajectory of these forms between the 14th and 20th centuries; 3) to determine whether the items frente a and face a can be considered reductions of em frente a and em face a, respectively. Grammaticalization theory provided the theoretical framework for explaining the phenomena of change that affect linguistic items. The process of grammaticalization consists in a construction passing from a lexical to a grammatical status, or from a less grammatical to a more grammatical status. One of the triggering factors of this process is frequency of use, which makes the item more predictable and stable. Corpus Linguistics supplied the methodology for compiling, extracting and observing the data: as in Hoffman (2005), the investigation was based on electronic corpora. The base corpus was the Corpus do Português, made up of Portuguese texts written from the 14th to the 20th century and available online at http://www.corpusdoportugues.org/. The complex prepositions analysed were found to have moved up the scale of grammaticality, since their possibilities of use expanded through the development of polysemies with abstract meanings. It was also found that, in many senses, they coexist as layers, but that there may be a tendency leading to the choice of one form to express each of the senses identified.
35

Karageorgou, Ioanna. "Fitness Discourse on Instagram: A Corpus Linguistic Analysis." Thesis, Malmö universitet, Fakulteten för kultur och samhälle (KS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-21671.

Abstract:
Fitness relates to several life aspects, such as health and exercise. Because of its vast popularity, it is often referred to as a ‘fitness trend’ where the body has a central role. Due to technological advances, fitness has found its way into mobile applications and Social Network Sites (SNSs), prompting the linguistic analysis of these environments. This study investigates how female fitness is discussed by female personal trainers (PTs) online. A mixed approach of quantitative methodology (Corpus Linguistics) and qualitative textual analysis (Discourse Analysis) was adopted. Following Baker’s corpus-driven approach (2006), a specialised corpus was compiled with a total of 440 posts (51,779 tokens) from the Instagram accounts of three female professional PTs. Various patterns were presented under four themes: mind and body, physical strength, empowerment, and the FITNESS IS A JOURNEY metaphor. The most salient patterns discussed were health, aesthetics, weight-loss, and body-representation. There was strong evidence of other trends (‘fitspiration’, ‘HAES’, and ‘body positivity’) which promote a positive body image and strength (physical and mental) as a health indicator. In sum, the findings provide a female PT’s perspective on fitness and show how female fitness is promoted by encouraging positive narratives around fitness, the body and ourselves.
36

Yoon, Hyunsook. "An investigation of students' experiences with corpus technology in second language academic writing." Thesis, Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1109806353.

Abstract:
Thesis (Ph. D.)--Ohio State University, 2005.
Document formatted into pages; contains 307 p. Includes bibliographical references. Abstract available online via OhioLINK's ETD Center; full text release delayed at author's request until 2006 March 7.
37

Dornelas, Aline Bisotti. "Construções de movimento fictivo em Português do Brasil: cognição e corpus." Universidade Federal de Juiz de Fora (UFJF), 2014. https://repositorio.ufjf.br/jspui/handle/ufjf/4638.

Abstract:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The present study aims to describe and analyse Fictive Motion Constructions (FMC) of Brazilian Portuguese, such as “A estrada vai até a praça...” and “A veia percorre toda a extensão do braço...”. These constructions use a motion verb with a static theme. As a theoretical basis, we draw on Cognitive Linguistics (TALMY, 2000; LANGACKER, 1987, 1999, 2008; FAUCONNIER, 1997; FAUCONNIER; TURNER, 2002) and on Usage-based Models of Grammar (LANGACKER, 1987, 1999, 2008; GOLDBERG, 1995, 2006; GOLDBERG; JACKENDOFF, 2004). For methodology, we chose Corpus Linguistics instruments (SARDINHA, 2004; SILVA, 2008), which made it possible to build a specific corpus of FMC with 536 occurrences. The subsequent analysis revealed two most productive formal patterns: (1) [XNPS YVM (ZPP)] (...o cabelo(NPS) ia(VM) até o pé(PP)) and (2) [XNPS YVM ZNP] (A artéria vertebral(NPS) (...) percorre(VM) o restante da coluna(NP)). Pattern (1), with its variations, showed 34 types and 372 occurrences; pattern (2), with its variations, showed 16 types and 164 occurrences. It is postulated that the cognitive motivation of FMC comes from conceptual blending between a domain of motion experience and another domain visually related to extension, which promotes a visual scanning of the extension. This motivation means that, at the semantic-pragmatic pole, FMC evoke a domain matrix characterizing physical space, focusing on the conceptual domains of area, dimension, location, shape, position and direction. Pragmatically, they have a descriptive function, making possible the mental reconstruction of the static scene in question. As for discursive environment, FMC are most numerous in the fiction and academic genres and are related to conversational topics such as anatomy, tourism, geography, urbanism, construction, clothing and route explanations, which centre on the description of trajectories or of other objects conceptualized as trajectories. Our analysis thus places FMC as one more node in the network of constructions of Brazilian Portuguese and seeks to contribute to the description of a new network, the constructional network of motion. The analysis of FMC brings out the role of conceptual blending in the formation of new constructions. It also attests the relevance of the embodied-language approach proposed by Cognitive Linguistics and the view of language as an inventory of constructions shaped by discursive use.
38

Celebi, Hatice. "Extracting And Analyzing Impoliteness In Corpora: A Study Based On The British National Corpus And The Spoken Turkish Corpus." PhD thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615309/index.pdf.

Abstract:
This study focuses on extracting and analyzing impoliteness in British English and Turkish, using data retrieved from two corpora: the British National Corpus (BNC) and the Spoken Turkish Corpus (STC), which is under construction. It focuses on conversation as a genre of spoken interaction and discusses issues related to impoliteness within a corpus-driven linguistics (CDL) approach. It proposes two levels: extraction and analysis. Within the CDL framework, the theory or model of impoliteness behind the analysis will be determined by the findings gathered from the extraction of impoliteness. At the extraction level, dialogues that include a conflict or an offending event will be selected from the spoken texts in both the BNC and the databases of the STC. Various methods will be applied to select such dialogues. First, spoken texts will be scanned through an initial word query, a collocation query, a query for question sentences and tags, a query for imperatives, and possible queries that allow searching for prosodic nuances, as well as interruptions and overlaps, to the extent that the corpora and the focus of the study allow. Second, metapragmatic comments, conventionalized impoliteness formulae, cues for non-conventionalized implicational impoliteness, conversational patterns, and other cues such as semantic prosody coming into play in the co-text and context are taken into consideration. Once the selection is completed, the insights gathered from the extracted instances of impoliteness will be applied to analyze the data. Impoliteness in both languages will be examined with regard to how impoliteness is triggered, how the progression of impolite exchanges takes place, and how those instances of impoliteness are resolved. Other considerations such as context-determined impoliteness, the intentionality of the speaker, and the perception of the hearer will be discussed.
39

Teixeira, Rosana de Barros Silva e. "Termos de (onco)mastologia: uma abordagem mediada por corpus." Pontifícia Universidade Católica de São Paulo, 2011. https://tede2.pucsp.br/handle/handle/13496.

Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Situated within the research field of Applied Linguistics, an area that articulates multiple domains of knowledge, this research combines the theoretical and methodological assumptions of linguistically and communicatively based Terminology (the Communicative Theory of Terminology, CTT) with those of Corpus Linguistics in pursuit of two objectives. The first is to compile a monolingual glossary, bearing the same title as this research, for science journalists, since it falls to these professionals to make the hermetic language of science intelligible to the lay public. This initiative is based on the fact that breast cancer is the cancer that causes the most deaths among women in Brazil; each year about 22% of new cases are recorded, according to the Ministry of Health. In order to start from language in use, Corpus Linguistics was chosen as the way into this specialized language through empirical observation of data, that is, from an in vivo perspective, based on a corpus of 563,482 words as counted by WordSmith Tools 3.0. To that end, given the computer programs available for processing text corpora, I set as a second objective to check the accuracy of four of these tools (Corpógrafo 4.0, WordSmith Tools 3.0, e-Termos and ZExtractor) with respect to their rate of correctly identified terms, that is, to find out which of them was most efficient at extracting true-positive candidates. As the data indicate, Corpógrafo 4.0 leads this ranking with 27.56% accuracy, followed by ZExtractor (26.05%), WordSmith Tools 3.0 (21.77%) and e-Termos (14.44%). To make the examination of the candidates feasible, given that the lists generated by the programs together contained more than 10,000 words, a methodology was developed using Microsoft Office Excel 2007 to filter the candidates common to all the tools and those exclusive to each one. This cut in the data, besides supporting the results, led to the recognition of this methodology as a possibly viable resource for optimizing the extraction of terminological sets from lists processed by two or more programs, since, as the analysis of the results showed, all of them had limitations. In this way, 237 terms obtained from unigrams (single lexical units) were listed, of which 104 were chosen to head the entries that make up the glossary, owing to the conceptual relevance they showed.
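The filtering step described in this abstract, separating candidates common to all tools from those exclusive to each tool, can be sketched with set operations in Python. The thesis itself used Microsoft Office Excel 2007; the tool names, candidate lists, validated term list, and function names below are invented for illustration and do not reproduce the reported figures.

```python
def common_and_exclusive(candidates_by_tool):
    """Split candidates into those shared by every tool and those unique to one tool."""
    all_sets = list(candidates_by_tool.values())
    common = set.intersection(*all_sets)
    exclusive = {}
    for tool, cands in candidates_by_tool.items():
        # Union of every other tool's candidates.
        others = set().union(
            *(c for t, c in candidates_by_tool.items() if t != tool)
        )
        exclusive[tool] = cands - others
    return common, exclusive


def precision(candidates, validated_terms):
    """Share of extracted candidates that are validated terms (true positives)."""
    if not candidates:
        return 0.0
    return len(candidates & validated_terms) / len(candidates)


# Hypothetical outputs from three extractors (toy data, not from the thesis).
candidates_by_tool = {
    "tool_a": {"mastectomy", "tumor", "patient", "biopsy"},
    "tool_b": {"mastectomy", "tumor", "year", "carcinoma"},
    "tool_c": {"mastectomy", "tumor", "biopsy", "study"},
}
# Hypothetical list of terms confirmed by a human terminologist.
validated_terms = {"mastectomy", "tumor", "biopsy", "carcinoma"}

common, exclusive = common_and_exclusive(candidates_by_tool)
print("common:", sorted(common))
for tool, cands in candidates_by_tool.items():
    print(tool, "exclusive:", sorted(exclusive[tool]),
          "precision:", round(precision(cands, validated_terms), 2))
```

Sets are used because each candidate list is treated as a collection of distinct word types; if frequencies mattered, a counter-based comparison would be the more natural choice.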
40

Haertel, Robbie A. "MayanWiki : an online, consensus-based linguistic corpus of the Mayan hieroglyphs." Diss., 2007. http://contentdm.lib.byu.edu/ETD/image/etd2212.pdf.

41

Crapo, Robert Nishan. "Pun Strategies Across Joke Schemata: A Corpus-Based Study." BYU ScholarsArchive, 2018. https://scholarsarchive.byu.edu/etd/6739.

Abstract:
In the linguistic study of humor, research has largely been centered around the formulation of models and theories or the dissecting and categorization of jokes. Because of the often difficult-to-categorize aspects of verbal jokes, much time has been spent trying to create taxonomies for humor types and mechanisms. Linguists such as Raskin and Attardo have sought to categorize all verbal humor according to various functional elements (Attardo & Raskin, 1991). Such elements include, but are not limited to, the logical mechanism that drives the humor in the joke or the situation where the joke takes place. These categorizations are helpful in understanding the potential components of a given joke. However, relatively few studies have sought to quantify and qualify the distribution of these components across real-world data. This study seeks to understand the distribution of some of these categorizations laid out by Raskin and Attardo across joke topics, namely pun wordplay and narrative strategy. To do this, an original 100,000 word joke corpus was designed and compiled consisting of four joke topics: Marriage, Politics, Animals, and Food. Through some manual sorting and Python programming, jokes were labeled according to wordplay strategy and narrative structure. A subsequent statistical analysis was carried out to determine whether there exists a pattern of specific joke strategies when dealing with children's humor versus adult humor.
42

Hnin, Tun San San. "Discourse marking in Burmese and English : a corpus-based approach." Thesis, University of Nottingham, 2006. http://eprints.nottingham.ac.uk/11963/.

Abstract:
This study is a comparative analysis of discourse marking systems in Burmese and in English, using a corpus-based approach within the framework of discourse analysis. The focus of this study is a set of lexical items in a particular word class called 'particles' in Burmese, which lack one-to-one equivalents in English and are characterized by highly context-dependent semantic values. Unlike traditional comparative studies involving less commonly studied languages that tend to base their analyses on the model of well-established linguistic systems such as English, this study is Burmese-originated. It starts out with an identification of discourse functions typically associated with high frequency Burmese particles, and their equivalent realisations in English are subsequently identified. Findings indicate that Burmese particles share common cross-linguistic characteristics of discourse markers as described in the current literature. The data offers clear evidence that discourse functions of Burmese particles investigated are commonly found in spoken English, but they are not realised through the same discourse marking system. This study therefore calls for a more effective comparative methodology that can compare syntactically-oriented discourse marking systems more effectively with lexically-oriented ones, such as in the case of Burmese and English respectively. Last but not least, this study also challenges the notion of 'word' as a unit of analysis for a corpus-based approach, as the notion of word cannot be easily defined in a syllabic language such as Burmese.
43

Plappert, Gary Lee. "Phraseology and epistemology in scientific writing : a corpus-driven approach." Thesis, University of Birmingham, 2012. http://etheses.bham.ac.uk//id/eprint/3884/.

Abstract:
This thesis uses the tools and methods of corpus linguistics to study the process of knowledge encoding in a corpus of texts from the scientific discipline of genetics. It is argued here that the approach taken fits into the tradition of corpus-driven approaches to linguistic questions in that no assumption is made about the linguistic form that this knowledge encoding will take. Instead the study proceeds by identifying a set of keywords using the concept of lexical chains to identify items of terminology. The investigation of these uses the cluster function of WordSmith Tools (Scott 2004) and is qualitative, following Sinclair (1991; 2004) in attempting to develop a picture of the typical linguistic nature of the patterns surrounding these clusters inductively through a process of studying collocation and colligation patterns and identifying phraseology. It is argued here that such an approach is required to discover linguistic aspects of epistemic encoding that have as yet not been identified by those working in the related fields of discourse analysis or corpus linguistics.
44

He, Yuan William. "A corpus-assisted study on modal verbs in consecutive interpreting." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3953519.

45

Silveira, Gustavo Estef Lino da. "Análise de quadrigramas na escrita em inglês como língua estrangeira: um estudo baseado em corpus." Universidade do Estado do Rio de Janeiro, 2014. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=6869.

Abstract:
This study aims to profile the lexico-grammatical choices in the English writing of a group of Brazilian learners in the city of Rio de Janeiro between 2009 and 2012. To this end, it analyses the learners' production of 4-grams (blocks of four lexical items used frequently by several learners) in compositions written as part of their final course assessment. Specifically, the research sought to determine whether the 4-grams produced were among those previously taught for the writing task or whether they belonged to some other category, namely 4-grams already incorporated into the learners' language use or erroneous 4-grams used widely by the population investigated. For this purpose, compositions written by learners at the same proficiency level were collected from several branches of a single private English school in the city of Rio de Janeiro. These compositions were then typed and annotated to build a digital corpus readily identifiable in terms of text type and genre, learner profile, branch, and area of Rio de Janeiro. The study draws on the precepts and methods of Corpus Linguistics, an area of Linguistics that compiles large quantities of texts and extracts data from them with the help of a computer program in order to map the use, frequency, distribution, and range of particular linguistic or discursive phenomena. The results show that the learners investigated used few of the 4-grams they had been taught and, collectively, preferred others that had not been taught in the lessons specific to their level. The study also showed that when the textual genre is part of their personal world, learners seem to use more previously taught 4-grams. This may mean that genre can influence correct lexico-grammatical choices.
The study opens the way to understanding the importance of lexico-grammatical chunks in L2 writing as a means of ensuring fluency and accuracy in the language, and it suggests that learners need more opportunities for practice and awareness-raising regarding the use of such chunks.
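The two measures the abstract relies on, frequency (how often a 4-gram occurs) and range (how many different learner texts contain it), can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the tokenizer is a simple regular expression, 4-grams are allowed to cross sentence boundaries, and the sample essays are invented.

```python
from collections import Counter
import re


def four_grams(text):
    """Return the 4-grams of a text as tuples of lower-cased tokens."""
    # Simplistic tokenizer (assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def frequency_and_range(texts):
    """Count total occurrences and number of texts containing each 4-gram."""
    freq = Counter()        # total occurrences across all texts
    text_range = Counter()  # number of distinct texts containing the 4-gram
    for text in texts:
        grams = four_grams(text)
        freq.update(grams)
        text_range.update(set(grams))  # count each text at most once
    return freq, text_range


# Invented learner compositions, for illustration only.
essays = [
    "On the other hand, I think that it is important.",
    "I think that it is a good idea. On the other hand, it is expensive.",
    "On the other hand we can wait.",
]
freq, text_range = frequency_and_range(essays)

# 4-grams used by at least two different writers.
shared = sorted(g for g, n in text_range.items() if n >= 2)
```

Filtering on range rather than raw frequency keeps a 4-gram that one writer repeats many times from being mistaken for a pattern shared across the group.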
46

Lüdeling, Anke. "Heterogeneity and standardization in data, use, and annotation : a diachronic corpus of German." Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2006/864/.

Abstract:
This paper describes the standardization problems that come up in a diachronic corpus: it has to cope with differing standards with regard to diplomaticity, annotation, and header information. Such highly heterogeneous texts must be standardized to allow for comparative research without (too much) loss of information.
47

Li, Lu. "Copular and complex-transitive constructions in modern written English : a corpus-based study." Thesis, Lancaster University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334660.

48

Stewart, Miranda Mary. "Personal reference and politeness strategies in French and Spanish : a corpus-based approach." Thesis, Heriot-Watt University, 1992. http://hdl.handle.net/10399/1508.

Abstract:
The aim of this thesis is to examine personal pronominal reference in two languages, French and Spanish, from an interactional perspective. Brown and Levinson's (1978, 1987) 'Politeness theory' seeks to provide an explanation for much of the mismatch between what is 'said' and what is 'implicated' in spoken discourse. One area of speech where this mismatch is particularly evident is that of personal reference, where extralinguistic information is paramount in its use and interpretation. While previous approaches to this area have sought to assign one interpretation to a given pronominal use, this study seeks to show how speakers and hearers can exploit a multiplicity of potential values in the interest of face-protection. Based on a qualitative methodology derived from the field of linguistic pragmatics applied to a corpus of naturally-occurring data of speech situations where there is threat to the face of speakers and hearers, this study will argue that the contextual factors of power and status as well as a knowledge of linguistic politeness itself are of crucial importance in the use and interpretation of personal reference.
49

Terry, Devon K. "Linguistics of Russian Media During the 2016 US Election: A Corpus-Based Study." BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/9154.

Abstract:
The purpose of this study is to perform a linguistic analysis of Russian mass media focused on its coverage of the 2016 US presidential election. It will be a corpus-based study, using a corpus as a foundational source for quantitative and qualitative data. This study will use a collection of keywords from the corpus and analyze their contexts as they pertain to Hillary Clinton and Donald Trump. This study uses corpus linguistic research tools such as sentence tokenization, Key Words in Context (KWIC), sentiment analysis, word embedding visualization, word-vector math, word frequency lists, and collocate analysis as part of the quantitative analysis. The results of the sentiment analysis and word vector analysis show a moderate bias in the corpus favoring Donald Trump. Additionally, a more in-depth qualitative analysis of sentences containing keywords is performed. A framework using Appraisal Theory is used to examine sample sentences to show how the corpus appraises the candidates. The qualitative analysis shows how many sentences are full of judgment towards Hillary Clinton, positive appraisal of Donald Trump, and attempts to expand positive dialog about Donald Trump, as opposed to a contraction of dialog and expansion of negativity about Hillary Clinton. The predicted Russian geopolitical agenda seeks to demean American politics, positively influence perceptions of Russians towards Vladimir Putin, and support Donald Trump insofar as his policies align with Russia's goals.
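Of the tools named in this abstract, Key Words in Context (KWIC) is the simplest to illustrate. The sketch below is a generic concordancer, not the code used in the thesis; the function name, the window size, and the sample sentence are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit."""
    # \w+ matches Unicode word characters, so Cyrillic text also tokenizes.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Invented sample text, for illustration only.
sample = "The candidate spoke today. Critics said the candidate avoided questions."
hits = kwic(sample, "candidate", window=2)
for left, key, right in hits:
    print(f"{left:>12} [{key}] {right}")
```

Lining up each hit with its neighbouring words is what lets an analyst read the contexts of a keyword side by side before moving on to collocate or sentiment analysis.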
50

Tolle, Kristin M. "Domain-independent semantic concept extraction using corpus linguistics, statistics and artificial intelligence techniques." Diss., The University of Arizona, 2003. http://hdl.handle.net/10150/280502.

Abstract:
For this dissertation two software applications were developed and three experiments were conducted to evaluate the viability of a unique approach to medical information extraction. The first system, the AZ Noun Phraser, was designed as a concept extraction tool. The second application, ANNEE, is a neural net-based entity extraction (EE) system. These two systems were combined to perform concept extraction and semantic classification specifically for use in medical document retrieval systems. The goal of this research was to create a system that automatically (without human interaction) enabled semantic type assignment, such as gene name and disease, to concepts extracted from unstructured medical text documents. Improving conceptual analysis of search phrases has been shown to improve the precision of information retrieval systems. Enabling this capability in the field of medicine can aid medical researchers, doctors and librarians in locating information, potentially improving healthcare decision-making. Due to the flexibility and non-domain specificity of the implementation, these applications have also been successfully deployed in other text retrieval experimentation for law enforcement (Atabakhsh et al., 2001; Hauck, Atabakhsh, Ongvasith, Gupta, & Chen, 2002), medicine (Tolle & Chen, 2000), query expansion (Leroy, Tolle, & Chen, 2000), web document categorization (Chen, Fan, Chau, & Zeng, 2001), Internet spiders (Chau, Zeng, & Chen, 2001), collaborative agents (Chau, Zeng, Chen, Huang, & Hendriawan, 2002), competitive intelligence (Chen, Chau, & Zeng, 2002), and Internet chat-room data visualization (Zhu & Chen, 2001).