Dissertations / Theses on the topic 'Corpus-based data'
The following 17 dissertations and theses address the topic 'Corpus-based data.'
Nolli, Carla Fernanda. "Data-driven learning and corpus-based approaches in language education." Florianópolis, SC, 2006. http://repositorio.ufsc.br/xmlui/handle/123456789/88465.
This study analyzes examples of conditional sentences found in teaching materials (textbooks and grammar books) and compares them with a large corpus to verify their frequency and authenticity. The comparison was carried out with corpus analysis software, which generated a concordance list of the word 'if'. All of these tokens were analyzed and classified to distinguish the three types of conditional sentences studied in this thesis. The research also aims to shed light on an approach that remains largely unexplored in Brazil, namely Data-Driven Learning (DDL), which explores teaching and learning through corpus linguistics.
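The abstract does not name the corpus analysis software used, so what follows is only a minimal Python sketch of what a concordancer does when it builds a key-word-in-context (KWIC) list for 'if'. The function name `kwic`, the 30-character context width and the sample sentences are illustrative assumptions. None of them comes from the thesis.

```python
import re

def kwic(text, keyword, width=30):
    """Key-word-in-context lines for each whole-word, case-insensitive match."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].rstrip()
        right = text[match.end():match.end() + width].lstrip()
        # Right-align the left context so the keyword forms a readable column
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text; "iffy" is skipped because matches must be whole words
sample = ("If it rains, we stay home. I would go if I had time. "
          "The iffy forecast worried him.")
for line in kwic(sample, "if"):
    print(line)
```

Each output line can then be classified by hand, for example by conditional type, which is the step the study describes after generating the concordance.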
Adolphs, Svenja. "Linking lexico-grammar and speech acts : a corpus-based approach." Thesis, University of Nottingham, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.391412.
Marchewka, Katarzyna M. "Gender agreement in Polish : a study based on elicitation and corpus data." Thesis, University of Surrey, 2016. http://epubs.surrey.ac.uk/809946/.
Wang, Lixum. "The use of parallel texts in language learning : computer software and teaching materials for English and Chinese." Thesis, University of Birmingham, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368990.
Zhang, Min, and 張珉. "Using corpus data in a MOODLE-based self-learning course : teaching education students to 'cite like an academic'." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2015. http://hdl.handle.net/10722/211141.
Doctor of Philosophy (Doctoral), Education.
Tsiros, Augoustinos. "A multidimensional sketching interface for visual interaction with corpus-based concatenative sound synthesis." Thesis, Edinburgh Napier University, 2016. http://researchrepository.napier.ac.uk/Output/463438.
Vieira, Nataliya Godinho Soares. "Training and discovering corpus-based data driven exercices in english teaching (L2/FL) to native speakers of portuguese (L1)." Master's thesis, Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa, 2012. http://hdl.handle.net/10362/7422.
Given the rapid development of new technologies and their use in foreign language teaching, Corpus Linguistics offers new tools and materials that enrich second language learning. This project presents a framework of theoretical principles related to online corpora and proposes examples of training and discovering corpus-based data-driven exercises, which are an original contribution to the teaching and learning of English (L2) by native speakers of Portuguese (L1). The data-driven exercises, based on concordances extracted from corpora, support discovery teaching and engage students in "discovery learning," thereby enriching the personal development of both teachers and students. The project has multiple pedagogical aims, related both to the use of the data-driven learning (DDL) approach and to the application of ICT-based resources in foreign language teaching and learning.
Garcia, William Danilo. "Fanfictions, linguística de corpus e aprendizagem direcionada por dados : tarefas de produção escrita com foco no uso autêntico de língua e atividades que visam à autonomia dos alunos de letras em analisar preposições." São José do Rio Preto, 2020. http://hdl.handle.net/11449/192699.
Abstract: The relationship between Corpus Linguistics and Language Teaching received attention even before the advent of computers, and it intensified around the 1990s, when research on learner corpora and Data-Driven Learning came into focus. Building on this closer relationship, this study aimed to compile four learner corpora based on authentic language use. The goal was to develop data-driven teaching activities that promote an autonomous profile of linguistic investigation among students, specifically of the prepositions with, in, on, at, for and to. Concerning the existing literature, we highlight the works of Prabhu (1987), Skehan (1996), Willis (1996), Nunan (2004) and Ellis (2006) on Task-Based Language Teaching, and Jenkins (2012) and Neves (2014) on fanfiction. In relation to Corpus Linguistics, this study draws on Sinclair (1991), Berber Sardinha (2000) and Viana (2011). Granger (1998, 2012, 2013) is referenced to define learner corpora, and Johns (1991, 1994), Berber Sardinha (2011) and Boulton (2010) to discuss Data-Driven Learning. Methodologically, compositions were collected from Language Teaching undergraduate students who completed a writing task in which they had to write a fanfiction. These texts formed two initial learner corpora, which were analyzed with the AntConc tool (ANTHONY, 2018) to observe the occurrence of prepositions in English and whether they were accurately ...
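AntConc is a graphical concordancer, and the abstract does not say exactly which counts were taken. The Python sketch below therefore only illustrates the kind of frequency count one might run on a learner corpus for the six target prepositions, reporting raw counts and a rate per 1,000 tokens. The function name, the simple regex tokenizer and the sample sentences are hypothetical.

```python
import re
from collections import Counter

PREPOSITIONS = ("with", "in", "on", "at", "for", "to")

def preposition_counts(texts):
    """Count the target prepositions across a list of learner texts.

    Returns (raw counts, rate per 1,000 tokens) for each preposition.
    """
    tokens = []
    for text in texts:
        # Naive lowercase word tokenizer (an assumption; AntConc's own
        # token definition is configurable and may differ)
        tokens.extend(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    counts = Counter(t for t in tokens if t in PREPOSITIONS)
    total = len(tokens)
    raw = {p: counts[p] for p in PREPOSITIONS}
    rates = {p: 1000 * counts[p] / total if total else 0.0 for p in PREPOSITIONS}
    return raw, rates

# Invented fanfiction-style sentences, not data from the study
texts = ["She arrived at the castle in the rain.",
         "He fought for his friends with courage."]
raw, rates = preposition_counts(texts)
print(raw)
print({p: round(r, 1) for p, r in rates.items()})
```

Normalizing per 1,000 tokens makes corpora of different sizes comparable. Judging whether each occurrence is accurate still requires reading the concordance lines.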
Master's degree.
Gentilini, Livia. "La terminologia della sicurezza informatica nella banca dati FranceTerme: un'analisi corpus-based." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17696/.
Ghisi, Daniele. "Music across music : towards a corpus-based, interactive computer-aided composition." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066561/document.
Reworking existing music to build new music is a quintessential characteristic of the Western musical tradition. This thesis proposes and discusses my personal approach to the subject: borrowing music fragments from large-scale corpora (containing audio samples as well as symbolic scores) to build a low-level, descriptor-based palette of grains. Parameters are handled via digital hybrid scores, which give corpus-based composition the control offered by notational practices. This thesis also introduces the dada library, which enables Max to organize, select and generate musical content through a set of graphical interfaces embodying an exploratory approach to music composition. Its modules address a range of scenarios, including, but not limited to, database visualization, score segmentation and analysis, concatenative synthesis, music generation via physical or geometrical modelling, wave terrain synthesis, graph exploration, cellular automata, swarm intelligence, and videogames. The library is open-source and fosters a performative approach to computer-aided composition. Finally, this thesis addresses whether the classical representation of music, disentangled into the standard set of traditional parameters, is optimal. Two possible alternatives to orthogonal decompositions are presented: grain-based score representations, inheriting techniques from corpus-based composition, and unsupervised machine learning models, providing entangled, 'agnostic' representations of music. The thesis also details my first experience of collaborative writing within the /nu/thing collective.
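A descriptor-based palette of grains is typically used through unit selection: for each target point in descriptor space, the grain closest to it is chosen from the corpus. The Python sketch below illustrates that general idea only. It is not the dada library's API, which runs in Max. The `Grain` type, the two descriptors (loudness in dB, spectral centroid in Hz), the weights and the file names are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Grain:
    source: str         # hypothetical name of the corpus file the grain came from
    onset: float        # start time of the grain within that file, in seconds
    descriptors: tuple  # hypothetical (loudness_db, spectral_centroid_hz)

def select_grains(corpus, targets, weights):
    """For each target descriptor vector, pick the corpus grain with the
    smallest weighted Euclidean distance (nearest-neighbour unit selection)."""
    def distance(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    return [min(corpus, key=lambda g: distance(g.descriptors, t)) for t in targets]

corpus = [
    Grain("violin.wav", 0.00, (-20.0, 1500.0)),
    Grain("violin.wav", 0.05, (-12.0, 2200.0)),
    Grain("piano.wav", 0.10, (-30.0, 800.0)),
]
# A weight of 1e-4 on the squared Hz difference makes 100 Hz count like 1 dB
weights = (1.0, 1e-4)
sequence = select_grains(corpus, [(-28.0, 900.0), (-13.0, 2100.0)], weights)
print([(g.source, g.onset) for g in sequence])
```

The weights are the main design choice here. Without them, the Hz dimension would dominate the distance simply because its numbers are larger.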
Kalledat, Tobias. "Tracking domain knowledge based on segmented textual sources." Doctoral thesis, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, 2009. http://dx.doi.org/10.18452/15925.
This research aims to analyse how pre-processing influences the results of knowledge generation and to give concrete recommendations for suitable pre-processing of text corpora in TDM. It focuses on extracting and tracking concepts within specific knowledge domains by segmenting corpora horizontally (along the timeline) and vertically (by the persistence of terms). The result is a set of corpora segmented along the timeline. Within each timeline segment, clusters of concepts can be formed according to their persistence relative to each individual time-based corpus segment and to the whole corpus. Using a simple frequency measure, it can be shown that pre-processing quality can be measured from the statistical properties of a single corpus alone, without comparison corpora. In an optimally pre-processed corpus, the frequency-measure time series of two clusters are significantly negatively correlated: concepts that occur permanently and concepts that vary. The opposite was found in every other test set, each of which was pre-processed to a lower quality. The most frequent terms were grouped into concepts using domain-specific taxonomies. For corpora with a high level of pre-processing quality, a significant negative correlation was found between two time series: the number of distinct terms per yearly corpus segment, and the terms assigned to the taxonomy. A semantic analysis using a simple TDM method with significance-based frequency thresholds extracted significantly different knowledge from corpora with different pre-processing quality. The measures introduced in this research also make it possible to assess the quality of the applied taxonomy. From these results, rules for measuring corpus and taxonomy quality were derived, and recommendations for an appropriate level of pre-processing are given.
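The abstract reports significant negative correlations between frequency time series but does not name the correlation statistic. The Python sketch below assumes Pearson's r, computed over per-segment frequencies of a persistent and a varying concept cluster. The yearly numbers are invented toy values, not data from the thesis, and the significance test itself is omitted.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)

# Toy numbers (invented): per-year frequency of concepts in the cluster that
# persists across all segments versus the cluster that varies between segments
persistent = [120, 115, 118, 110, 104]
varying = [40, 52, 47, 61, 70]
print(f"r = {pearson(persistent, varying):.3f}")
```

Raw frequencies are used rather than shares of a fixed total. If the two series were proportions summing to one, r would be -1 by construction and would say nothing about pre-processing quality.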
Utgof, Darja. "The Perception of Lexical Similarities Between L2 English and L3 Swedish." Thesis, Linköping University, Department of Culture and Communication, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15874.
The present study investigates lexical similarity perceptions among students of Swedish as a foreign language (L3) who have good but non-native proficiency in English (L2). The general theoretical framework is provided by studies of transfer of learning and its specific instance, transfer in language acquisition.
It is generally accepted that all previous linguistic knowledge facilitates developing proficiency in a new language. However, a frequently reported phenomenon is that students perceive similarities between two language systems differently from linguists and education theorists. As a consequence, the full facilitative potential of transfer remains unused.
The present research seeks to shed light on similarity perceptions, focusing on the comprehension of written text. To elicit students' views, a form was designed containing similarity judgements and multiple-choice questions on formally similar items, drawing on real language use as recorded in corpora. In total, 123 forms were distributed to 6 groups of international students: 4 studying Swedish at Level I and 2 at Level II.
The test items in the form vary in their degree of formal, semantic and functional similarity. They range from very close cognates, to similar words belonging to different word classes, to items showing category membership or a subordinate/superordinate relation, to deceptive cognates. The author proposes expected similarity ratings and compares them with the results obtained. Formal similarity is measured objectively with a string-matching algorithm, Levenshtein distance.
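Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. Below is a minimal Python sketch of the standard dynamic-programming computation. The 0-to-1 `formal_similarity` rescaling (dividing by the longer word's length) is an assumption for illustration, since the abstract does not say how distances were normalized. The English/Swedish word pairs are illustrative and are not items from the study's form.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

def formal_similarity(a: str, b: str) -> float:
    """Hypothetical 0..1 score: 1 minus distance over the longer word's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

print(levenshtein("water", "vatten"))        # 3
print(formal_similarity("water", "vatten"))  # 0.5
print(formal_similarity("student", "student"))  # 1.0
```

A purely formal measure like this cannot capture the semantic and functional differences the study highlights. For example, a deceptive cognate can score 1.0 here.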
The similarity judgements indicate that intermediate similarity values can be considered problematic: ratings for somewhat similar items are usually lower than expected. Moreover, a difference in grammatical meaning lowers similarity values significantly even when lexical meaning nearly coincides. Thus, the results indicate that, in order to use similarities to facilitate language learning, more attention should be paid to underlying similarities.
Hong, Shinchul. "The pedagogical use of corpus data based on two case studies: the Dong-A for Korean Learners and the Chemnitz for German Learners." Thesis, Lancaster University, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.531696.
"A corpus-based induction learning approach to natural language processing." Chinese University of Hong Kong, 1996. http://library.cuhk.edu.hk/record=b5888859.
Thesis (Ph.D.)--Chinese University of Hong Kong, 1996.
Includes bibliographical references (leaves 163-171).
Chapter 1. Introduction
Chapter 2. Background Study of Natural Language Processing
2.1. Knowledge-based approach
2.1.1. Morphological analysis
2.1.2. Syntactic parsing
2.1.3. Semantic parsing
2.1.3.1. Semantic grammar
2.1.3.2. Case grammar
2.1.4. Problems of knowledge acquisition in knowledge-based approach
2.2. Corpus-based approach
2.2.1. Beginning of corpus-based approach
2.2.2. An example of corpus-based application: word tagging
2.2.3. Annotated corpus
2.2.4. State of the art in the corpus-based approach
2.3. Knowledge-based approach versus corpus-based approach
2.4. Co-operation between two different approaches
Chapter 3. Induction Learning applied to Corpus-based Approach
3.1. General model of traditional corpus-based approach
3.1.1. Division of a problem into a number of sub-problems
3.1.2. Solution selected from a set of predefined choices
3.1.3. Solution selection based on a particular kind of linguistic entity
3.1.4. Statistical correlations between solutions and linguistic entities
3.1.5. Prediction of the best solution based on statistical correlations
3.2. First problem in the corpus-based approach: Irrelevance in the corpus
3.3. Induction learning
3.3.1. General issues about induction learning
3.3.2. Reasons for using induction learning in the corpus-based approach
3.3.3. General model of corpus-based induction learning approach
3.3.3.1. Preparation of positive corpus and negative corpus
3.3.3.2. Statistical correlations between solutions and linguistic entities
3.3.3.3. Combination of the statistical correlations obtained from the positive and negative corpora
3.4. Second problem in the corpus-based approach: Modification of initial probabilistic approximations
3.5. Learning feedback modification
3.5.1. Determination of which correlation scores are to be modified
3.5.2. Determination of the magnitude of modification
3.5.3. A general algorithm of learning feedback modification
Chapter 4. Identification of Phrases and Templates in Domain-specific Chinese Texts
4.1. Analysis of the problem solved by the traditional corpus-based approach
4.2. Phrase identification based on positive and negative corpora
4.3. Phrase identification procedure
4.3.1. Step 1: Phrase seed identification
4.3.2. Step 2: Phrase construction from phrase seeds
4.4. Template identification procedure
4.5. Experiment and result
4.5.1. Testing data
4.5.2. Details of experiments
4.5.3. Experimental results
4.5.3.1. Phrases and templates identified in financial news articles
4.5.3.2. Phrases and templates identified in political news articles
4.6. Conclusion
Chapter 5. A Corpus-based Induction Learning Approach to Improving the Accuracy of Chinese Word Segmentation
5.1. Background of Chinese word segmentation
5.2. Typical methods of Chinese word segmentation
5.2.1. Syntactic and semantic approach
5.2.2. Statistical approach
5.2.3. Heuristic approach
5.3. Problems in word segmentation
5.3.1. Chinese word definition
5.3.2. Word dictionary
5.3.3. Word segmentation ambiguity
5.4. Corpus-based induction learning approach to improving word segmentation accuracy
5.4.1. Rationale of approach
5.4.2. Method of constructing modification rules
5.5. Experiment and results
5.6. Characteristics of modification rules constructed in experiment
5.7. Experiment constructing rules for compound words with suffixes
5.8. Relationship between modification frequency and Zipf's first law
5.9. Problems in the approach
5.10. Conclusion
Chapter 6. Corpus-based Induction Learning Approach to Automatic Indexing of Controlled Index Terms
6.1. Background of automatic indexing
6.1.1. Definition of index term and indexing
6.1.2. Manual indexing versus automatic indexing
6.1.3. Different approaches to automatic indexing
6.2. Corpus-based induction learning approach to automatic indexing
6.2.1. Fundamental concept about corpus-based automatic indexing
6.2.2. Procedure of automatic indexing
6.2.2.1. Learning process
6.2.2.2. Indexing process
6.3. Experiments of corpus-based induction learning approach to automatic indexing
6.3.1. An experiment evaluating the complete procedures
6.3.1.1. Testing data used in the experiment
6.3.1.2. Details of the experiment
6.3.1.3. Experimental result
6.3.2. An experiment comparing with the traditional approach
6.3.3. An experiment determining the optimal indexing score threshold
6.3.4. An experiment measuring the precision and recall of indexing performance
6.4. Learning feedback modification
6.4.1. Positive feedback
6.4.2. Negative feedback
6.4.3. Change of indexed proportions of positive/negative training corpus in feedback iterations
6.4.4. An experiment evaluating the learning feedback modification
6.4.5. An experiment testing the significance factor in merging process
6.5. Conclusion
Chapter 7. Conclusion
Appendix A: Some examples of identified phrases in financial news articles
Appendix B: Some examples of identified templates in financial news articles
Appendix C: Some examples of texts containing the templates in financial news articles
Appendix D: Some examples of identified phrases in political news articles
Appendix E: Some examples of identified templates in political news articles
Appendix F: Some examples of texts containing the templates in political news articles
Appendix G: Syntactic tags used in word segmentation modification rule experiment
Appendix H: An example of semantic approach to automatic indexing
Appendix I: An example of syntactic approach to automatic indexing
Appendix J: Samples of INSPEC and MEDLINE Records
Appendix K: Examples of Promoting and Demoting Words
References
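Chapter 6 of this outline names experiments that determine an optimal indexing score threshold and measure the precision and recall of indexing. The Python sketch below shows how those two ideas fit together in general. Terms whose score reaches a threshold are assigned, and the assignments are compared with manually assigned index terms. The thesis's own scoring method is not reproduced here. The function names, terms and scores are hypothetical.

```python
def index_document(scores, threshold):
    """Assign every candidate index term whose score reaches the threshold."""
    return {term for term, score in scores.items() if score >= threshold}

def precision_recall(assigned, gold):
    """Precision and recall of assigned terms against manually assigned terms."""
    true_positives = len(assigned & gold)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical scores a learned model might give to controlled index terms
scores = {"neural networks": 0.91, "speech recognition": 0.74,
          "databases": 0.35, "signal processing": 0.58}
gold = {"neural networks", "speech recognition", "pattern recognition"}

# Sweeping the threshold shows the precision/recall trade-off
for threshold in (0.3, 0.5, 0.8):
    p, r = precision_recall(index_document(scores, threshold), gold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold assigns fewer terms, which here increases precision at the cost of recall. An "optimal" threshold is one that balances the two for a given purpose.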
Mak, King Tong. "The dynamics of collocation: a corpus-based study of the phraseology and pragmatics of the introductory-it construction." Thesis, 2005. http://hdl.handle.net/2152/1776.
Vyatkina, Nina A. "Development of second language pragmatic competence: the data-driven teaching of German modal particles based on a learner corpus." 2007. http://www.etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-1928/index.html.
YANG, SHU-HUI, and 楊淑惠. "An Investigation into the Acquisition of Japanese Giving and Receiving Expressions-Based on the LARP at SCU Corpus Data-." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/96861692541568529274.
Full text東吳大學
日本語文學系
104
In Japanese, giving and receiving expressions are a complicated area. They encode modesty and caution in Japanese interpersonal relationships, and misusing them alone can cause misunderstanding. This study draws mainly on the LARP at SCU Corpus Data and focuses on Japanese auxiliary verbs of giving and receiving, examining how Taiwanese learners acquire them. The study consists of five chapters. Chapter 1 is the introduction. Chapter 2 reviews the previous research that inspired the author to approach the topic from different angles. Chapter 3 gives an overview of the history of the LARP at SCU Corpus Data, defines the related terminology and describes the methodology. A conclusion is given in Chapter 5. The main findings are as follows:
1. (TE) KURERU, (TE) AGERU and (TE) MORAU are the most frequently misused items.
2. Auxiliary verbs of giving and receiving are more difficult to use than the main verbs, so they deserve more attention.
3. The ratio of pre-test to post-test results is 1:3.5, which underlines the vital role the tutor plays.
4. Learners' ability to use giving and receiving expressions is strongly related to inadequate knowledge of Japanese language and culture.
In conclusion, students will acquire giving and receiving expressions more effectively if teachers give them more guidance and advice.