Dissertations / Theses on the topic 'Corpora analysis'

Consult the top 50 dissertations / theses for your research on the topic 'Corpora analysis.'


1

Panteli, Maria. "Computational analysis of world music corpora." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/36696.

Abstract:
The comparison of world music cultures has been considered in musicological research since the end of the 19th century. Traditional methods from the field of comparative musicology typically involve the process of manual music annotation. While this provides expert knowledge, the manual input is time-consuming and limits the potential for large-scale research. This thesis considers computational methods for the analysis and comparison of world music cultures. In particular, Music Information Retrieval (MIR) tools are developed for processing sound recordings, and data mining methods are considered to study similarity relationships in world music corpora. MIR tools have been widely used for the study of (mainly) Western music. The first part of this thesis focuses on assessing the suitability of audio descriptors for the study of similarity in world music corpora. An evaluation strategy is designed to capture challenges in the automatic processing of world music recordings and different state-of-the-art descriptors are assessed. Following this evaluation, three approaches to audio feature extraction are considered, each addressing a different research question. First, a study of singing style similarity is presented. Singing is one of the most common forms of musical expression and it has played an important role in the oral transmission of world music. Hand-designed pitch descriptors are used to model aspects of the singing voice and clustering methods reveal singing style similarities in world music. Second, a study on music dissimilarity is performed. While musical exchange is evident in the history of world music it might be possible that some music cultures have resisted external musical influence. Low-level audio features are combined with machine learning methods to find music examples that stand out in a world music corpus, and geographical patterns are examined. The last study models music similarity using descriptors learned automatically with deep neural networks. It focuses on identifying music examples that appear to be similar in their audio content but share no (obvious) geographical or cultural links in their metadata. Unexpected similarities modelled in this way uncover possible hidden links between world music cultures. This research investigates whether automatic computational analysis can uncover meaningful similarities between recordings of world music. Applications derive musicological insights from one of the largest world music corpora studied so far. Computational analysis as proposed in this thesis advances the state-of-the-art in the study of world music and expands the knowledge and understanding of musical exchange in the world.
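As a loose illustration of the kind of MIR pipeline the abstract describes, the sketch below summarises recordings with audio descriptors and clusters them. The file names, and the use of MFCCs rather than the thesis's hand-designed descriptors, are assumptions for the demo.

```python
# Hypothetical sketch of an MIR clustering pipeline in the spirit of this
# thesis: describe each recording with audio features, then cluster.
# MFCCs and the file names are illustrative stand-ins, not the descriptors
# actually used in the thesis.
import librosa
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def describe(path):
    """Summarize a recording as the mean and std of its MFCC frames."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

paths = ["recording_01.wav", "recording_02.wav", "recording_03.wav"]  # placeholder corpus
X = StandardScaler().fit_transform(np.array([describe(p) for p in paths]))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for p, l in zip(paths, labels):
    print(p, "-> cluster", l)
```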
2

Sudhahar, Saatviga. "Automated analysis of narrative text using network analysis in large corpora." Thesis, University of Bristol, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.685924.

Abstract:
In recent years there has been an increased interest in computational social sciences, digital humanities and political sciences in performing automated quantitative narrative analysis (QNA) of text at large scale, by studying actors, actions and relations in a given narration. Social scientists have always relied on news media content to study opinion biases and the extraction of socio-historical relations and events. Yet in order to perform analysis they had to face labour-intensive coding, where basic narrative information was manually extracted from text and annotated by hand. This PhD thesis addresses this problem using a big-data approach based on automated information extraction using state-of-the-art Natural Language Processing, text mining and Artificial Intelligence tools. A text corpus is transformed into a semantic network formed of subject-verb-object (SVO) triplets, and the resulting network is analysed drawing from various theories and techniques such as graph partitioning, network centrality, assortativity, hierarchy and structural balance. Furthermore, we study the position of actors in the network of actors and actions; generate scatter plots describing the subject/object bias and positive/negative bias of each actor; and investigate the types of actions each actor is most associated with. Apart from QNA, SVO triplets extracted from text can also be used to summarize documents. Our findings are demonstrated on two different corpora containing English news articles about US elections and Crime, and a third corpus containing ancient folklore stories from the Gutenberg Project. Amongst potentially interesting findings, we found that the 2012 US elections campaign was very much focused on 'Economy' and 'Rights'; and overall, the media reported positive statements more frequently for the Democrats than for the Republicans. In the Crime study we found that the network identified men as frequent perpetrators, and women and children as victims, of violent crime. A network approach to text based on semantic graphs is a promising approach to analyse large corpora of texts and, by retaining relational information pertaining to actors and objects, it can reveal latent and hidden patterns, and therefore has relevance in the social sciences and humanities.
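The core QNA step, turning text into a network of subject-verb-object triplets, can be sketched with off-the-shelf tools. This is a minimal illustration using spaCy and networkx, not the author's pipeline; the example sentences are invented.

```python
# Minimal sketch of SVO-triplet extraction and network analysis,
# in the spirit of the QNA approach described above.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def svo_triplets(text):
    """Collect (subject, verb, object) lemma triplets from parsed sentences."""
    triplets = []
    for sent in nlp(text).sents:
        for tok in sent:
            if tok.pos_ == "VERB":
                subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in tok.children if c.dep_ in ("dobj", "obj")]
                for s in subjects:
                    for o in objects:
                        triplets.append((s.lemma_, tok.lemma_, o.lemma_))
    return triplets

G = nx.DiGraph()
for s, v, o in svo_triplets("The senator criticised the bill. The committee approved the bill."):
    G.add_edge(s, o, verb=v)

# Actors that many statements point at (frequent subjects/objects) stand out
# by centrality in the resulting semantic network.
print(nx.degree_centrality(G))
```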
3

Kura, Deekshit. "Categorization of Large Corpora of Malicious Software." ScholarWorks@UNO, 2013. http://scholarworks.uno.edu/td/1746.

Abstract:
Malware is computer software written by someone with mischievous or, more usually, malicious and/or criminal intent, specifically designed to damage data, hosts or networks. The variety of malware is growing in proportion to the number of computers in use, and newly emerging malware often goes unrecognized. Tools are needed to categorize families of malware, so that analysts can compare new malware samples to ones that have been previously analyzed and determine steps to detect and prevent malware infections. In this thesis, I developed a technique to catalog and characterize the behavior of malware, so that malware families, the level of potential threat, and the effects of malware can be identified. Combinations of complementary techniques, including third-party tools, are integrated to scan and illustrate how malware may harm a target machine, search for related malware behavior, and organize malware into families, based on a number of characteristics.
4

Lucas, Christopher G. "Patent semantics : analysis, search and visualization of large text corpora." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/33146.

Abstract:
Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (leaves 47-48). Patent Semantics is a system for processing text documents by extracting features capturing their semantic content, and for searching, clustering, and relating them by those same features. It is set apart from existing methodologies by a visualization scheme that integrates retrieval and clustering, providing a variety of ways to find and relate documents depending on the user's goals. In addition, the system provides an explanatory mechanism that makes retrieval an understandable process rather than a black box. The domain in which the system currently works is biochemistry and molecular biology patents, but it is not intrinsically constrained to any document set. By Christopher G. Lucas. M.Eng. and S.B.
5

Ghanem, Amer G. "Identifying Patterns of Epistemic Organization through Network-Based Analysis of Text Corpora." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706.

6

Gashteovski, Kiril [Verfasser], and Rainer [Akademischer Betreuer] Gemulla. "Compact open information extraction: methods, corpora, analysis / Kiril Gashteovski ; Betreuer: Rainer Gemulla." Mannheim : Universitätsbibliothek Mannheim, 2021. http://d-nb.info/123650285X/34.

7

Kwan, Yu Hang. "Assessing pre-service teaching practicum: a corpus-assisted discourse analysis of field experience supervision forms." HKBU Institutional Repository, 2014. https://repository.hkbu.edu.hk/etd_oa/117.

Abstract:
This study analyses the moves, linguistic realisations and mitigation devices of four teaching practicum supervisors' comments written to eighteen supervisees on fifty-four standard field experience supervision forms. Broadly speaking, the results reveal that the supervisors use evaluative adjectives, modality markers and imperatives to give praise and acknowledge good practice, identify weaknesses, and suggest improvements in relation to teaching and managing learning. As the supervision exercise can be face-threatening, the supervisors demonstrate sensitivity by redressing their negative comments through such mitigation strategies as hedging, praise-criticism pairs, rhetorical questions and personal attributions, although the strength of such devices may vary according to contextual issues. These findings enable readers to understand how the pragma-linguistic resources realise two global communicative purposes, i.e., "Assessment of Learning" and "Assessment for Learning".
8

Cid, Uribe Miriam Elizabeth. "Contrastive analysis of English and Spanish intonation using computer corpora - a preliminary study." Thesis, University of Leeds, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.236136.

9

Sawalha, Majdi Shaker Salem. "Open-source resources and standards for Arabic word structure analysis : fine grained morphological analysis of Arabic text corpora." Thesis, University of Leeds, 2011. http://etheses.whiterose.ac.uk/2165/.

Abstract:
Morphological analyzers are preprocessors for text analysis. Many text analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis - particularly morphological analysis - to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. We want to morphologically tag our Arabic Corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis - particularly for probabilistic taggers which require training data - if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA Tagger is a fine-grained morphological analyzer which mainly depends on linguistic information extracted from traditional Arabic grammar books and on a prior-knowledge, broad-coverage lexical resource, the SALMA ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA Tag Set is a standard for encoding which captures long-established traditional fine-grained morphological features of Arabic, in a notation format intended to be compact yet transparent. The SALMA Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur'an with syllable and primary stress information, as well as for fine-grained morphological tagging.
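To make the "subtag for each part" idea concrete, here is a toy segmenter, not the SALMA Tagger itself, that assigns one subtag per morpheme slot. The transliterated affix lists and example words are invented purely for illustration; a real analyzer draws on broad-coverage lexica.

```python
# Toy illustration of per-morpheme subtagging (proclitic, prefix, stem,
# suffix). The tiny transliterated affix inventories below are invented
# for the example and are not SALMA's actual rules or lexicon.
PROCLITICS = {"wa": "conjunction", "bi": "preposition"}
PREFIXES = {"al": "definite article"}
SUFFIXES = {"at": "feminine plural"}

def segment(word):
    """Split a transliterated word into morphemes, one subtag each."""
    tags = []
    for clitic, tag in PROCLITICS.items():
        if word.startswith(clitic):
            tags.append((clitic, "proclitic:" + tag))
            word = word[len(clitic):]
            break
    for prefix, tag in PREFIXES.items():
        if word.startswith(prefix):
            tags.append((prefix, "prefix:" + tag))
            word = word[len(prefix):]
            break
    for suffix, tag in SUFFIXES.items():
        if word.endswith(suffix):
            tags.append((word[: -len(suffix)], "stem"))
            tags.append((suffix, "suffix:" + tag))
            return tags
    tags.append((word, "stem"))
    return tags

print(segment("albayt"))        # [('al', 'prefix:definite article'), ('bayt', 'stem')]
print(segment("wamudarrisat"))  # proclitic + stem + suffix, one subtag per part
```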
10

Van, Olmen Daniel. "The imperative in English and Dutch : a functional analysis in comparable and parallel corpora." Thesis, Lancaster University, 2011. http://eprints.lancs.ac.uk/66233/.

11

Wang, An Ni Annie. "What's the buzz? : a discursive approach to news values of Buzzfeed News." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3953570.

12

Boselli, Camilla. "Corpora e analisi del linguaggio politico: l'esempio di Alternative für Deutschland." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17844/.

Abstract:
This thesis offers an in-depth analysis of the language of the German party Alternative für Deutschland, whose emergence in 2013 permanently changed the political landscape in Germany. Given the growing importance the party has assumed over recent years, its language calls for analysis. In particular, this research examines how members of Alternative für Deutschland use language in parliament, through a corpus-based study of selected speeches. The research aims, on the one hand, to provide an overview of the linguistic phenomena present in the corpus under analysis on the basis of statistical data; on the other, to trace recurring discursive strategies and argumentative patterns. The integrated quantitative-qualitative approach brings out the underlying meaning of the texts and thus ensures a detailed analysis of the discourse in question. The thesis is divided into three sections: the first chapter describes the political and cultural context in which the research is situated; the second chapter outlines the theoretical framework on which the study builds; finally, the third chapter presents the analysis itself. The study offers a description of the characteristics of the portion of political discourse under analysis and thus provides a starting point for further work on the language of Alternative für Deutschland.
13

Almeida, Maria Izabel de Andrade. "Prosa argumentativa em língua inglesa: um estudo contrastivo sobre advérbios em corpora digitais." Universidade do Estado do Rio de Janeiro, 2010. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=2908.

Abstract:
This research investigates how Brazilian learners of English use adverbs ending in -ly in written English and compares their use to that of speakers of English as a mother tongue. To this end, the work resorts to Corpus Linguistics as both theoretical and methodological support. The research is based on the area called Learner Corpora Research, which deals with the collection, storage and analysis of linguistic data produced by learners of a foreign language, which can then be used for descriptive and teaching purposes. This area aims to identify ways in which learners' use of the foreign language is different from or similar to that of native speakers. The data used in this research are the corpus of study (Br-ICLE), containing written English produced by Brazilian learners, built according to the ICLE project (International Corpus of Learner English), as well as two reference corpora (LOCNESS and BAWE) containing written English produced by speakers of English as a mother tongue. The results indicate that Brazilian learners overuse the categories of adverbs that indicate truth, reality and intensity in comparison to the use made by native speakers; furthermore, they use these adverbs in different ways. The results also suggest that, given the differences in frequency (either by overuse or underuse of adverbs), the learners tend to misuse combinations in terms of collocates or in terms of semantic prosody. And finally, the research reveals that the preference of learners for adverbs expressing truth, reality and intensity creates the impression of very assertive voices. We conclude that these differences may be related to factors such as the size of the corpora, the influence of the learners' mother tongue, the internalization of linguistic elements needed to produce a text in a foreign language, or even the lack of fluency of the learners and the classroom context in the universities.
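The frequency comparisons behind such learner-corpus findings can be sketched in a few lines: relative frequencies of -ly adverbs in a learner sample versus a reference sample, with Dunning's log-likelihood as a significance measure. The two mini "corpora" below are placeholders, and the crude \w+ly pattern over-matches words such as "family".

```python
# Sketch of a learner-vs-reference frequency comparison with Dunning's
# log-likelihood. The two one-line "corpora" are placeholders for
# Br-ICLE and LOCNESS/BAWE.
import math
import re

def ly_count(text):
    return len(re.findall(r"\b\w+ly\b", text.lower()))

def log_likelihood(a, b, na, nb):
    """a, b: observed counts; na, nb: corpus sizes in tokens."""
    e1 = na * (a + b) / (na + nb)  # expected count in corpus 1
    e2 = nb * (a + b) / (na + nb)  # expected count in corpus 2
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

learner = "really I think this is really absolutely truly important"
native = "the results suggest a partially consistent pattern overall"
a, b = ly_count(learner), ly_count(native)
na, nb = len(learner.split()), len(native.split())
print("learner per 1000 tokens:", 1000 * a / na)
print("native  per 1000 tokens:", 1000 * b / nb)
print("log-likelihood:", round(log_likelihood(a, b, na, nb), 2))
```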
14

Gulati, Sankalp. "Computational approaches for melodic description in indian art music corpora." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/398984.

Abstract:
Automatically describing contents of recorded music is crucial for interacting with large volumes of audio recordings, and for developing novel tools to facilitate music pedagogy. Melody is a fundamental facet in most music traditions and, therefore, is an indispensable component in such description. In this thesis, we develop computational approaches for analyzing high-level melodic aspects of music performances in Indian art music (IAM), with which we can describe and interlink large amounts of audio recordings. With its complex melodic framework and well-grounded theory, the description of IAM melody beyond pitch contours offers a very interesting and challenging research topic. We analyze melodies within their tonal context, identify melodic patterns, compare them both within and across music pieces, and finally, characterize the specific melodic context of IAM, the rāgas. All these analyses are done using data-driven methodologies on sizable curated music corpora. Our work paves the way for addressing several interesting research problems in the field of music information research, as well as developing novel applications in the context of music discovery and music pedagogy. The thesis starts by compiling and structuring the largest music corpora to date of the two IAM traditions, Hindustani and Carnatic music, comprising quality audio recordings and the associated metadata. From them we extract the predominant pitch and normalize it by the tonic context. An important element in describing melodies is the identification of meaningful temporal units, for which we propose to detect occurrences of nyās svaras in Hindustani music, a landmark that demarcates musically salient melodic patterns. Utilizing these melodic features, we extract musically relevant recurring melodic patterns. These patterns are the building blocks of melodic structures in both improvisation and composition. Thus, they are fundamental to the description of audio collections in IAM. We propose an unsupervised approach that employs time-series analysis tools to discover melodic patterns in sizable music collections. We first carry out an in-depth supervised analysis of melodic similarity, which is a critical component in pattern discovery. We then improve upon the best competing approach by exploiting peculiar melodic characteristics in IAM. To identify musically meaningful patterns, we exploit the relationships between the discovered patterns by performing a network analysis. Extensive listening tests by professional musicians reveal that the discovered melodic patterns are musically interesting and significant. Finally, we utilize our results for recognizing rāgas in recorded performances of IAM. We propose two novel approaches that jointly capture the tonal and the temporal aspects of melody. Our first approach uses melodic patterns, the most prominent cues for rāga identification by humans. We utilize the discovered melodic patterns and employ topic modeling techniques, wherein we regard a rāga rendition as similar to a textual description of a topic. In our second approach, we propose the time delayed melodic surface, a novel feature based on delay coordinates that captures the melodic outline of a rāga. With these approaches we demonstrate unprecedented accuracies in rāga recognition on the largest datasets ever used for this task. Although our approach is guided by the characteristics of melodies in IAM and the task at hand, we believe our methodology can be easily extended to other melody-dominant music traditions. Overall, we have built novel computational methods for analyzing several melodic aspects of recorded performances in IAM, with which we describe and interlink large amounts of music recordings. In this process we have developed several tools and compiled data that can be used for a number of computational studies in IAM, specifically in the characterization of rāgas, compositions and artists. The technologies resulting from this research work are part of several applications developed within the CompMusic project for better description, enhanced listening experience, and pedagogy in IAM.
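The "time delayed melodic surface" lends itself to a compact sketch: embed a pitch contour in delay coordinates and histogram the resulting pairs. The synthetic contour, delay and bin counts below are assumptions for the demo, not the settings used in the thesis.

```python
# Illustrative delay-coordinate representation of a pitch contour, in the
# spirit of the time delayed melodic surface described above. The sine-based
# contour and the delay/bin settings are invented for the demo.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)
pitch_cents = 1200 * np.sin(2 * np.pi * 0.4 * t) + rng.normal(0, 20, t.size)

delay = 50  # samples between the two delay coordinates
x = pitch_cents[:-delay]
y = pitch_cents[delay:]

# 2-D histogram over (f(t), f(t + delay)) pairs: a compact melodic "surface"
surface, xedges, yedges = np.histogram2d(x, y, bins=40, density=True)
print(surface.shape)  # (40, 40) feature map, usable as input to a classifier
```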
15

Aljuhani, Hind S. "USING CORPORA IN A LEXICALIZED STYLISTICS APPROACH TO TEACHING ENGLISH-AS-A-FOREIGN-LANGUAGE LITERATURE." CSUSB ScholarWorks, 2016. https://scholarworks.lib.csusb.edu/etd/272.

Abstract:
As a lingua franca across the globe, English plays a vital role in international communication. Due to rapid economic, political, and educational globalization, the English language has become a powerful means of communication. Therefore, English education is vital to the development of many countries around the world. Since 1932, the need for a lingua franca in Saudi Arabia has grown as the country progressed politically, economically, and educationally. Now, English is important to Saudis' economic, educational, and career development and success. Vocabulary is a major step in learning any language. By deepening their lexical knowledge, students will be able to use English accurately to express themselves. However, teaching words in isolation and through memorization is not highly effective; English-as-a-foreign-language (EFL) learners need to interact with the language and its usage in a more profound way. This can be done by integrating corpora and stylistic analysis into an EFL curriculum. The importance of stylistic analysis of literary texts in the EFL classroom lies in the way that EFL learners will be exposed to authentic language. At the same time they will gain insight into how English is structured; and by accessing corpora, which provide a wide range of data for stylistic analysis, students will be able to compare the lexical and grammatical patterns in authentic texts. Also, it is important to introduce students to the different levels of English (i.e., semantics, lexis, morphology); this will enlarge EFL learners' knowledge of English vocabulary and various grammatical patterns. This project offers an innovative perspective on how to teach English to EFL university-level students by using corpora in a lexicalized stylistics approach, which will enable EFL learners to acquire vocabulary by reading literary texts. This provides a rich environment of lexical items and a variety of grammatical patterns. This approach offers EFL learners analytical tools that will improve their linguistic skills as they interact with and analyze authentic examples of English and gain insight into its historical, social and cultural background.
16

Uslu, Tolga [Verfasser], Alexander [Akademischer Betreuer] Mehler, Alexander [Gutachter] Mehler, and Visvanathan [Gutachter] Ramesh. "Multi-document analysis : semantic analysis of large text corpora beyond topic modeling / Tolga Uslu ; Gutachter: Alexander Mehler, Visvanathan Ramesh ; Betreuer: Alexander Mehler." Frankfurt am Main : Universitätsbibliothek Johann Christian Senckenberg, 2020. http://d-nb.info/1221669125/34.

17

Hatzidaki, Ourania. "Part and parcel : a linguistic analysis of binomials and its application to the internal characterization of corpora." Thesis, University of Birmingham, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.323634.

18

Al-Sharief, Sultan M. "Interaction in writing : an analysis of the writer-reader relationship in four corpora of medical written texts." Thesis, University of Liverpool, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368632.

19

Baker, Dylan. "The Document Similarity Network: A Novel Technique for Visualizing Relationships in Text Corpora." Scholarship @ Claremont, 2017. https://scholarship.claremont.edu/hmc_theses/100.

Abstract:
With the abundance of written information available online, it is useful to be able to automatically synthesize and extract meaningful information from text corpora. We present a unique method for visualizing relationships between documents in a text corpus. By using Latent Dirichlet Allocation to extract topics from the corpus, we create a graph whose nodes represent individual documents and whose edge weights indicate the distance between topic distributions in documents. These edge lengths are then scaled using multidimensional scaling techniques, such that more similar documents are clustered together. Applying this method to several datasets, we demonstrate that these graphs are useful in visually representing high-dimensional document clustering in topic-space.
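The pipeline described above maps naturally onto standard tooling. A minimal sketch follows, assuming a toy corpus and scikit-learn/scipy defaults in place of the thesis's exact parameters.

```python
# Minimal sketch of the document similarity network pipeline: LDA topic
# distributions, pairwise distances between documents, then multidimensional
# scaling for a 2-D layout. The corpus and parameters are placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import MDS
from scipy.spatial.distance import jensenshannon

docs = [
    "stock markets fell as investors worried about inflation",
    "the central bank raised interest rates to curb inflation",
    "the team won the championship after a dramatic final",
    "fans celebrated the title with the players downtown",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

# Distance between documents = distance between their topic distributions
n = len(docs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist[i, j] = jensenshannon(theta[i], theta[j])

coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dist)
print(coords)  # similar documents land near each other in the layout
```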
20

Lavender, Andrew Jordan. "Code Switching, Lexical Borrowing, and Polylanguaging in Valencian Spanish: An Analysis of Data From Conversational Corpora and Twitter." Thesis, State University of New York at Albany, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10281503.

Abstract:
This study examines lexical borrowing, code switching, and polylanguaging in Valencian Spanish to better understand how each is used differently in oral conversation in comparison with online communication on Twitter. This study compares data collected from three published corpora of oral interviews of speakers of Valencian Spanish with data collected from Twitter profiles of individuals residing in Valencia. In each of the sources, Spanish is the preferred code into which Valencian material is inserted. A unique feature of data from the published corpora is the high frequency of code switching (CS) into Valencian in instances of reported speech. With regard to frequency, Twitter users switch from Spanish into Valencian, followed by from Valencian into Spanish and then from Spanish into English. On Twitter, the most frequent type of switch found is the tag switch, which includes exhortatives, greetings and farewells, happy birthday wishes, and a variety of other types of tags and other idiomatic expressions used in a highly emblematic fashion as a way of performing identity. Both intrasentential and intersentential switches also appear online and reflect how discourse might be organized differently online than offline. In looking at lone vs. multiword insertions, the importance of turn taking is noted, and instances where speakers are not in a naturalistic conversation evidence traits which influence patterns of CS and polylanguaging. Additionally, lexical economy is suggested as a motivating factor for CS on Twitter given the platform's technological limitation of 140 characters per tweet.
21

Frey, Jennifer Carmen <1988>. "Using data mining to repurpose German language corpora. An evaluation of data-driven analysis methods for corpus linguistics." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amsdottorato.unibo.it/9300/1/frey_jennifercarmen_tesi.pdf.

Abstract:
A growing number of studies report interesting insights gained from existing data resources. Among those, there are analyses on textual data, giving reason to consider such methods for linguistics as well. However, the field of corpus linguistics usually works with purposefully collected, representative language samples that aim to answer only a limited set of research questions. This thesis aims to shed some light on the potentials of data-driven analysis based on machine learning and predictive modelling for corpus linguistic studies, investigating the possibility to repurpose existing German language corpora for linguistic inquiry by using methodologies developed for data science and computational linguistics. The study focuses on predictive modelling and machine-learning-based data mining and gives a detailed overview and evaluation of currently popular strategies and methods for analysing corpora with computational methods. After the thesis introduces strategies and methods that have already been used on language data, discusses how they can assist corpus linguistic analysis and refers to available toolkits and software as well as to state-of-the-art research and further references, the introduced methodological toolset is applied in two differently shaped corpus studies that utilize readily available corpora for German. The first study explores linguistic correlates of holistic text quality ratings on student essays, while the second deals with age-related language features in computer-mediated communication and interprets age prediction models to answer a set of research questions that are based on previous research in the field. While both studies give linguistic insights that integrate into the current understanding of the investigated phenomena in the German language, they systematically test the methodological toolset introduced beforehand, allowing a detailed discussion of added values and remaining challenges of machine-learning-based data mining methods in corpus linguistics at the end of the thesis.
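The "interpret a prediction model to answer linguistic questions" strategy can be sketched as below: fit a text classifier and read off the features that drive the prediction. The four-sentence dataset and the age-group labels are placeholders, not the corpora used in the thesis.

```python
# Sketch of interpreting a predictive model for linguistic inquiry:
# train an age-group classifier on text and inspect its strongest features.
# Data and labels below are invented placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["lol thats so cool omg", "kindly find the report attached herewith",
         "omg haha see you later", "I would appreciate your prompt reply"]
ages = ["younger", "older", "younger", "older"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, ages)

# With classes_ == ['older', 'younger'], positive coefficients lean "younger"
order = np.argsort(clf.coef_[0])
features = vec.get_feature_names_out()
print("older-leaning:  ", [features[i] for i in order[:3]])
print("younger-leaning:", [features[i] for i in order[-3:]])
```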
22

Médoc, Nicolas. "A visual analytics approach for multi-resolution and multi-model analysis of text corpora : application to investigative journalism." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB042/document.

Abstract:
As the production of digital texts grows exponentially, a greater need to analyze text corpora arises in various domains of application, insofar as they constitute inexhaustible sources of shared information and knowledge. We therefore propose in this thesis a novel visual analytics approach for the analysis of text corpora, implemented for the real and concrete needs of investigative journalism. Motivated by the problems and tasks identified with a professional investigative journalist, visualizations and interactions are designed through a user-centered methodology involving the user during the whole development process. Specifically, investigative journalists formulate hypotheses and explore exhaustively the field under investigation in order to multiply sources showing pieces of evidence related to their working hypothesis. Carrying out such tasks in a large corpus is however a daunting endeavor and requires visual analytics software addressing several challenging research issues covered in this thesis. First, the difficulty of making sense of a large text corpus lies in its unstructured nature. We resort to the Vector Space Model (VSM) and its strong relationship with the distributional hypothesis, leveraged by multiple text mining algorithms, to discover the latent semantic structure of the corpus. Topic models and biclustering methods are recognized to be well suited to the extraction of coarse-grained topics, i.e. groups of documents concerning similar topics, each one represented by a set of terms extracted from textual contents. We provide a new Weighted Topic Map visualization that conveys a broad overview of coarse-grained topics by allowing quick interpretation of contents through multiple tag clouds while depicting the topical structure, such as the relative importance of topics and their semantic similarity. Although the exploration of the coarse-grained topics helps locate a topic of interest and its neighborhood, the identification of specific facts, viewpoints or angles related to events or stories requires a finer level of structuration to represent topic variants. This nested structure, revealed by Bimax, a pattern-based overlapping biclustering algorithm, captures in biclusters the co-occurrences of terms shared by multiple documents and can disclose facts, viewpoints or angles related to events or stories. This thesis tackles issues related to the visualization of a large number of overlapping biclusters by organizing term-document biclusters in a hierarchy that limits term redundancy and conveys their commonalities and specificities. We evaluated the utility of our software through a usage scenario and a qualitative evaluation with an investigative journalist. In addition, the co-occurrence patterns of topic variants revealed by Bimax are determined by the enclosing topical structure supplied by the coarse-grained topic extraction method, which is run beforehand. Nonetheless, little guidance is found regarding the choice of the latter method and its impact on the exploration and comprehension of topics and topic variants. Therefore we conducted both a numerical experiment and a controlled user experiment to compare two topic extraction methods, namely Coclus, a disjoint biclustering method, and hierarchical Latent Dirichlet Allocation (hLDA), an overlapping probabilistic topic model. The theoretical foundation of both methods is systematically analyzed by relating them to the distributional hypothesis. The numerical experiment provides statistical evidence of the difference between the resulting topical structures of both methods. The controlled experiment shows their impact on the comprehension of topics and topic variants, from the analyst's perspective. (...)
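Bimax itself is not available in common Python libraries, so the sketch below illustrates term-document co-clustering with scikit-learn's SpectralCoclustering, a disjoint method, purely to show what a term-document block looks like; the toy corpus and cluster count are assumptions.

```python
# Hedged sketch of term-document co-clustering. SpectralCoclustering is a
# disjoint stand-in for the overlapping, pattern-based Bimax used in the
# thesis; the corpus is a placeholder.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import SpectralCoclustering

docs = [
    "tax reform vote in parliament",
    "parliament debates the tax bill",
    "striker scores twice in the derby",
    "the derby ends with a late goal",
]
vec = CountVectorizer()
X = (vec.fit_transform(docs) > 0).astype(int)  # binary term-document matrix

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k in range(2):
    rows, cols = model.get_indices(k)  # document indices, term indices
    print("bicluster", k, "docs:", list(rows), "terms:", [terms[c] for c in cols])
```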
23

Médoc, Nicolas. "A visual analytics approach for multi-resolution and multi-model analysis of text corpora : application to investigative journalism." Electronic Thesis or Diss., Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB042.

Full text
Abstract:
As the production of digital texts grows exponentially, a greater need to analyze text corpora arises in various domains of application, insofar as these corpora constitute inexhaustible sources of shared information and knowledge. We therefore propose in this thesis a novel visual analytics approach for the analysis of text corpora, implemented for the real and concrete needs of investigative journalism. Motivated by the problems and tasks identified with a professional investigative journalist, the visualizations and interactions were designed through a user-centered methodology involving the user during the whole development process. Specifically, investigative journalists formulate hypotheses and explore their subject of investigation from every angle, looking for multiple sources that support their working hypotheses. Carrying out such tasks in a large corpus is a daunting endeavor and requires visual analytics software addressing several challenging research issues covered in this thesis. First, the difficulty of making sense of a large text corpus lies in its unstructured nature. We resort to the Vector Space Model (VSM) and its strong relationship with the distributional hypothesis, leveraged by multiple text mining algorithms, to discover the latent semantic structure of the corpus. Topic models and biclustering methods are recognized to be well suited to the extraction of coarse-grained topics, i.e. groups of documents concerning similar subjects, each one represented by a set of terms extracted from textual contents. Such a topical structure makes it possible to summarize a corpus and to ease its exploration. We provide a new Weighted Topic Map visualization that conveys a broad overview of the coarse-grained topics, allowing quick interpretation of contents through multiple tag clouds while depicting properties of the topical structure such as the relative importance of topics and their semantic similarity. Although the exploration of coarse-grained topics helps locate a topic of interest and its neighborhood, the identification of specific facts, viewpoints or angles related to events or stories requires a finer level of structure to represent topic variants. This nested structure, revealed by Bimax, a pattern-based overlapping biclustering algorithm, captures in biclusters the co-occurrences of terms shared by subsets of documents, which can disclose facts, viewpoints or angles related to events or stories. This thesis tackles the issues raised by the visualization of a large number of overlapping biclusters by organizing term-document biclusters into a hierarchy that limits term redundancy and conveys their commonalities and specificities. We evaluated the utility of our software through a usage scenario and a qualitative evaluation with an investigative journalist. In addition, the co-occurrence patterns of the topic variants revealed by Bimax are determined by the enclosing topical structure supplied by the coarse-grained topic extraction method, which is run beforehand. Nonetheless, little guidance exists regarding the choice of the latter method and its impact on the exploration and comprehension of topics and topic variants. We therefore conducted both a numerical experiment and a controlled user experiment to compare two topic extraction methods, namely Coclus, a disjoint biclustering method, and hierarchical Latent Dirichlet Allocation (hLDA), an overlapping probabilistic topic model. The theoretical foundation of both methods is systematically analyzed by relating them to the distributional hypothesis. The numerical experiment provides statistical evidence of the difference between the topical structures produced by the two methods. The controlled experiment shows their impact on the comprehension of topics and topic variants, from the analyst's perspective. (...)
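The pipeline sketched in this abstract rests on the Vector Space Model: documents become term-count vectors, and similarity between those vectors approximates semantic relatedness, which topic extraction and biclustering then exploit. The snippet below is only a toy illustration of that representation with an invented corpus, not the thesis's actual software.

```python
import math
from collections import Counter

# Toy corpus: three tiny "articles" (invented for illustration)
docs = {
    "d1": "leak reveals offshore accounts tied to officials",
    "d2": "offshore accounts and shell companies in the leaked files",
    "d3": "city council votes on the new housing budget",
}

# Each document becomes a sparse vector of term counts
vectors = {name: Counter(text.split()) for name, text in docs.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Documents about the same story share vocabulary, hence a higher cosine
print(cosine(vectors["d1"], vectors["d2"]))  # ~0.25: related stories
print(cosine(vectors["d1"], vectors["d3"]))  # 0.0: no shared terms
```

Grouping documents by such similarities is, in spirit, what the coarse-grained topic extraction does at scale.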
APA, Harvard, Vancouver, ISO, and other styles
24

Watanabe, Tomoko. "Corpus-based study of the use of English general extenders spoken by Japanese users of English across speaking proficiency levels and task types." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/19549.

Full text
Abstract:
There is a pronounced shift in English language teaching policy in Japan with the recognition not only of the importance of spoken English and interactional competence in a globalised world, but also the need to emphasise it within English language pedagogy. Given this imperative to improve the oral communication skills of Japanese users of English (JUEs), it is vital for teachers of English to understand the cultural complexities surrounding the language, one of which is the use of vague language, which has been shown to serve both interpersonal and interactional functions in communication. One element of English vague language is the general extender (for example, or something). The use of general extenders by users of English as a second language (L2) has been studied extensively. However, there is a lack of research into the use of general extenders by JUEs, and their functional differences across speaking proficiency levels and contexts. This study sought to address the knowledge gap, critically exploring the use of general extenders spoken by JUEs across speaking proficiency levels and task types. The study drew on quantitative and qualitative corpus-based tools and methodologies using the National Institute of Information and Communications Technology Japanese Learner English Corpus (Izumi, Uchimoto, & Isahara, 2004), which contains transcriptions of a speaking test. An in-depth analysis of individual frequently-occurring general extenders was carried out across speaking proficiency levels and test tasks (description, narrative, interview and role-play) in order to reveal the frequency, and the textual and functional complexity, of general extenders used by JUEs. In order to ensure the relevance of the application of the findings to the context of language education, the study also sought language teachers’ beliefs on the use of general extenders by JUEs. Three general extenders (or something (like that), and stuff, and and so on) were explored due to their high frequency within the corpus. The study showed that the use of these forms differed widely across the JUEs’ speaking proficiency levels and task types undertaken: or something (like that) is typically used in description tasks at the higher level and in interview and description tasks at the intermediate level; and stuff is typical of the interview at the higher level; and so on of the interview at the lower-intermediate level. The study also revealed that a greater proportion of the higher level JUEs use general extenders than do those at lower levels, while those at lower speaking proficiency levels who do use general extenders do so at a high density. A qualitative exploration of concordance lines and extracts revealed a number of interpersonal and discourse-oriented functions across speaking proficiency levels: or something (like that) functions to show uncertainty about information or linguistic choice and helps the JUEs to hold their turn; and stuff serves to make the JUEs’ expression emphatic; and so on appears to show the JUEs’ lack of confidence in their language use, and signals the desire to give up their turn. The findings suggest that the use of general extenders by JUEs is multifunctional, and that this multi-functionality is linked to various elements, such as the level of language proficiency, the nature of the task, the real time processing of their speech and the power asymmetry where the time and floor are mainly managed by the examiners.
The study contributes to extending understanding of how JUEs use general extenders to convey interpersonal and discourse-oriented functions in the context of language education, in speaking tests and possibly also in classrooms, and provides new insights into the dynamics of L2 users’ use of general extenders. It brings into question the generally held view that the use of general extenders by L2 users as a group is homogenous. The findings from this study could assist teachers to understand JUEs’ intentions in their speech and to aid their speech production. More importantly, it may raise language educators’ awareness of how the use of general extenders by JUEs varies across speaking proficiency levels and task types. These findings should have pedagogical implications in the context of language education, and assist teachers in improving interactional competence, in line with emerging English language teaching policy in Japan.
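The frequency side of such an analysis is mechanical enough to sketch: locate each general extender in the transcripts and normalise the counts by the words produced per proficiency level and task type. The snippet below is a rough sketch with an invented transcript format; the NICT JLE corpus has its own markup that a real study would parse properly.

```python
import re
from collections import defaultdict

# The three frequent general extenders examined in the study
PATTERNS = {
    "or something (like that)": re.compile(r"\bor something( like that)?\b"),
    "and stuff": re.compile(r"\band stuff\b"),
    "and so on": re.compile(r"\band so on\b"),
}

def extender_rates(transcripts):
    """transcripts: iterable of (level, task, text). Returns hits per 1,000 words."""
    hits = defaultdict(int)
    words = defaultdict(int)
    for level, task, text in transcripts:
        text = text.lower()
        words[level, task] += len(text.split())
        for name, pattern in PATTERNS.items():
            hits[level, task, name] += len(pattern.findall(text))
    return {key: 1000 * n / words[key[0], key[1]] for key, n in hits.items()}

sample = [("intermediate", "interview", "I like movies and music and stuff ."),
          ("higher", "description", "there is a lamp or something like that on the desk .")]
print(extender_rates(sample))
```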
APA, Harvard, Vancouver, ISO, and other styles
25

Badenhorst, Jacob Andreas Cornelius. "Data sufficiency analysis for automatic speech recognition / by J.A.C. Badenhorst." Thesis, North-West University, 2009. http://hdl.handle.net/10394/3994.

Full text
Abstract:
The languages spoken in developing countries are diverse and most are currently under-resourced from an automatic speech recognition (ASR) perspective. In South Africa alone, 10 of the 11 official languages belong to this category. Given the potential for future applications of speech-based information systems such as spoken dialogue systems (SDSs) in these countries, the design of minimal ASR audio corpora is an important research area. Specifically, current ASR systems utilise acoustic models to represent acoustic variability, and effective ASR corpus design aims to optimise the amount of relevant variation within training data while minimising the size of the corpus. Therefore an investigation of the effect that different amounts and types of training data have on these models is needed. This dissertation gives specific consideration to the data sufficiency principles that apply to the training of acoustic models. The investigation of this task led to the following main achievements: 1) We define a new stability measurement protocol that provides the capability to view the variability of ASR training data. 2) This protocol allows for the investigation of the effect that various acoustic model complexities and ASR normalisation techniques have on ASR training data requirements. Specific trends with regard to the data requirements for different phone categories, and how these are affected by various modelling strategies, are observed. 3) Based on this analysis, acoustic distances between phones are estimated across language borders, paving the way for further research in cross-language data sharing. Finally, the knowledge obtained from these experiments is applied to perform a data sufficiency analysis of a new speech recognition corpus of South African languages: the Lwazi ASR corpus. The findings correlate well with initial phone recognition results and yield insight into the number of speakers required for the development of minimal telephone ASR corpora.
Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2009.
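The notion of a stability measurement can be pictured independently of the dissertation's actual protocol: train two models of the same phone on disjoint subsets of growing size and watch the distance between them shrink as the data becomes sufficient. Below is a toy sketch with single-Gaussian "acoustic models" over one feature dimension; real systems model cepstral features with HMMs/GMMs.

```python
import numpy as np

def gaussian_kl(m0, v0, m1, v1):
    """KL divergence between univariate Gaussians N(m0, v0) and N(m1, v1)."""
    return 0.5 * np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / (2 * v1) - 0.5

rng = np.random.default_rng(0)
phone = rng.normal(1.0, 2.0, size=100_000)   # stand-in for one phone's feature values

for n in (50, 200, 1_000, 5_000):
    a = rng.choice(phone, n, replace=False)  # two independent training subsets
    b = rng.choice(phone, n, replace=False)
    d = gaussian_kl(a.mean(), a.var(), b.mean(), b.var())
    print(f"n={n:5d}  distance between models: {d:.4f}")  # shrinks as n grows
```

When the distance curve flattens near zero, adding more speakers or utterances buys little; that is, in spirit, a data sufficiency criterion.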
APA, Harvard, Vancouver, ISO, and other styles
26

Chan, Chin-ying Alice, and 陳展瑩. "A corpus-based analysis of tense usage in Cantonese-English bilingual children." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2010. http://hub.hku.hk/bib/B4515093X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Bobrik, Annette [Verfasser], and Hermann [Akademischer Betreuer] Krallmann. "Content-based Clustering in Social Corpora - A New Method for Knowledge Identification based on Text Mining and Cluster Analysis / Annette Bobrik. Betreuer: Hermann Krallmann." Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2013. http://d-nb.info/1031075364/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Bobrik, Annette [Verfasser], and Hermann [Akademischer Betreuer] Krallmann. "Content-based Clustering in Social Corpora - A New Method for Knowledge Identification based on Text Mining and Cluster Analysis / Annette Bobrik. Betreuer: Hermann Krallmann." Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2013. http://nbn-resolving.de/urn:nbn:de:kobv:83-opus-38461.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Farhan, Athil Khaleel. "Ideological manipulation in the translation of political discourse : a study of presidential speeches after the Arab Spring based on corpora and critical discourse analysis." Thesis, University of Surrey, 2017. http://epubs.surrey.ac.uk/841207/.

Full text
Abstract:
The present study explains that ideology can affect translators’ linguistic selections, which can consequently shape the receivers’ worldviews. Owing to the fact that, after the Arab Spring, new leaders with different ideologies and belonging to different political movements sprang forth, their political discourse has become a subject of increasing interest. The language these leaders use to promote their own political and ideological visions, and the way it is interpreted, requires analysis to detect possible translator intervention in the translation of these speeches. Adopting a mixed approach of corpus linguistics and critical discourse analysis, the present study focuses on investigating the manipulation of the source text ideology in the translation of presidential speeches after the Arab Spring. The source texts analysed in this study are 20 speeches by the former Egyptian president Morsi, translated into English by five translators of various ideological backgrounds. The analysis of these source texts is based on the extraction of keywords and a selection of keywords with ideological content. The analysis of the target texts, on the other hand, focuses on the use of ideological keywords in lexical patterns and grammatical structures to detect ideological manipulation in translation. The thesis aims to describe systematically the means through which translations transfer, strengthen, or mitigate the ideology underlying the source texts. Using five parallel corpora of the source texts and their translations, the thesis also aims to ascertain whether the lexical choices and the syntactic structures employed in the target texts engender changes in the ideological content of the source texts and their underlying ideology. The results reveal that two out of the five translations project a manipulated ideology that is at variance with the ideology underlying the original texts. One translation strengthens the ideology of the source texts, whereas the other two translations aim to keep the original ideology unchanged. This indicates that instances of ideological manipulation are probable even in the translation of presidential speeches, owing to the nature of the source texts and the ideology underlying them, as well as the possibility of an ideological clash.
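Keyword extraction of the kind described is conventionally done by comparing a word's frequency in the target texts against a reference corpus using a keyness statistic. The sketch below implements Dunning's log-likelihood ratio, a standard choice in corpus linguistics; the abstract does not state which statistic the thesis actually used, so treat this as an illustration.

```python
import math

def keyness(freq_target, freq_ref, size_target, size_ref):
    """Dunning's log-likelihood ratio for one word: higher = more 'key'."""
    total = freq_target + freq_ref
    expected_t = size_target * total / (size_target + size_ref)
    expected_r = size_ref * total / (size_target + size_ref)
    ll = 0.0
    if freq_target:
        ll += freq_target * math.log(freq_target / expected_t)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_r)
    return 2 * ll

# e.g. a word seen 120 times in 50k words of speeches vs 40 times in 1M reference words
print(keyness(120, 40, 50_000, 1_000_000))  # a large score flags a strong keyword
```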
APA, Harvard, Vancouver, ISO, and other styles
30

Black, Nicholas. "Explaining and challenging the growing level of income inequality in organisations : corpora of texts about pay in UK universities taken from the press, remuneration committees and trade unions." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/explaining-and-challenging-the-growing-level-of-income-inequality-in-organisations-corpora-of-texts-about-pay-in-uk-universities-taken-from-the-press-remuneration-committees-and-trade-unions(1ddf5f46-c02a-4fab-8a2c-e90266728cce).html.

Full text
Abstract:
To explain and challenge the growing level of income inequality in organisations, this thesis collected and analysed corpora of texts about pay in UK universities from the press, remuneration committees and trade unions. Deploying the methodology of critical discourse analysis, it describes the contents of arguments as discourse types, interprets the reasoning behind arguments as genres of organisation theories and explains the common-sense assumptions ordering arguments as ideological values. Seeking answers, the analysis groups 30,038 data fragments into 74 first-order discourse types, 7 aggregate genres of organisation theories and 9 ideological values across three corpora of texts. Findings from the press suggested that actors drew upon the same set of organisation theories regardless of whether they were discursively challenging or defending the legitimacy of income inequality. This made it unfeasible to halt the growth of income inequality because the underlying ideological values of competition, quantification and economic rationality only required the organisations to conform to unclear methodological processes. Thus, it is only possible to challenge the legitimacy of income inequality by proposing new members' resources, which objectified the exact contingencies for when it was appropriate. This insight led to the creation of a new genre of organisation theory, which proposed paying employees relative to their comparative sacrifices. Findings from remuneration committees suggested that their members drew upon organisation theories to legitimise income inequality, which related to the ideological values of economic science, individualism and capitalistic hierarchy. However, how these ideological values constructed the legitimacy of their decisions lacked a substantive rationality because the neoliberal model of capitalism was a source of legitimacy within itself. As such, the foundations of legitimacy were critiqued and a 2x2 matrix consisting of a process-outcome axis and a pragmatic-moral axis was introduced. Applying this matrix to this corpus of texts showed that none of these genres of organisation theories reasoned on the basis of outcomes. Therefore, a new genre of organisation theory was proposed which focused on the income distribution shape for organisations. Findings from trade unions suggested that their representatives drew upon the same set of organisation theories to reinforce their own legitimacy in addition to interrogating the legitimacy of universities. These organisational theories were then related to the ideological values of performativity, exchange relations and freedom that hegemonically legitimised income inequality. Meanwhile, it was interpreted that trade unions relied on the neoliberal model of capitalism for their existence and were encouraging employees to participate in markets that only served the interests of employers. Therefore, a new members' resource was proposed, which conceptualised why sacrifice was a moral and pragmatic process for distributing pay to employees in comparison with other macro-economic frameworks. The findings from these three corpora of texts explained and challenged the social practices that were creating income inequality growth. Essentially, the ideological values of neoliberalism ordered discourse so that there was no reason to reduce the level of income inequality according to the dominant members' resources.
Therefore, to change these social practices, three new discourses were proposed which challenged the level of income inequality by illustrating the false consciousness embodied within their reasoning.
APA, Harvard, Vancouver, ISO, and other styles
31

Stentaford, Allison. "Translating the Environment: A Comparative Analysis of Monolingual Corpora and Corpus-Based Resources, their Usability and their Effectiveness in Improving Translation Students’ Comprehension and Usage of Specialized Terminology in the Field of the Environment." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/35978.

Full text
Abstract:
Corpora and corpus-based resources have received much attention with regard to translator training, terminology, and specialized resource development. With a specialized monolingual corpus and a specialized online dictionary, the DiCoEnviro, we sought to provide insight into the usability and effectiveness of both types of resources in improving translation students’ comprehension and usage of specialized terminology in the field of the environment. We assessed a specialized corpus and the DiCoEnviro through three lenses adapted from the usability framework proposed by Nielsen (2001): effectiveness, efficiency, and satisfaction. We used data (screen recordings, questionnaires, translation exercises) collected from six translation students enrolled in undergraduate and graduate programs at the University of Ottawa School of Translation and Interpretation (UO-STI). Through quantitative and qualitative data analysis, we provide insight into the usability of both types of resources and into the prospective application of these findings in translator training programs and the development of specialized resources.
APA, Harvard, Vancouver, ISO, and other styles
32

Almujaiwel, Sultan Nasser. "Contrastive lexicology and comparable English-Arabic corpora-based analysis of vague and mistranslated Arabic equivalence : the case of the modern English-Arabic dictionary of al-Mawrid." Thesis, University of Exeter, 2012. http://hdl.handle.net/10871/13141.

Full text
Abstract:
The main concern in this research is to reveal the existence of shortcomings in the representation of meaning in the equivalents provided in a given context of the bilingual English-Arabic dictionary of al-Mawrid (Baʿalbaki 2005), and to disclose the contributions made in Contrastive Lexicology, Bilingual Lexicography, Translation Theory, Corpus Linguistics and Contrastive Linguistics, in an attempt to come up with a more suitable framework, based on bilingual lexicology and corpora-based approaches, for the analysis of equivalence in English-Arabic by means of computerized corpora, especially by what is known as comparable corpora. This research is divided into 6 Chapters. The introduction, Chapter 1, provides the statement of the research problem, the rationale, the objectives and the questions of the study. Chapter 2 discusses three issues: (i) the terms used to refer to the word; (ii) the semantic analysis and relations of the word; and (iii) the disciplines of bilingual lexicography, translation studies and contrastive linguistics, and their respective contributions to the central notion of equivalence in the bilingual dictionary. The discussion about the last issue will pave the way for using comparable corpora in the investigation of selected entries and their equivalents in the given context. It will also show how useful and effective such an approach is in criticising existing Arabic equivalents in al-Mawrid (2005). Chapter 3 is a review of the bilingual English-Arabic dictionary of al-Mawrid in terms of its purpose and the representation of meanings and entries. It also includes an overview of previous reviews. The aim is to provide and develop a new critical framework of al-Mawrid by a new multi-approach to equivalence in the English-Arabic dictionary, as given in Chapter 4: this is mainly based on comparable English-Arabic corpora, and the criteria for making two individual corpora comparable rather than parallel. Chapters 5 and 6 are dedicated to the analysis of equivalents which are found to be either vague (see Chapter 5) or a mistranslation (see Chapter 6) in a given context.
APA, Harvard, Vancouver, ISO, and other styles
33

Parnell, Mike. "A genealogical analysis of the deployment of personality disorder in the UK psychiatric context since 1950 : corpus linguistics as an adjunct to a Foucauldian discourse analysis of diachronic corpora of psychiatric texts from 1950 to 2007." Thesis, University of Nottingham, 2010. http://eprints.nottingham.ac.uk/13537/.

Full text
Abstract:
In order to examine how personality disorder and related concepts have been deployed in UK psychiatric literature over the last 50 years, a number of methodological and theoretical approaches are initially examined. It is concluded that a Foucauldian discourse analytic approach, supported and informed by findings from Corpus Linguistic techniques, would provide a means of uncovering discourses surrounding the use of personality disorder in such literature. A new combined methodology is proposed that uses evidence from a Corpus Linguistic analysis to support Willig's six-step methodology for Foucauldian Discourse Analysis (Willig 2001b). Three diachronic corpora of UK psychiatric articles are created, covering the 1950s, 1970s and 2000s. These are interrogated using word frequencies, concordance and collocational approaches in order to uncover patterns which reflect discourse changes over these periods. Evidence for a move from Narrative Discourses towards a dominant Statistical and Scientific Discourse is presented and discussed, along with the implications and subject positions associated with these.
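Of the corpus techniques named, the collocational analysis is the most directly algorithmic: count which words co-occur with a node word inside a fixed window and rank them against chance expectation. Below is a compact sketch using pointwise mutual information, one common association measure; the node word and normalisation are illustrative, not the thesis's exact settings.

```python
import math
from collections import Counter

def collocates(tokens, node, window=4, min_freq=5):
    """Rank words co-occurring with `node` within +/- `window` tokens by PMI."""
    n = len(tokens)
    freq = Counter(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), min(n, i + window + 1)):
                if j != i:
                    co[tokens[j]] += 1
    # PMI = log2(observed / expected), expected co-occurrence under independence
    scores = {w: math.log2(f * n / (freq[node] * freq[w] * 2 * window))
              for w, f in co.items() if freq[w] >= min_freq}
    return sorted(scores.items(), key=lambda item: -item[1])

# Comparing the top collocates of "disorder" in the 1950s vs 2000s corpora would
# surface the shift from narrative towards statistical and scientific discourse.
```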
APA, Harvard, Vancouver, ISO, and other styles
34

Périnet, Amandine. "Analyse distributionnelle appliquée aux textes de spécialité : réduction de la dispersion des données par abstraction des contextes." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD056/document.

Full text
Abstract:
In specialised domains, applications such as information retrieval or machine translation rely on terminological resources to take into account terms, semantic relations between terms, or groupings of terms. To face up to the cost of building these resources, automatic methods have been proposed. Among those methods, distributional analysis uses the repeated information in the contexts of terms to detect a relation between those terms. While this hypothesis is usually implemented with vector space models, those models suffer from a high number of dimensions and from data sparsity in the matrix of contexts. In specialised corpora, this contextual information is even sparser and less frequent because of the smaller size of the corpora. Likewise, complex terms are usually ignored because of their very low number of occurrences. In this thesis, we tackle the problem of data sparsity in specialised texts. We propose a method that makes the context matrix denser by performing an abstraction of distributional contexts: semantic relations acquired from corpora are used to generalise and normalise those contexts. We evaluated the robustness of the method on four corpora of different sizes, different languages and different domains. The analysis of the results shows that, while allowing complex terms to be taken into account in distributional analysis, the abstraction of distributional contexts leads to semantic clusters of better quality that are also more consistent and more homogeneous.
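The abstraction step can be pictured as rewriting context words into more general classes before counting, so that observations otherwise scattered across rare contexts accumulate in one cell. A toy sketch follows, with an invented relation map and co-occurrence pairs:

```python
from collections import Counter, defaultdict

# Hypothetical semantic relations acquired from the corpus: variant -> generalised form
generalise = {"x-ray image": "image", "ultrasound image": "image", "mr image": "image"}

def context_vectors(cooccurrences):
    """cooccurrences: (target term, context term) pairs extracted from text."""
    vectors = defaultdict(Counter)
    for term, context in cooccurrences:
        # Abstraction: replace the context by its generalisation when one is known
        vectors[term][generalise.get(context, context)] += 1
    return vectors

pairs = [("tumour", "x-ray image"), ("tumour", "ultrasound image"),
         ("lesion", "mr image"), ("lesion", "x-ray image")]
vectors = context_vectors(pairs)
print(vectors["tumour"], vectors["lesion"])  # both concentrate on the context "image"
```

After abstraction the two terms share one dense common dimension where the raw contexts would have given three sparse, non-overlapping ones.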
APA, Harvard, Vancouver, ISO, and other styles
35

FORCHINI, PIER FRANCA. "Spontaneity in American English: face - to - face and movie conversation compared." Doctoral thesis, Università Cattolica del Sacro Cuore, 2009. http://hdl.handle.net/10280/411.

Full text
Abstract:
The present dissertation examines empirically the linguistic features characterizing American face-to-face and movie conversation, two domains which are usually claimed to differ especially in terms of spontaneity. Natural conversation is considered the quintessence of spoken language, for it is totally spontaneous, whereas movie conversation is usually described as non-spontaneous, being artificially written-to-be-spoken and thus unlikely to represent the general usage of conversation. In spite of what is generally maintained in the literature, both the Multi-Dimensional analysis and the micro-analysis of the functions of you know, based on authentic data retrieved from corpora, show that the two conversational domains do not differ to a great extent. This confutes the claim that movie language has “a very limited value” because it does not reflect natural conversation and, consequently, is “not likely to be representative of the general usage of conversation”.
APA, Harvard, Vancouver, ISO, and other styles
36

Nghikembua, Annelie Ndapanda. "Error analysis in a learner corpus : a study of errors amongst Grade 12 Oshiwambo speaking learners of English in northern Namibia." Thesis, Rhodes University, 2015. http://hdl.handle.net/10962/d1018911.

Full text
Abstract:
High failure rates in English as a second language at secondary school level have become a concern in the Namibian education sector. From 2005 until 2013, the overall performance of the grade 12 learners in English as a second language on Ordinary level in the Oshana region was unsatisfactory. In fact, only a minority (18.52 percent) of the grade 12 learners obtained a grading in the range of A to D in comparison to the majority (81.48 percent) of learners who obtained a grading of E to U. The poor performance was attributed to: poor sentence structure, syllabification and spelling (Directorate of National Examination and Assessment, 2007-2010). The causes of these low performance rates however, were not scientifically explored in this region. Therefore this study embarked on an investigation in order to identify the reasons behind the low performance rates of the grade 12 Oshiwambo speaking learners of English and to determine whether the impressionistic results from the Directorate’s report correlate with the present study’s findings. In order to understand the dynamic linguistic system of the learners, a contrastive analysis of Oshiwambo and English was done in order to investigate the potential origins of some of the errors. An error analysis approach was also used to identify, classify and interpret the non-standard forms produced by the learners in their written work. Based on the results obtained from this study, a more comprehensible assessment rubric was devised to help identify learners’ written errors. A group of 100 learners from five different schools in the Oshana region was asked to write an essay of 150 to 200 words in English. The essays were analysed using Corder’s (1967) conceptual framework which outlines the steps that a researcher uses when undertaking an error analysis study. The errors were categorised according to Keshavarz’s (2006) linguistic error taxonomy. Based on this taxonomy, the results revealed that learners largely made errors in the following categories: phonology/orthography, morpho-syntax, lexico-semantics, discourse and techniquepunctuation. The study concluded that these errors were most likely due to: first language interference, overgeneralisation, ignorance of rule restriction and carelessness. Other proposed probable causes were context of learning and lack of knowledge of English grammar. The study makes a significant contribution, in that the findings can be used as a guide for the Namibian Ministry of Education in improving the status quo at schools and informing the line Ministry on various methods of dealing with language difficulties faced by learners. The findings can also empower teachers to help learners with difficulties in English language learning, thereby enabling learners to improve their English language proficiency. The study has proposed methods of intervention in order to facilitate the teaching of English as a second language in the Oshana region. In addition, the study has devised an easily applied assessment rubric that will assist in identifying non-standard forms of language used by learners. The reason for designing a new rubric is because the rubric which is currently being used is believed to be subjective, inconsistent and lacks transparency.
Name in Graduation Programme as: Nghikembua, Anneli Ndapanda
APA, Harvard, Vancouver, ISO, and other styles
37

Hutter, Jo-Anne. "A Corpus Based Analysis of Noun Modification in Empirical Research Articles in Applied Linguistics." PDXScholar, 2015. https://pdxscholar.library.pdx.edu/open_access_etds/2211.

Full text
Abstract:
Previous research has established the importance of nouns and noun modification in academic writing because of their commonness and complexity. However, little is known about how noun modification varies across the rhetorical sections of research articles. Such a perspective is important because it reflects the interplay between communicative function and linguistic form. This study used a corpus of empirical research articles from the fields of applied linguistics and language teaching to explore the connection between article sections (Introduction, Methods, Results, Discussion; IMRD) and six types of noun modification: relative clauses, ing-clause postmodifiers, ed-clause postmodifiers, prepositional postmodifiers, premodifying nouns, and attributive adjectives. First, the frequency of these six types of noun modification was compared across IMRD sections. Second, the study used a hand-coded analysis of the structure and structural patterns of a sample of noun phrases through IMRD sections. The results of the analyses showed that noun modification is not uniform across IMRD sections. Significant differences were found in the rates of use for attributive adjectives, premodifying nouns, and prepositional phrase postmodifiers. There were no significant differences between sections for relative clauses, ing-clause postmodifiers, or ed-clause postmodifiers. The differences between sections for attributive adjectives, premodifying nouns, and prepositional phrases illustrate the way the functions of these structures intersect with the functions of IMRD sections. For example, Methods sections describe research methods, which often have premodifying nouns (corpus analysis, conversation analysis, speech sample, etc.); this function of Methods sections results in a higher use of premodifying nouns compared to other sections. Results for noun phrase structures across IMRD sections showed that the common noun modification patterns, such as premodifying noun only or attributive adjective with prepositional phrase postmodifier, were mostly consistent across sections. Noun phrase structures involving pre-, post- or no modification did differ across sections, with Introduction sections the most frequently modified and Methods sections the least frequently modified. The different functions of IMRD sections call for different rates of usage for noun modification, and the results reflected this. The results of this research can benefit those who teach graduate students of applied linguistics to read and write research, by describing the use of noun modification in the sections of empirical research articles and by aiding teachers in the design of materials that clarify the use of noun modification in these IMRD sections.
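The cross-section comparison reduces to a contingency-table test: for each modifier type, its count in each IMRD section against the remaining tokens of that section. A sketch with invented counts follows; the study's real figures are not reproduced here.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of premodifying nouns vs. all other tokens, per IMRD section
sections     = ["Introduction", "Methods", "Results", "Discussion"]
premod_nouns = [310, 480, 390, 280]
other_tokens = [11_690, 9_520, 13_610, 12_720]

chi2, p, dof, _ = chi2_contingency([premod_nouns, other_tokens])
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")  # small p: usage differs by section
```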
APA, Harvard, Vancouver, ISO, and other styles
38

Yoo, Soyung. "Hypothetical Would-Clauses in Korean EFL Textbooks: An Analysis Based on a Corpus Study and Focus on Form Approach." PDXScholar, 2013. https://pdxscholar.library.pdx.edu/open_access_etds/911.

Full text
Abstract:
This study analyzed hypothetical would-clauses presented in Korean high school English textbooks from two perspectives: real language use and the Focus on Form approach. Initiated by an interest in the results of a corpus study, this study discussed hypothetical would-clauses in terms of how their descriptions in Korean EFL textbooks matched real language use. This study additionally investigated whether the textbooks presented the target language features in ways recommended by the Focus on Form approach. In the past few decades, authentic language use and the Focus on Form approach have received a great amount of attention in the SLA field. Recognizing the trend in SLA as well as necessities in Korean EFL education, the Korean government has incorporated both into the current 7th curriculum. This situation provided the momentum for evaluating the textbooks in these respects. The findings show that the language features were hardly supplemented by information drawn from real language data. In addition, there were very few attempts to draw learner attention to language forms while keeping learners focused on communication, as recommended by the Focus on Form approach. With the increasing use of English, it is becoming more necessary for Korean EFL learners to use English in real-life contexts, where understanding correct nuances and delivering appropriate expressions may be important. Also, in EFL contexts like Korea, students may have limited access to target language input and few opportunities to produce output in extracurricular settings, so the integrated methodology of the Focus on Form approach, rather than either a structure-centered or a meaning-oriented approach alone, would be of greater benefit to students. However, the results strongly indicate that the textbooks neither incorporate the language features as they occur in naturally occurring language nor present them so as to facilitate the learning of both form and meaning. This study suggests that greater use of real language data and more thorough application of Focus on Form methods in the textbook writing process should be seriously considered. Thus, this study could be useful for curriculum developers and textbook writers in creating curricula and language materials that incorporate grammar patterns based on actual language use, as well as in improving textbooks with respect to the Focus on Form approach.
APA, Harvard, Vancouver, ISO, and other styles
39

Svensson, Maria. "Marqueurs corrélatifs en français et en suédois : Étude sémantico-fonctionnelle de d’une part… d’autre part, d’un côté… de l’autre et de non seulement… mais en contraste." Doctoral thesis, Uppsala universitet, Romanska språk, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-125659.

Full text
Abstract:
This thesis deals with the correlative markers d’une part… d’autre part, d’un côté… de l’autre and non seulement… mais in French and their Swedish counterparts dels… dels, å ena sidan… å andra sidan and inte bara… utan. These markers are composed of two separate parts generally occurring together, and announce a series of at least two textual units to be considered together. The analyses of the use of these three French and three Swedish markers are based upon two corpora of non-academic humanities texts. The first, principal corpus is composed only of original French and Swedish texts. The second, complementary corpus is composed of source texts in the two languages and their translations in the other language. By the combination of these two corpora, this study is comparative as well as contrastive. Through application of the Geneva model of discourse analysis and the Rhetorical Structure Theory, a semantic and functional approach to correlative markers and their text-structural role is adopted. The study shows similarities as well as differences between the six markers, both within each language and between the languages. D’une part… d’autre part and dels… dels principally mark a conjunctive relation, whereas d’un côté… de l’autre and å ena sidan… å andra sidan are more often used in a contrastive relation, even though they all can be used for both kinds of relations. Non seulement… mais and inte bara… utan mark a conjunctive relation, but can also indicate that the second argument is stronger than the first one. By using these two markers, language users also present the first argument as given information and the second as new. In general, the French correlative markers appear to have a more argumentative function, whereas in Swedish the text-structural function is shown to be the most important.
APA, Harvard, Vancouver, ISO, and other styles
40

Kader, Carla Callegaro Corrêa. "UM ESTUDO DOS FATORES DE ATRIBUIÇÃO EM TEXTOS ACADÊMICOS DE LETRAS E PSICOLOGIA À LUZ DA TEORIA HOLÍSTICA DA ATIVIDADE E DA LINGUÍSTICA DE CORPUS." Universidade Federal de Santa Maria, 2014. http://repositorio.ufsm.br/handle/1/3989.

Full text
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico
This study investigates the constitution of the social role of the Language teacher, contrasting it with an emancipated profession such as Psychology, and observing the internal asymmetry of alopoietic systems and the endogeny of autopoietic systems. To this end, the methodological apparatus of Corpus Linguistics was used to analyse and compare corpora composed of undergraduate research papers, lato sensu monographs, and stricto sensu dissertations and theses in the Language and Psychology areas. First, the study asked how the macro- and micro-analyses generated by the programs TreeTagger, WordSmith Tools 6.0 and the Semantic Mapper relate to the multidimensional analysis. Second, it verified which categories (axiological, deontic and/or epistemic-alethic) linguistically characterise the professional profile of language educators and psychologists. Third, it examined which conceptions recur in the corpora, based on the cluster analysis of the concordance lines. These research questions were answered in the light of the Holistic Theory of Activity, which is applied here to questions of professional development through the opposition between autopoietic and alopoietic processes (RICHTER, 2011). The framing of each profession was then observed on the basis of a statistical and qualitative survey of the corpora, with the analysis focusing on the attribution factor that characterises social roles and professional modelling. After the literature review, the methodological steps supporting the analysis were carried out. The texts composing the corpora were collected on the internet from the databases of Brazilian university libraries, as well as from the libraries of public and private universities in Santa Maria. Once the corpora were assembled, the texts were converted to txt format (plain text). The texts in the Language area were separated into two groups, market teachers and academic professionals, while the Psychology corpus was not subdivided. The corpora were part-of-speech tagged and the word classes with the highest counts were extracted. WordSmith Tools 6.0 was then used to obtain quantitative results for the most prominent words in the WordList; these words were subsequently mapped semantically and separated into subcategories. With the mapping results in hand, the multidimensional analysis was begun. Its results highlight the gnoseological and praxeological competences and the modal verbs 'must' ('deve') and 'can' ('pode'), and the concordance analysis was carried out on the words belonging to these categories. The final results point to a differentiation between emancipated and non-emancipated professions: the former guide their training and professional practice by the Federal Ethics Council, while the latter rely on the guidance found in the academic texts of professionals with postgraduate training and in official documents. More specifically, the Psychology corpus shows linguistic indications of autopoiesis, characteristic of regulated professions, whereas professionals in the Language area divide into alopoietic (market professionals) and quasi-autopoietic (academic professionals) linguistic profiles.
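Several of the steps described are mechanical once the corpora are tagged. TreeTagger emits one token per line as token, tag and lemma separated by tabs, so counting the modal verbs by lemma, for instance, takes only a short script; the file name and lemma set below are illustrative, not taken from the thesis.

```python
from collections import Counter

def count_lemmas(path, targets=("dever", "poder")):
    """Count target lemmas in TreeTagger output (token<TAB>tag<TAB>lemma per line)."""
    hits = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 3 and fields[2] in targets:
                hits[fields[2]] += 1
    return hits

print(count_lemmas("corpus_letras.tagged"))  # e.g. Counter({'poder': 412, 'dever': 268})
```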
APA, Harvard, Vancouver, ISO, and other styles
41

Eklund, Robert. "A Probabilistic Tagging Module Based on Surface Pattern Matching." Thesis, Stockholm University, Department of Computational Linguistics, Institute of Linguistics, 1993. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-135294.

Full text
Abstract:
A problem with automatic tagging and lexical analysis is that it is never 100% accurate. In order to arrive at better figures, one needs to study the character of what is left untagged by automatic taggers. In this paper the untagged residue output by the automatic analyser SWETWOL (Karlsson 1992) at Helsinki is studied. SWETWOL assigns tags to words in Swedish texts mainly through dictionary lookup. The contents of the untagged residue files are described and discussed, and possible ways of solving different problems are proposed. One method of tagging residual output is proposed and implemented: the left-stripping method, through which untagged words are stripped of their left-most letters, searched for in a dictionary and, if found, tagged according to the information found in said dictionary. If the stripped word is not found in the dictionary, a match is sought in ending lexica containing statistical information about the word classes associated with that particular word form (i.e., final letter cluster, be this a grammatical suffix or not) and the relative frequency of each word class. If a match is found, the word is given graduated tagging according to the statistical information in the ending lexicon. If a match is not found, the word is stripped of what is now its left-most letter and is recursively searched in the dictionary and ending lexica (in that order). The ending lexica employed in this paper are retrieved from a reversed version of Nusvensk Frekvensordbok (Allén 1970), and contain endings of between one and seven letters. The contents of the ending lexica are described and discussed to a certain degree. The programs working according to the principles described are run on files of untagged residual output. Appendices include, among other things, LISP source code, untagged and tagged files, the ending lexica containing one- and two-letter endings, and excerpts from the ending lexica containing three to seven letters.
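Read procedurally, the left-stripping method is a small iterative algorithm. The sketch below is one possible reading of the description (the original implementation was in LISP); the data structures are invented: a dictionary mapping known words to tags, and per-length ending lexica mapping a final letter cluster to word-class probabilities.

```python
def left_strip_tag(word, dictionary, ending_lexica, max_ending=7):
    """Tag an unknown word by repeatedly stripping its left-most letter.

    dictionary: word -> tags.
    ending_lexica: ending length -> {ending: {word class: relative frequency}}.
    """
    remainder = word
    while len(remainder) > 1:
        remainder = remainder[1:]                  # strip the current left-most letter
        if remainder in dictionary:                # dictionary lookup first
            return dictionary[remainder]
        if len(remainder) <= max_ending:           # then the statistical ending lexica
            match = ending_lexica.get(len(remainder), {}).get(remainder)
            if match:
                return match                       # graduated (probabilistic) tagging
    return None                                    # still untagged: leave in the residue
```

On Swedish data this pays off especially for compounds: an unseen compound whose right-hand head is in the dictionary inherits the head's tags, and otherwise its final letters still vote probabilistically for a word class.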
APA, Harvard, Vancouver, ISO, and other styles
42

Nicaise, Laurent. "Een multifactoriële studie over metaforiek in de financieel-economische pers." Doctoral thesis, Universite Libre de Bruxelles, 2012. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209731.

Full text
Abstract:
Which factors determine the presence and choice of metaphors in the financial and economic press? The thesis presents an explanatory model. Over the last twenty years, publications in cognitive semantics dealing with the relationship between metaphor and ideology in the financial press have multiplied. Thanks in particular to Boers (1997, 1999, 2000), Koller (2002) and Charteris-Black (2000), most of the rhetorical mechanisms that accompany metaphors are relatively well known. Until now, however, the effect of ideology on metaphorical choices has not been demonstrated, let alone measured. The aim of this study is to develop an explanatory model of the factors that influence the presence and choice of metaphors in the financial press, so as to provide a reliable methodological and statistical instrument for critical discourse analysis. Such an analysis could also prove useful in translation and in the teaching of specialised language in the economic domain. The theoretical framework is a modernised version of Conceptual Metaphor Theory, and the approach is cognitive and onomasiological. The starting point is a set of elementary concepts of the financial world, selected on the basis of a randomised sample of 10,000 words from two dailies of the Belgian press. The concepts are then grouped, on the basis of pragmatic and statistical criteria, into a set that reflects the composition of the world of finance and the stock exchange. For each realisation of these concepts, it is decided whether or not it is a metaphor, applying the identification method proposed by the Pragglejaz Group (2007); in the case of a metaphor, an attempt is then made to identify the source domain. The bilingual corpus covers a 12-month period in 2005 and comprises 450,000 words, spread across six Belgian publications: De Standaard, De Morgen, Trends Cash, La Libre Belgique, Le Soir and L’Investisseur.
Doctorate in Philosophy and Letters, specialisation in Language and Literature
APA, Harvard, Vancouver, ISO, and other styles
43

Feldman, Anna. "Portable language technology: a resource-light approach to morpho-syntactic tagging." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1153344391.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Crymble, Leigh. "Textual representations of migrants and the process of migration in selected South African media: a combined critical discourse analysis and corpus linguistics study." Thesis, Rhodes University, 2011. http://hdl.handle.net/10962/d1002624.

Full text
Abstract:
South Africa has long been associated with racial and ethnic issues surrounding prejudice and discrimination, and despite a move post-1994 to a democratic ‘rainbow nation’ society, the country has remained plagued by unequal power relations. One such instance of inequality relates to the marginalisation of migrants which has been realised through xenophobic attitudes and actions, most notably the violence that swept across the country in 2008. Several reasons have been suggested in an attempt to explain the cause of the violence, including claims that migrants are taking ‘our jobs and our women’, migrants are ‘illegal and criminal’ and bringing ‘disease and contamination’ with them from their countries of origin. Although it is widely accepted that many, if not all, of these beliefs are based on ignorance and hearsay, these extensive generalisations shape and reinforce prejudiced ideologies about migrant communities. It is thus only when confronted with evidence that challenges this dominant discourse that South Africans are able to reconsider their views. Williams (2008) suggests that for many South Africans, Africa continues to be the ‘dark continent’ that is seen as an ominous, threatening force of which they have very little knowledge. For this reason, anti-immigrant sentiment in a South African context has traditionally been directed at African foreigners. In this study I examine the ways in which African migrants and migrant communities, as well as the overall processes of migration, are depicted by selected South African print media: City Press, Mail & Guardian and Sunday Times. Using a combined Corpus Linguistics and Critical Discourse Analysis approach, I investigate the following questions: How are migrants and the process of migration into South Africa represented by these established newspapers between 2006 and 2010? Are there any differences or similarities between these representations? In particular, what ideologies regarding migrants and migrant communities underlie these representations? My analysis focuses on the landscape of public discourse about migration with an exploration of the rise and fall of the terminologies used to categorise migrants and the social implications of these classifications. Additionally, I analyse the extensive occurrence of negative representations of migrants, particularly through the ‘othering’ pronouns ‘us’ versus ‘them’ and through metaphorical language which largely depicts these individuals, en masse, as natural disasters. I conclude that these discursive elements play a crucial role in contributing to an overall xenophobic rhetoric. Despite subtle differences between the three newspapers which can be accounted for based on their political persuasions and agendas, it is surprising to note how aligned these publications are with regard to their portrayal of migrants. With a few exceptions, this representation positions these individuals as powerless and disenfranchised and maintains the status quo view of migrants as burdens on the South African economy and resources. Overall, the newspaper articles contribute to mainstream dominant discourse on migrants and migration with the underlying ideology that migrants are responsible for the hardships suffered by South African citizens. Thus, this study contributes significantly to existing bodies of research detailing discourse on migrants and emphasises the intrinsic links between language, ideology and society.
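Charting the rise and fall of such labels is, computationally, a per-year normalised frequency count over the 2006-2010 articles. Below is a rough sketch with invented data structures; a real study would use proper tokenisation and concordancing rather than substring matching.

```python
from collections import defaultdict

def yearly_rates(articles, terms):
    """articles: (year, text) pairs. Returns each term's rate per million words by year."""
    words = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, text in articles:
        tokens = text.lower().split()
        words[year] += len(tokens)
        joined = " ".join(tokens)
        for term in terms:
            hits[year][term] += joined.count(term)  # rough matching, for illustration
    return {year: {t: 1_000_000 * c / words[year] for t, c in counts.items()}
            for year, counts in hits.items()}

labels = ["illegal immigrant", "foreigner", "asylum seeker", "refugee"]
# yearly_rates(corpus_2006_2010, labels) -> traces each label's prominence over time
```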
APA, Harvard, Vancouver, ISO, and other styles
46

Hô, Dinh Océane. "Caractérisation différentielle de forums de discussion sur le VIH en vietnamien et en français : Éléments pour la fouille comportementale du web social." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCF022/document.

Full text
Abstract:
The standardised discourse produced by institutions now competes with the informal or weakly formalised discourse of the social web. The democratisation of public expression redistributes authority over knowledge and changes the processes by which knowledge is constructed. This spontaneous discourse is accessible to everyone and in exponentially growing volumes, offering the humanities and social sciences new possibilities for exploration; yet these disciplines still lack methodologies for handling such complex and little-described data. The aim of this thesis is to show to what extent social web discourse can complement institutional discourse. We develop a collection and analysis methodology suited to the specific characteristics of digital-native discourse (massive volume, anonymity, volatility, structural features, etc.). We focus on discussion forums as the environments in which this discourse is produced, and apply the methodology to a defined social issue: the HIV/AIDS epidemic in Vietnam. This field of application covers several societal stakes: health and social concerns, changing morals, and competing discourses. The study is completed by the analysis of a comparable French-language corpus, matching the Vietnamese corpus in topic, genre and discourse type, so as to highlight the specific features of distinct socio-cultural contexts.
APA, Harvard, Vancouver, ISO, and other styles
47

Johl, Satirenjit Kaur. "Corporate entrepreneurship and corporate governance : an empirical analysis." Thesis, University of Nottingham, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.430642.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Alexandrou, George A. "Wealth and earnings implications of corporate divestments : an empirical analysis of stock returns and analysts' forecasts of earnings." Thesis, City University London, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.271108.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Arndt, Stephanie, Gunnar Gaitzsch, Carsten Gnauck, et al. "The Relation between Corporate Economic and Corporate Environmental Performance." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-38454.

Full text
Abstract:
For almost 40 years researchers have been trying to identify the relationship between corporate environmental and corporate economic performance. Neither the theoretical debate nor the empirical studies investigating the relationship have produced conclusive results. Within a field research seminar at Technische Universität Dresden, nine students conducted a meta-analysis of 124 studies to assess different aspects of the relationship between corporate economic and corporate environmental performance. In the first part of our paper, we analyse and present the theoretical background based on a review of the literature. In the second part, we test for empirical evidence. First, we discuss the conceptual frameworks and measurement methods for corporate economic and corporate environmental performance. We also look at the impact of environmental performance on shareholder value. Thereafter, we examine the influence of time, industries and publication bias. In conclusion, our research indicates that the quality of journals merits further examination to improve results.
APA, Harvard, Vancouver, ISO, and other styles
50

Do, Thi Thu Trang. "Etude de la concession dans une perspective contrastive français - vietnamien à partir de corpus oraux." Thesis, Orléans, 2016. http://www.theses.fr/2016ORLE1152/document.

Full text
Abstract:
This thesis studies the expression of concession in spoken language from a contrastive French–Vietnamese perspective. Based on a corpus of French radio programmes and following three complementary approaches (linguistic, logical and interactional), the functioning and characteristics of concessive constructions are analysed in order to classify them into categories and to propose a model of how concession is expressed. Concessive constructions in Vietnamese are studied on the basis of comparable data so as to highlight the similarities and differences between the two languages.
APA, Harvard, Vancouver, ISO, and other styles