
Dissertations / Theses on the topic 'Language resource'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Language resource.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Cardillo, Eileen Robin. "Resource limitation approaches to language comprehension." Thesis, University of Oxford, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418563.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Loza, Christian. "Cross Language Information Retrieval for Languages with Scarce Resources." Thesis, University of North Texas, 2009. https://digital.library.unt.edu/ark:/67531/metadc12157/.

Full text
Abstract:
Our generation has experienced one of the most dramatic changes in how society communicates. Today, we have online information on almost any imaginable topic. However, most of this information is available in only a few dozen languages. In this thesis, I explore the use of parallel texts to enable cross-language information retrieval (CLIR) for languages with scarce resources. To build the parallel text I use the Bible. I evaluate different variables and their impact on the resulting CLIR system, specifically: (1) the CLIR results when using different amounts of parallel text; (2) the role of paraphrasing on the quality of the CLIR output; (3) the impact on accuracy when translating the query versus translating the collection of documents; and finally (4) how the results are affected by the use of different dialects. The results show that all these variables have a direct impact on the quality of the CLIR system.
APA, Harvard, Vancouver, ISO, and other styles
3

Jansson, Herman. "Low-resource Language Question Answering Systemwith BERT." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42317.

Full text
Abstract:
The complexity of staying at the forefront of information retrieval systems is constantly increasing. BERT, a recent natural language processing technology, has reached superhuman performance on reading comprehension tasks in high-resource languages. However, several researchers have argued that multilingual models are not sufficient for low-resource languages, since they lack a thorough understanding of those languages. Recently, a Swedish pre-trained BERT model was introduced that is trained on significantly more Swedish data than the multilingual models currently available. This study compares multilingual and Swedish monolingual BERT models for question answering, fine-tuning them on both an English and a Swedish machine-translated SQuADv2 data set. The models are evaluated on the SQuADv2 benchmark and within a question answering system built on the classical retriever-reader methodology. The study introduces both a naive and a more robust prediction method for the proposed question answering system, and identifies a sweet spot for each model approach integrated into the system. The question answering system is evaluated against another leading question answering library in the area, using a custom-crafted Swedish evaluation data set. The results show that the model fine-tuned from the Swedish pre-trained model on the Swedish SQuADv2 data set was superior in all evaluation metrics except speed. The comparison between the systems resulted in a higher evaluation score but a slower prediction time for this study's system.
APA, Harvard, Vancouver, ISO, and other styles
4

Chavula, Catherine. "Using language similarities in retrieval for resource scarce languages: a study of several southern Bantu languages." Doctoral thesis, Faculty of Science, 2021. http://hdl.handle.net/11427/33614.

Full text
Abstract:
Most of the Web is published in languages that are not accessible to many potential users, who are only able to read and understand their local languages. Many of these local languages are Resource-Scarce Languages (RSLs) and lack the necessary resources, such as machine translation tools, to make available content more accessible. State-of-the-art preprocessing tools and retrieval methods are tailored to the Web's dominant languages and, accordingly, documents written in RSLs are ranked low and are difficult to access in search results, resulting in a frustrating search experience for speakers of RSLs. In this thesis, we propose the use of language similarities to match, re-rank and return search results written in closely related languages, to improve the quality of search results and the user experience. We also explore the use of shared morphological features to build multilingual stemming tools. Focusing on six Bantu languages spoken in Southeastern Africa, we first explore how users would interact with search results written in related languages. We conduct a user study, examining the usefulness of and user preferences for ranking search results with different levels of intelligibility, and the types of emotions users experience when interacting with such results. Our results show that users can complete tasks using related-language search results but, as intelligibility decreases, more users struggle to complete search tasks and, consequently, experience negative emotions. Concerning ranking, we find that users prefer that relevant documents be ranked higher, and that intelligibility be used as a secondary criterion. Additionally, we use a User-Centered Design (UCD) approach to investigate enhanced interface features that could assist users in effectively interacting with such search results. A usability evaluation of our designed interface scored 86% on the System Usability Scale (SUS).
We then investigate whether ranking models that integrate relevance and intelligibility features improve retrieval effectiveness. We develop these features by drawing on traditional Information Retrieval (IR) models and linguistics studies, and employ Learning To Rank (LTR) and unsupervised methods. Our evaluation shows that models that use both relevance and intelligibility features perform better than models that use relevance features only. Finally, we propose and evaluate morphological processing approaches that include multilingual stemming, using rules derived from morphological features common across the Bantu family of languages. Our evaluation of the proposed stemming approach shows that its performance is competitive on queries that use general terms. Overall, the thesis provides evidence that considering and matching search results written in closely related languages, as well as ranking and presenting them appropriately, improves the quality of retrieval and the user experience for speakers of RSLs.
APA, Harvard, Vancouver, ISO, and other styles
5

Kolak, Okan. "Rapid resource transfer for multilingual natural language processing." College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/3182.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2005.
Thesis research directed by: Dept. of Linguistics. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Yuan Ph D. Massachusetts Institute of Technology. "Transfer learning for low-resource natural language analysis." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108847.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 131-142).
Expressive machine learning models such as deep neural networks are highly effective when they can be trained with large amounts of in-domain labeled training data. While such annotations may not be readily available for the target task, it is often possible to find labeled data for another related task. The goal of this thesis is to develop novel transfer learning techniques that can effectively leverage annotations in source tasks to improve performance of the target low-resource task. In particular, we focus on two transfer learning scenarios: (1) transfer across languages and (2) transfer across tasks or domains in the same language. In multilingual transfer, we tackle challenges from two perspectives. First, we show that linguistic prior knowledge can be utilized to guide syntactic parsing with little human intervention, by using a hierarchical low-rank tensor method. In both unsupervised and semi-supervised transfer scenarios, this method consistently outperforms state-of-the-art multilingual transfer parsers and the traditional tensor model across more than ten languages. Second, we study lexical-level multilingual transfer in low-resource settings. We demonstrate that only a few (e.g., ten) word translation pairs suffice for an accurate transfer for part-of-speech (POS) tagging. Averaged across six languages, our approach achieves a 37.5% improvement over the monolingual top-performing method when using a comparable amount of supervision. In the second monolingual transfer scenario, we propose an aspect-augmented adversarial network that allows aspect transfer over the same domain. We use this method to transfer across different aspects in the same pathology reports, where traditional domain adaptation approaches commonly fail. Experimental results demonstrate that our approach outperforms different baselines and model variants, yielding a 24% gain on this pathology dataset.
by Yuan Zhang.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
7

Kimutis, Michelle T. "Bilingual Education: A Resource for Teachers." Miami University Honors Theses / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=muhonors1302698144.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Zouhair, Taha. "Automatic Speech Recognition for low-resource languages using Wav2Vec2 : Modern Standard Arabic (MSA) as an example of a low-resource language." Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37702.

Full text
Abstract:
The need for fully automatic translation at DigitalTolk, a Stockholm-based company providing translation services, led to exploring Automatic Speech Recognition (ASR) as a first step for Modern Standard Arabic (MSA). Facebook AI recently released a second version of its Wav2Vec models, dubbed Wav2Vec 2.0, which uses deep neural networks and provides several English pretrained models along with a multilingual model trained on 53 different languages, referred to as the Cross-Lingual Speech Representation (XLSR-53). The small English pretrained model and XLSR-53 are tested on Arabic data from Mozilla Common Voice, and the resulting performance is discussed. In this research, the small model did not yield any results and may have needed more unlabelled data to train, whereas the large model proved successful in predicting the Arabic audio recordings, achieving an unprecedented Word Error Rate of 24.40%. The small model turned out to be unsuitable for training, especially on languages other than English for which unlabelled data is scarce. The large model, on the other hand, gave very promising results despite the small amount of data, and should be the model of choice for any future training on low-resource languages such as Arabic.
APA, Harvard, Vancouver, ISO, and other styles
9

Packham, Sean. "Crowdsourcing a text corpus for a low resource language." Master's thesis, University of Cape Town, 2016. http://hdl.handle.net/11427/20436.

Full text
Abstract:
Low-resource languages, such as South Africa's isiXhosa, have a limited number of digitised texts, making it challenging to build language corpora and the information retrieval services, such as search and translation, that depend on them. Researchers have been unable to assemble isiXhosa corpora of sufficient size and quality to produce working machine translation systems; it has been acknowledged that there is little to no training data, and sourcing translations from professionals can be a costly process. A crowdsourcing translation game which paid participants for their contributions was proposed as a solution for sourcing original and relevant parallel corpora for low-resource languages such as isiXhosa. The objective of this dissertation is to report on the four experiments that were conducted to assess user motivation and contribution quantity under various scenarios using the developed crowdsourcing translation game. The first experiment was a pilot study to test a custom-built system and to find out whether social network users would volunteer to participate in a translation game for free. The second experiment tested multiple payment schemes with users from the University of Cape Town; the schemes rewarded users with consistent, increasing or decreasing amounts for subsequent contributions. Experiment 3 tested whether the same users from Experiment 2 would continue contributing if payments were taken away. The last experiment tested a payment scheme that did not offer a direct and guaranteed reward: users were paid based on their leaderboard placement, and only a limited number of the top leaderboard spots were allocated rewards.
Experiments 1 and 3 found that people do not volunteer without financial incentives; Experiments 2 and 4 showed that people want increased rewards for increased effort; Experiment 3 also showed that people will not continue contributing if the financial incentives are taken away; and Experiment 4 showed that the possibility of incentives is as attractive as offering guaranteed incentives.
APA, Harvard, Vancouver, ISO, and other styles
10

Louvan, Samuel. "Low-Resource Natural Language Understanding in Task-Oriented Dialogue." Doctoral thesis, Università degli studi di Trento, 2022. http://hdl.handle.net/11572/333813.

Full text
Abstract:
Task-oriented dialogue (ToD) systems need to interpret the user's input to understand the user's needs (intent) and corresponding relevant information (slots). This process is performed by a Natural Language Understanding (NLU) component, which maps the text utterance into a semantic frame representation, involving two subtasks: intent classification (text classification) and slot filling (sequence tagging). Typically, new domains and languages are regularly added to the system to support more functionalities. Collecting domain-specific data and performing fine-grained annotation of large amounts of data every time a new domain and language is introduced can be expensive. Thus, developing an NLU model that generalizes well across domains and languages with less labeled data (low-resource) is crucial and remains challenging. This thesis focuses on investigating transfer learning and data augmentation methods for low-resource NLU in ToD. Our first contribution is a study of the potential of non-conversational text as a source for transfer. Most transfer learning approaches assume labeled conversational data as the source task and adapt the NLU model to the target task. We show that leveraging similar tasks from non-conversational text improves performance on target slot filling tasks through multi-task learning in low-resource settings. Second, we propose a set of lightweight augmentation methods that apply data transformation on token and sentence levels through slot value substitution and syntactic manipulation. Despite its simplicity, the performance is comparable to deep learning-based augmentation models, and it is effective on six languages on NLU tasks. Third, we investigate the effectiveness of domain adaptive pre-training for zero-shot cross-lingual NLU. In terms of overall performance, continued pre-training in English is effective across languages. This result indicates that the domain knowledge learned in English is transferable to other languages. 
In addition to that, domain similarity is essential. We show that intermediate pre-training data that is more similar – in terms of data distribution – to the target dataset yields better performance.
APA, Harvard, Vancouver, ISO, and other styles
11

Feldman, Anna. "Portable language technology: a resource-light approach to morpho-syntactic tagging." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1153344391.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Saunders, Ryan C. "Beyond media literacy in the language arts classroom [electronic resource] /." Online pdf file accessible through the World Wide Web, 2010. http://archives.evergreen.edu/masterstheses/Accession89-10MIT/Saunders_RCMIT2010.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Jarmasz, Mario. ""Roget's Thesaurus" as a lexical resource for natural language processing." Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26493.

Full text
Abstract:
This dissertation presents an implementation of an electronic lexical knowledge base that uses the 1987 Penguin edition of Roget's Thesaurus as the source for its lexical material---the first implementation of a computerized Roget's to use an entire current edition. It explains the steps necessary for taking a machine-readable file and transforming it into a tractable system. Roget's organization is studied in detail and contrasted with WordNet's. We show two applications of the computerized Thesaurus: computing semantic similarity between words and phrases, and building lexical chains in a text. The experiments are performed using well-known benchmarks and the results are compared to those of other systems that use Roget's, WordNet and statistical techniques. Roget's has turned out to be an excellent resource for measuring semantic similarity; lexical chains are easily built but more difficult to evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be combined.
APA, Harvard, Vancouver, ISO, and other styles
14

Chen, I.-Fan. "Resource-dependent acoustic and language modeling for spoken keyword search." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54919.

Full text
Abstract:
In this dissertation, three research directions were explored to alleviate two major issues, i.e., the use of incorrect models and training/test condition mismatches, in the modeling frameworks of modern spoken keyword search (KWS) systems. Each of the three research directions, which include (i) data-efficient training processes, (ii) system optimization objectives, and (iii) data augmentation, utilizes different types and amounts of training resources in different ways to ameliorate the two issues of acoustic and language modeling in modern KWS systems. To be more specific, resource-dependent keyword modeling, keyword-boosted sMBR (state-level minimum Bayes risk) training, and multilingual acoustic modeling are proposed and investigated for acoustic modeling in this research. For language modeling, keyword-aware language modeling, discriminative keyword-aware language modeling, and web text augmented language modeling are presented and discussed. The dissertation provides a comprehensive collection of solutions and strategies for the acoustic and language modeling problems in KWS. It also offers insights into the realization of good-performance KWS systems. Experimental results show that the data-efficient training process and data augmentation are the two directions providing the most prominent performance improvement for KWS systems, while modifying system optimization objectives provides smaller yet consistent performance enhancements in KWS systems with different configurations. The effects of the proposed acoustic and language modeling approaches in the three directions are also shown to be additive and can be combined to further improve overall KWS system performance.
APA, Harvard, Vancouver, ISO, and other styles
15

Samson, Juan Sarah Flora. "Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM061/document.

Full text
Abstract:
Languages in Malaysia are dying at an alarming rate. As of today, 15 languages are in danger while two languages are extinct. One of the methods to save languages is to document them, but this is a tedious task when performed manually. An Automatic Speech Recognition (ASR) system could be a tool to help speed up the process of documenting speech from native speakers. However, building ASR systems for a target language requires a large amount of training data, as current state-of-the-art techniques are based on empirical approaches. Hence, there are many challenges in building ASR for languages that have limited data available. The main aim of this thesis is to investigate the effects of using data from closely-related languages to build ASR for low-resource languages in Malaysia. Past studies have shown that cross-lingual and multilingual methods can improve the performance of low-resource ASR. In this thesis, we try to answer several questions concerning these approaches: How do we know which language is beneficial for our low-resource language? How does the relationship between source and target languages influence speech recognition performance? Is pooling language data an optimal approach for a multilingual strategy? Our case study is Iban, an under-resourced language spoken on the island of Borneo. We study the effects of using data from Malay, a closely related, locally dominant language, for developing Iban ASR under different resource constraints. We have proposed several approaches to adapt Malay data to obtain pronunciation and acoustic models for Iban speech.
Building a pronunciation dictionary from scratch is time-consuming, as one needs to properly define the sound units of each word in a vocabulary. We developed a semi-supervised approach to quickly build a pronunciation dictionary for Iban, based on bootstrapping techniques for improving the match between Malay data and Iban pronunciations. To increase the performance of low-resource acoustic models, we explored two acoustic modelling techniques: Subspace Gaussian Mixture Models (SGMM) and Deep Neural Networks (DNN). We applied cross-lingual strategies using both frameworks to adapt out-of-language data to Iban speech. Results show that using Malay data is beneficial for increasing the performance of Iban ASR. We also tested SGMM and DNN to improve low-resource non-native ASR. We proposed a fine merging strategy for obtaining an optimal multi-accent SGMM. In addition, we developed an accent-specific DNN using native speech data. After applying both methods, we obtained significant improvements in ASR accuracy. From our study, we observe that using SGMM and DNN for a cross-lingual strategy is effective when training data is very limited.
APA, Harvard, Vancouver, ISO, and other styles
16

Baayen, Harald R. "Resource requirements for neo-generative modeling in (psycho)linguistics." Universität Potsdam, 2012. http://opus.kobv.de/ubp/volltexte/2012/6231/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Smith, Patrick Henry. "Community as resource for minority language learning: A case study of Spanish-English dual-language schooling." Diss., The University of Arizona, 2000. http://hdl.handle.net/10150/284136.

Full text
Abstract:
This study examines the role of community-based, minority language resources in dual language schooling. A rapidly growing form of bilingual education, dual language programs involve the co-instruction of children from language majority and language minority backgrounds via the languages of both groups. In contrast to studies of English language development, this study is concerned with Spanish language development by children from English-speaking and Spanish-speaking homes. Using a case study design, the study draws on theoretical frameworks from the fields of language planning, language revitalization, and funds of knowledge to propose that dual language programs may support minority language acquisition by incorporating local language resources--linguistic funds of knowledge--to counter the hegemony of English that undermines additive bilingual efforts in many schools. By showing how historical conditions associated with English-only schooling and punitive approaches to use of Spanish in barrio schools and the legacy of local bilingual education pioneers have contributed to the development of a dual language program, it demonstrates the continued importance of past practices in present dual language planning. The study triangulates ethnographic data from participant observation in classrooms, literacy instruction, and other school domains, teacher, parent, and community interviews, and document and archival analysis. These data, along with findings of changing patterns of language dominance in the case study community, indicate that the minority language resources most immediately available--in the form of fluent bilingual elders and recent immigrants from Mexico--are less likely to be incorporated into planned curriculum than the knowledge and experiences of language majority parents. 
This pattern is a consequence of the social distance between educators and barrio families, the ambivalence of Mexican American parents and school staff toward the use of non-standard varieties of Spanish in schooling, and the need for greater awareness of language shift. Based on these findings, the study proposes that dual language programs move beyond efforts to increase use of the minority language as language of instruction. Instead, the study suggests, programs should consider practices that tap the linguistic funds of knowledge residing in the vital language minority communities in which schools are embedded.
APA, Harvard, Vancouver, ISO, and other styles
18

Feldman, Anna. "Portable language technology: a resource-light approach to morpho-syntactic tagging." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1153344391.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Heyns, Danielle. "Providing a web-based information resource for Afrikaans first language teachers." Pretoria : [s.n.], 2002. http://upetd.up.ac.za/thesis/available/etd-04032003-142408.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Mairidan, Wushouer. "Pivot-Based Bilingual Dictionary Creation for Low-Resource Languages." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/199441.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Martin, Terrence Lance. "Towards improved speech recognition for resource poor languages." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/35771/1/Terrence_Martin_Thesis.pdf.

Full text
Abstract:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology has made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language specific resources. This motivates the need for techniques which reduce this language-specific, resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource poor languages. Cross Lingual ASR emerges as a means for addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract, and accordingly, is human, and not language specific. As a result, a common inventory of sounds exists across languages; a property which is exploitable, as sounds from a resource poor, target language can be recognised using models trained on resource rich, source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments has gained consumer confidence. Pragmatically, if cross lingual techniques are to considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross lingual techniques using two speech environments; clean read speech and conversational telephone speech. Languages used in evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly when in the more taxing conversational environment. 
Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target-language lexical representation in terms of the source-language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal of this thesis is directed towards improving cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. The fact that, in Indonesia alone, there are over 200 million speakers of some Malay variant provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10,000-word recognition vocabulary for the Indonesian language.
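The phone-mapping idea underlying cross-lingual ASR can be illustrated with a minimal sketch: target-language phones are re-expressed using acoustically similar phones of a resource-rich source language, so that the source language's acoustic models can be reused. The mapping table below is invented for illustration and is not taken from the thesis.

```python
# Illustrative sketch of knowledge-based cross-lingual phone mapping.
# The IPA-style mappings here are hypothetical, chosen only to show the
# mechanism: unknown target phones fall back to similar source phones.
target_to_source = {"ɲ": "n", "ɤ": "ə", "ʈ": "t"}

def map_pronunciation(phones):
    """Re-express a target-language pronunciation using source-language phones."""
    return [target_to_source.get(p, p) for p in phones]

print(map_pronunciation(["ɲ", "a", "ʈ", "a"]))  # ['n', 'a', 't', 'a']
```

In a real system such a table would be derived either from phonetic knowledge or from data-driven distance measures between phone models.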
APA, Harvard, Vancouver, ISO, and other styles
22

Lakew, Surafel Melaku. "Multilingual Neural Machine Translation for Low Resource Languages." Doctoral thesis, Università degli studi di Trento, 2020. http://hdl.handle.net/11572/257906.

Full text
Abstract:
Machine Translation (MT) is the task of mapping a source language to a target language. The recent introduction of neural MT (NMT) has shown promising results for high-resource languages, but it performs poorly in low-resource language (LRL) settings. Furthermore, the vast majority of the 7,000+ languages around the world do not have parallel data, creating a zero-resource language (ZRL) scenario. In this thesis, we present our approach to improving NMT for LRLs and ZRLs, leveraging multilingual NMT modeling (M-NMT), an approach that allows building a single NMT system to translate across multiple source and target languages. This thesis i) analyzes the effectiveness of M-NMT for LRL and ZRL translation tasks, spanning two NMT architectures (Recurrent and Transformer), ii) presents a self-learning approach for improving the zero-shot translation directions of ZRLs, iii) proposes a dynamic transfer-learning approach from a pre-trained (parent) model to an LRL (child) model by tailoring to the vocabulary entries of the latter, iv) extends M-NMT to translate from a source language to specific language varieties (e.g. dialects), and finally, v) proposes an approach that can control the verbosity of an NMT model's output. Our experimental findings show the effectiveness of the proposed approaches in improving NMT for LRLs and ZRLs.
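The single-model, many-directions setup that M-NMT relies on is commonly realised by prepending a target-language token to each source sentence, so one model can route any input to any target language, including zero-shot directions. The sketch below shows that convention; the `<2xx>` tag format is one common choice, not necessarily the thesis's exact scheme.

```python
# Sketch of the standard multilingual-NMT input convention: a target-
# language tag is prepended so a single model learns all directions.
def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend a target-language token so one model can route the translation."""
    return f"<2{target_lang}> {sentence}"

# The same source sentence, routed to two different target languages:
print(tag_source("Hello world", "it"))  # <2it> Hello world
print(tag_source("Hello world", "de"))  # <2de> Hello world
```

Because the tag, not the architecture, selects the direction, pairs never seen together in training (the zero-shot case) can still be requested at inference time.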
APA, Harvard, Vancouver, ISO, and other styles
24

Whale, Susan Gaye. "Using language as a resource: strategies to teach mathematics in multilingual classes." Thesis, Nelson Mandela Metropolitan University, 2012. http://hdl.handle.net/10948/1669.

Full text
Abstract:
South Africa is a complex multilingual country. In the majority of schools in the Eastern Cape, a province in South Africa, the teachers and learners share the same home language, isiXhosa, but teach and learn mathematics in English. The purpose of this study was to encourage teachers to use the home language as a resource to teach mathematics in multilingual classes. The study follows a mixed-method design, using both qualitative and quantitative data. Qualitative data were collected from a survey and from poetry, which teachers crafted, in which they highlighted their perceptions about language in their lives. They also reflected on their practices and submitted pieces of contemplative writing. Quantitative data were collected from participating teachers who administered a pre-test to their learners as well as a post-test approximately nine months later, after conducting an intervention. The results showed that where strategies that used language as a resource, such as the implementation of exploratory talk and code-switching, had been introduced, mathematical reasoning improved and classroom climate became more positive. The learners’ lack of confidence in being able to express their reasoning in English was prevalent throughout the reflective writing. By enabling learners to use isiXhosa in discussions, the teachers felt that the learners gained in both confidence and mathematical understanding. This study has demonstrated that using the learners’ and teachers’ home language unlocks doors to communication and spotlights mathematical reasoning, but there is still an urgency to encourage learners to become fluent in Mathematical English. It is important to note that a positive classroom climate is essential for learners to build confidence and to encourage them to attempt to formulate sentences in English, starting on the journey from informal to formal usage of language as advocated by Setati and Adler (2001:250).
My main conclusion is that an intervention that develops exploratory talk by using language as a resource can improve learners’ mathematical reasoning. I wish to emphasise that I am not advocating teaching mathematics in isiXhosa only, but the research has shown the advantages of using the home language as a resource together with English in Eastern Cape multilingual mathematics classes. Learners need to be able to express themselves in English, written and spoken, in order to achieve mathematically. This study therefore shows that teachers can gauge their learners’ improvement in mathematical reasoning after an intervention that develops exploratory talk in class by using the home language as a resource.
APA, Harvard, Vancouver, ISO, and other styles
25

Neme, Alexis. "An arabic language resource for computational morphology based on the semitic model." Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC2013.

Full text
Abstract:
The morphology of Arabic is rich, complex, and highly inflectional. We developed a new approach to traditional Arabic morphology for the automatic processing of written Arabic. This approach formalizes Semitic morphology more simply using Unitex, a lexicon-based software suite for corpus analysis. For verbs (Neme, 2011), I proposed an inflectional taxonomy that increases the readability of the lexicon and makes it easier for Arabic speakers and linguists to encode, correct, and update it. Traditional grammar defines verbal classes by patterns and sub-classes by the nature of the root letters. In my taxonomy, the traditional classes are reused and the sub-classes are redefined more simply. The lexical coverage of this resource for verbs in a test corpus is 99%. For nouns and adjectives (Neme, 2013) and their broken plurals, we went further in adapting traditional morphology. First, although this tradition is based on derivational rules, we restricted ourselves to exclusively inflectional rules. Next, we kept the concepts of root and pattern, which are essential to the Semitic model. Our innovation, however, lies in reversing the traditional root-and-pattern model into a pattern-and-root model, which keeps the set of pattern classes and root sub-classes concise and orderly. We thus elaborated a taxonomy for the broken plural containing 160 inflectional classes, which simplifies the encoding of the broken plural tenfold. Since then, I have developed comprehensive resources for written Arabic. These resources are described in Neme and Paumier (2019).
We completed these taxonomies with suffixal classes for regular plurals, adverbs, and other parts of speech in order to cover the whole lexicon. In all, we obtain around 1,000 inflectional classes implemented by means of concatenative and non-concatenative transducers. From scratch, I created 76,000 fully vowelized lemmas, each associated with an inflectional class. These lemmas are inflected using these 1,000 FSTs, producing a fully inflected lexicon of more than 6 million forms. I extended this fully inflected resource with agglutination grammars to identify compound words of up to 5 segments, agglutinated around a verb, a noun, an adjective, or a particle. The agglutination grammars extend recognition to more than 500 million valid word forms, partially or fully vowelized. The generated text file is 340 megabytes (UTF-16); it is compressed to 11 megabytes before being loaded into memory for fast lookup. Generating, compressing, and minimizing the lexicon takes less than one minute on a MacBook. The lexical coverage of a corpus is above 99%. The tagger speed is more than 200,000 words/s if the resources have been preloaded into RAM. The accuracy and speed of our tools result from our systematic linguistic approach and from adopting best practices in mathematical and computational methods. The lookup procedure is fast because we use the acyclic deterministic automaton minimization algorithm (Revuz, 1992) to compress the complete dictionary, and because it contains only constant strings.
The tagger's performance is the result of good practical choices in finite-state technologies (FSA/FST), since all inflected forms are computed in advance for accurate identification, getting the best from the compression and from a deterministic, efficient word lookup.
We developed an original approach to Arabic traditional morphology, involving new concepts in Semitic lexicology, morphology, and grammar for standard written Arabic. This new methodology for handling the rich and complex Semitic languages is based on good practices in finite-state technologies (FSA/FST), using Unitex, a lexicon-based corpus processing suite. For verbs (Neme, 2011), I proposed an inflectional taxonomy that increases the lexicon's readability and makes it easier for Arabic speakers and linguists to encode, correct, and update it. Traditional grammar defines inflectional verbal classes by using verbal pattern-classes and root-classes. In our taxonomy, traditional pattern-classes are reused, and root-classes are redefined into a simpler system. The lexicon of verbs covered more than 99% of an evaluation corpus. For nouns and adjectives (Neme, 2013), we went one step further in the adaptation of traditional morphology. First, while this tradition is based on derivational rules, we base our description on inflectional ones. Next, we keep the concepts of root and pattern, which are the backbone of the traditional Semitic model. Still, our breakthrough lies in the reversal of the traditional root-and-pattern Semitic model into a pattern-and-root model, which keeps the set of pattern classes and root sub-classes small and orderly. I elaborated a taxonomy for the broken plural containing 160 inflectional classes, which simplifies the encoding of the broken plural tenfold. Since then, I have elaborated comprehensive resources for Arabic. These resources are described in Neme and Paumier (2019). To take into account all aspects of the rich morphology of Arabic, I have completed our taxonomy with suffixal inflectional classes for regular plurals, adverbs, and other parts of speech (POS) to cover the whole lexicon.
In all, I identified around 1,000 Semitic and suffixal inflectional classes implemented with concatenative and non-concatenative FST devices. From scratch, I created 76,000 fully vowelized lemmas, each associated with an inflectional class. These lemmas are inflected using these 1,000 FSTs, producing a fully inflected lexicon with more than 6 million forms. I extended this fully inflected resource using agglutination grammars to identify words composed of up to 5 segments, agglutinated around a core inflected verb, noun, adjective, or particle. The agglutination grammars extend the recognition to more than 500 million valid delimited word forms, partially or fully vowelized. The flat file of 6 million forms is 340 megabytes (UTF-16). It is compressed to 11 megabytes before loading into memory for fast retrieval. The generation, compression, and minimization of the full-form lexicon take less than one minute on a common Unix laptop. The lexical coverage rate is more than 99%. The tagger speed is 5,000 words/second, and more than 200,000 words/second if the resources are preloaded in RAM. The accuracy and speed of our tools result from our systematic linguistic approach and from our choice to embrace best practices in mathematical and computational methods. The lookup procedure is fast because we use a minimal acyclic deterministic finite automaton (Revuz, 1992) to compress the full-form dictionary, and because it has only constant strings and no embedded rules. The breakthrough of our linguistic approach remains principally the reversal of the traditional root-and-pattern Semitic model into a pattern-and-root model. Nonetheless, our computational approach is based on good practices in finite-state technologies (FSA/FST), as all the full forms were computed in advance for accurate identification and to get the best from the FSA compression for fast and efficient lookups.
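The pattern-and-root idea at the heart of this model can be sketched in a few lines: a root's consonants are interdigitated into the consonant slots of an inflectional pattern. The Latin-transliterated examples below are illustrative only and do not reproduce the thesis's Unitex resources or its 1,000 inflectional classes.

```python
# Illustrative sketch (not the actual Unitex resource): inflecting a
# Semitic root by filling the consonant slots 'C' of a vowel pattern,
# the core mechanism of the pattern-and-root model described above.
def interdigitate(root: str, pattern: str) -> str:
    """Fill the 'C' slots of `pattern` with the letters of `root`, in order."""
    letters = iter(root)
    return "".join(next(letters) if ch == "C" else ch for ch in pattern)

# Hypothetical transliterated examples for the root k-t-b ("write"):
print(interdigitate("ktb", "CaCaCa"))  # kataba (a perfective verb pattern)
print(interdigitate("ktb", "CuCuC"))   # kutub  (a broken-plural pattern)
```

Listing forms pattern-first, as here, is what keeps the class inventory small: one pattern class serves every root that inflects the same way.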
APA, Harvard, Vancouver, ISO, and other styles
26

Blair, James M. "Architectures for Real-Time Automatic Sign Language Recognition on Resource-Constrained Device." UNF Digital Commons, 2018. https://digitalcommons.unf.edu/etd/851.

Full text
Abstract:
Powerful, handheld computing devices have proliferated among consumers in recent years. Combined with new cameras and sensors capable of detecting objects in three-dimensional space, new gesture-based paradigms of human computer interaction are becoming available. One possible application of these developments is an automated sign language recognition system. This thesis reviews the existing body of work regarding computer recognition of sign language gestures as well as the design of systems for speech recognition, a similar problem. Little work has been done to apply the well-known architectural patterns of speech recognition systems to the domain of sign language recognition. This work creates a functional prototype of such a system, applying three architectures seen in speech recognition systems, using a Hidden Markov classifier with 75-90% accuracy. A thorough search of the literature indicates that no cloud-based system has yet been created for sign language recognition and this is the first implementation of its kind. Accordingly, there have been no empirical performance analyses regarding a cloud-based Automatic Sign Language Recognition (ASLR) system, which this research provides. The performance impact of each architecture, as well as the data interchange format, is then measured based on response time, CPU, memory, and network usage across an increasing vocabulary of sign language gestures. The results discussed herein suggest that a partially-offloaded client-server architecture, where feature extraction occurs on the client device and classification occurs in the cloud, is the ideal selection for all but the smallest vocabularies. Additionally, the results indicate that for the potentially large data sets transmitted for 3D gesture classification, a fast binary interchange protocol such as Protobuf has vastly superior performance to a text-based protocol such as JSON.
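The text-versus-binary interchange trade-off measured in this work can be illustrated with a minimal sketch. The 20-joint frame layout is an assumption made for illustration, and Python's standard `struct` packing stands in for a binary protocol such as Protobuf.

```python
# Minimal sketch (hypothetical data layout, not the thesis's schema):
# comparing a text (JSON) vs. binary (struct-packed) encoding of one
# 3D skeletal gesture frame of 20 joints x (x, y, z) floats.
import json
import struct

frame = [(0.1 * j, 0.2 * j, 0.3 * j) for j in range(20)]

text_payload = json.dumps(frame).encode("utf-8")
flat = [coord for joint in frame for coord in joint]        # 60 floats
binary_payload = struct.pack("<60f", *flat)                  # 240 bytes

print(len(text_payload), len(binary_payload))
# The binary encoding is a fixed 60 * 4 = 240 bytes, while the JSON
# text grows with the decimal representation of each coordinate.
```

Multiplied over many frames per second across a network, this size gap (plus the cheaper parsing of a binary format) is the kind of effect the response-time and network-usage measurements above capture.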
APA, Harvard, Vancouver, ISO, and other styles
27

McGowan, Jessica E. "Training and resource guide for beginning teachers of TESOL." Muncie, Ind. : Ball State University, 2009. http://cardinalscholar.bsu.edu/452.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Kennedy, Jacqueline. "Exploring the Truth and Reconciliation Commission report as a classroom resource." Master's thesis, University of Cape Town, 2006. http://hdl.handle.net/11427/11594.

Full text
Abstract:
Includes bibliographical references (leaves 128-137).
The Truth and Reconciliation Commission (TRC) (1998) report is a five-volume record of the voices of many victims and perpetrators of apartheid giving evidence of their experiences and suffering. It is encoded in sophisticated and often complex English, largely inaccessible to its public South African readership, most of whom use English as a First, Second or even Third Additional language. This study explores the nature and function of the discourse of the TRC report as a contemporary historical text. The aim of this investigation is to establish the viability of introducing the TRC report into the classroom. It focuses on teenage learners. I examine the ability of Grade 10 and 11 English Primary Language and First Additional Language learners to read the original TRC text and a modified/simplified form of it.
APA, Harvard, Vancouver, ISO, and other styles
29

Plumlee, Marilyn Kay. "Making do with what you've got the use of prosody as a linguistic resource in second language narratives /." Thesis, University of Hawaii at Manoa, 2003. http://proquest.umi.com/pqdweb?index=0&did=765031661&SrchMode=1&sid=4&Fmt=2&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1233341179&clientId=23440.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Seng, Connie Swee Hoon. "Teachers' and students' perceptions of storytelling as a language teaching and learning resource." Thesis, University of Sheffield, 2017. http://etheses.whiterose.ac.uk/18652/.

Full text
Abstract:
Just as teachers’ perceptions of teaching and learning can influence their instructional practices, learners’ perceptions of teaching and learning can affect their motivation and achievement. Yet, research on the link between teachers’ perceptions and practices, or between learners’ perceptions and achievement is not always conclusive. The present study investigated 34 primary four teachers’ and 116 primary four students’ perceptions of storytelling as a language teaching and learning resource. A teacher questionnaire and a student questionnaire were administered to the teacher and student participants respectively. Interviews were also conducted with a subset of teacher and student participants to triangulate the questionnaire data. The questionnaire responses indicated that 98.3% of the student participants enjoyed listening to or reading stories, but relatively fewer students (81.0%) enjoyed acting out stories. The teacher questionnaire findings affirmed that all the teacher participants had a positive perception of storytelling. However, the positive perceptions did not translate into practice for all the 34 teacher participants. Nine of them did not attempt to infuse storytelling into their English language lessons to teach English language skills. They cited a number of reasons which suggested diffidence and a need for some professional development training and support from their school management (Principal, Vice-principal, or Head of the English department). Analysis of the teacher and student interview responses using content analysis and discourse analysis indicated that the interview data endorsed and elaborated on the teacher and student questionnaire responses. Both groups referred to language benefits as well as socio-emotional value that could be gleaned from storytelling activities. This should argue for storytelling to be considered by educators as a plausible pedagogical resource for primary school children.
APA, Harvard, Vancouver, ISO, and other styles
31

Dihangoane, Clifford Kgabo. "The experiences of teachers and learners of being multilingual in resource constrained environments." Diss., University of Pretoria, 2020. http://hdl.handle.net/2263/79228.

Full text
Abstract:
This study aimed to investigate the experiences of teachers and learners of being multilingual in resource-constrained environments where the LoLT is English. Sociocultural theory was used as the theoretical framework for this study. Given the factors involved, a mixed-method approach was favourable for collecting and analysing data. The qualitative data were collected through semi-structured interviews with five teachers and focus group discussions with seventeen learners from two different township schools in Pretoria. The quantitative data were collected from the same schools through a survey questionnaire with a total of forty-seven respondents. Inductive thematic analysis and descriptive statistics were utilised for the analysis of the data. The key recurring findings from the participants were overcrowding, language diversity, insufficient training received by teachers, limited educational resources, learners' lack of LoLT proficiency, and being restricted by the school policy from using other languages. Educational resources serve as a bridge to mediate language development in multilingual learners; the lack of resources hinders positive learning experiences. The participants' experiences across languages are regarded as a problem instead of a resource. Although these experiences are known to act as barriers to the process of teaching and learning, this study contributes by providing a deep comprehension of multilingualism in the South African setting. It provides resolutions to enhance the use of multilingualism for effective teaching and learning.
Dissertation (MEd)--University of Pretoria, 2020.
Educational Psychology
MEd
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
32

Droutsas, Nikolaos. "Gamers with the Purpose of Language Resource Acquisition : Personas and Scenarios for the players of Language Resourcing Games-With-A-Purpose." Thesis, Uppsala universitet, Institutionen för speldesign, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445873.

Full text
Abstract:
Ethical, cheap, and scalable, purposeful games leverage player entertainment to incentivise contributors in language resourcing. However, discourse is scarce around the enjoyability of these games, whose playerbases are divided between a tiny minority of reliable contributors and a vast majority of inconsistent contributors. This study aims to deepen the discourse around design possibilities tailored to the unevenly contributing playerbases of such games by building on player-reported data to create three engaging personas and narrative scenarios. Using Pruitt and Grudin’s way of weighing feature suitability in persona-focused design, social incentives and majority voting are indicated as the most and least prominent features, respectively. Indeed, the weight of the primary persona, representing 3.5% of the playerbase, is 72%, exceeding the combined weight, 56%, of the remaining 96.5% of the playerbase. Sticking to the original definition of purposeful games is essential for any gaming approach to crowdsourced data collection to remain ethical, cheap, and scalable.
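Pruitt and Grudin-style persona weighting can be sketched as a weighted sum of per-persona feature scores: each persona rates each candidate feature, and ratings are combined using the persona weights. In the sketch below, every value except the 72% primary-persona weight is an illustrative placeholder, not data from the study.

```python
# Hedged sketch of persona-weighted feature prioritisation in the
# spirit of Pruitt and Grudin. Weights and scores are placeholders
# except the 72% weight of the primary (reliable contributor) persona.
weights = {"reliable_contributor": 0.72, "casual_player": 0.28}

# Hypothetical per-persona feature scores on a 0-2 scale:
scores = {
    "social_incentives": {"reliable_contributor": 2, "casual_player": 2},
    "majority_voting":   {"reliable_contributor": 1, "casual_player": 0},
}

def weighted_score(feature: str) -> float:
    """Combine per-persona scores using the persona weights."""
    return sum(weights[p] * s for p, s in scores[feature].items())

ranking = sorted(scores, key=weighted_score, reverse=True)
print(ranking)  # social_incentives outranks majority_voting here
```

The point of the technique is visible even in this toy version: a heavily weighted primary persona dominates the ranking, which is how a 3.5% slice of the playerbase can drive feature priorities.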
APA, Harvard, Vancouver, ISO, and other styles
33

Williams, A. Lynn. "Speech Disorders Resource Guide for Preschool Children." Digital Commons @ East Tennessee State University, 2002. https://www.amzn.com/0769300804.

Full text
Abstract:
Speech Disorders Resource Guide for Preschool Children provides detailed information about assessment, analysis and intervention methods pertaining to childhood speech disorders. Also covers intervention outcomes and treatment efficacy. A concise, easy-to-use format makes it an ideal clinical resource tool for students and clinicians.
APA, Harvard, Vancouver, ISO, and other styles
34

Peters, Christy Smith. "Resource guide for guided reading." CSUSB ScholarWorks, 1999. https://scholarworks.lib.csusb.edu/etd-project/1854.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Kamper, Herman. "Unsupervised neural and Bayesian models for zero-resource speech processing." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/25432.

Full text
Abstract:
Zero-resource speech processing is a growing research area which aims to develop methods that can discover linguistic structure and representations directly from unlabelled speech audio. Such unsupervised methods would allow speech technology to be developed in settings where transcriptions, pronunciation dictionaries, and text for language modelling are not available. Similar methods are required for cognitive models of language acquisition in human infants, and for developing robotic applications that are able to automatically learn language in a novel linguistic environment. There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units (phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful units. The claim of this thesis is that both top-down modelling (using knowledge of higher-level units to learn, discover and gain insight into their lower-level constituents) as well as bottom-up modelling (piecing together lower-level features to give rise to more complex higher-level structures) are advantageous in tackling these two problems. The thesis is divided into three parts. The first part introduces a new autoencoder-like deep neural network for unsupervised frame-level representation learning. This correspondence autoencoder (cAE) uses weak top-down supervision from an unsupervised term discovery system that identifies noisy word-like terms in unlabelled speech data. In an intrinsic evaluation of frame-level representations, the cAE outperforms several state-of-the-art bottom-up and top-down approaches, achieving a relative improvement of more than 60% over the previous best system.
This shows that the cAE is particularly effective in using top-down knowledge of longer-spanning patterns in the data; at the same time, we find that the cAE is only able to learn useful representations when it is initialized using bottom-up pretraining on a large set of unlabelled speech. The second part of the thesis presents a novel unsupervised segmental Bayesian model that segments unlabelled speech data and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types: the system essentially performs unsupervised speech recognition. In this approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this embedding space while jointly performing segmentation. We first evaluate the approach in a small-vocabulary multi-speaker connected digit recognition task, where we report unsupervised word error rates (WER) by mapping the unsupervised decoded output to ground truth transcriptions. The model achieves around 20% WER, outperforming a previous HMM-based system by about 10% absolute. To achieve this performance, the acoustic word embedding function (which maps variable-duration segments to single vectors) is refined in a top-down manner by using terms discovered by the model in an outer loop of segmentation. The third and final part of the study extends the small-vocabulary system in order to handle larger vocabularies in conversational speech data. To our knowledge, this is the first full-coverage segmentation and clustering system that is applied to large-vocabulary multi-speaker data. To improve efficiency, the system incorporates a bottom-up syllable boundary detection method to eliminate unlikely word boundaries. We compare the system on English and Xitsonga datasets to several state-of-the-art baselines.
We show that by imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, both single-speaker and multi-speaker versions of our system outperform a purely bottom-up single-speaker syllable-based approach. We also show that the discovered clusters can be made less speaker- and gender-specific by using features from the cAE (which incorporates both top-down and bottom-up learning). The system's discovered clusters are still less pure than those of two multi-speaker unsupervised term discovery systems, but provide far greater coverage. In summary, the different models and systems presented in this thesis show that both top-down and bottom-up modelling can improve representation learning, segmentation and clustering of unlabelled speech data.
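The acoustic word embedding idea above, mapping variable-duration segments to single fixed-dimensional vectors, can be illustrated with its simplest baseline: uniform downsampling of the frame sequence. The thesis refines its embeddings top-down rather than using this plain scheme, so the sketch below is only an assumed baseline, with made-up MFCC-like inputs.

```python
# Sketch of the simplest acoustic word embedding: uniformly sampling a
# variable-length (frames, dims) feature sequence down to n frames and
# flattening, so every segment maps to the same fixed-size vector.
import numpy as np

def downsample_embed(features: np.ndarray, n: int = 10) -> np.ndarray:
    """Map a (frames, dims) segment to a fixed (n * dims,) vector."""
    idx = np.linspace(0, len(features) - 1, n).round().astype(int)
    return features[idx].flatten()

# Two hypothetical word segments of different durations, 13-dim features:
seg_a = np.random.randn(37, 13)
seg_b = np.random.randn(81, 13)
assert downsample_embed(seg_a).shape == downsample_embed(seg_b).shape == (130,)
```

Once all segments live in one vector space, whole-word clustering and distance computations (as in the segmental Bayesian model) become ordinary fixed-dimensional operations.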
APA, Harvard, Vancouver, ISO, and other styles
36

Al, Jallad Mohannad. "REA Business Modeling Language : Toward a REA based Domain Specific Visual Language." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-121295.

Full text
Abstract:
The Resources, Events, Agents (REA) ontology is a profound business modeling ontology that was developed to define the architecture of accounting information systems. Nevertheless, REA did not manage to get the same attention as other business modeling ontologies. One reason for this neglect is the absence of a meaningful visual notation for the ontology, which has left it abstruse to non-academic audiences. Another reason is that REA does not have a standard formal representation. This has resulted in a modest body of research focused on defining meta-models of the ontology while neglecting the wider purpose of REA-based information systems development. Consequently, the ontology deviated from its original purpose and came to be used mainly in business schools. To solve the aforementioned issues, this research presents a Model Driven Development (MDD) technique in the form of a REA-based Domain Specific Visual Language (DSVL) that is implemented within a modeling and code generation editor. This effort was undertaken in order to answer the question: “How would a REA-DSVL-based tool make the REA ontology implementable in the domain of information systems development?” To answer the research question, a design science research methodology (DSRM) was implemented as the structure of this research. The DSRM was chosen because this research aims to develop three main artifacts: a meta-model of REA, a visual notation of REA, and a REA-DSVL-based modeling and code generation tool. The first phase of the DSRM was to identify the problems mentioned earlier, followed by the requirements identification phase, which drew the outline of the meta-model, the visual notation, and the tool. After that, the development phase was conducted in order to develop the aforementioned artifacts. The editor was then demonstrated using a case study of a local company in Stockholm, Sweden.
Finally, the resulted artifacts were evaluated based on the collected requirements and the results from the case study. Based on the analyses of the artifacts and the case study, this research was concluded with the result that a REA-based DSVL tool can help in boosting the planning and analysis phases of the software development lifecycle (SDLC). This is achieved by automating some of the conventional software planning and design tasks, which would lead to more accurate systems’ designs; thus, minimizing the time of the planning and design phases. And it can be achieved by abstracting the direct logic of REA through providing functionalities that help users from different backgrounds (academic and professional) to embrace a business modeling editor rather than an ontology; thus, attracting a wider users base for implementing REA.
APA, Harvard, Vancouver, ISO, and other styles
37

Ahmadniaye, Bosari Benyamin. "Reliable training scenarios for dealing with minimal parallel-resource language pairs in statistical machine translation." Doctoral thesis, Universitat Autònoma de Barcelona, 2017. http://hdl.handle.net/10803/461204.

Full text
Abstract:
The thesis addresses high-quality Statistical Machine Translation (SMT) systems for working with minimal parallel-resource language pairs, and is entitled "Reliable Training Scenarios for Dealing with Minimal Parallel-Resource Language Pairs in Statistical Machine Translation". The main challenge we target in our approaches is parallel data scarcity, and this challenge is faced in different solution scenarios. SMT is one of the preferred approaches to Machine Translation (MT), and various improvements can be observed in this approach, specifically in the output quality of a number of systems for language pairs, since advances in computational power have been made together with the exploration of new methods and algorithms. When we consider the development of SMT systems for many language pairs, the major bottleneck we find is the lack of parallel training data. Because a great deal of time and effort is required to create these corpora, they are available only in limited quantity, genre, and language. SMT models learn how to translate by examining a bilingual parallel corpus that contains sentences aligned with their human-produced translations. However, the output quality of SMT systems is heavily dependent on the availability of massive amounts of parallel text in the source and target languages. Hence, parallel resources play an important role in improving the quality of SMT systems. We define minimal parallel-resource SMT settings as those possessing only small amounts of parallel data, a situation that holds for various pairs of languages. The performance achieved by current state-of-the-art minimal parallel-resource SMT is appreciable, but such systems usually rely on monolingual text and do not fundamentally address the shortage of parallel training text.
Enlarging the parallel training data without providing any guarantee on the quality of the newly generated bilingual sentence pairs also raises concerns. The limitations that emerge during the training of minimal parallel-resource SMT show that current systems are incapable of producing high-quality translation output. In this thesis, we have proposed the "direct-bridge combination" scenario as well as the "round-trip training" scenario, the former based on a bridge-language technique and the latter on a retraining approach, for dealing with minimal parallel-resource SMT systems. Our main aim in putting forward the direct-bridge combination scenario is to bring performance closer to the state of the art. This scenario has been proposed to maximize the information gain by choosing the appropriate portions of the bridge-based translation system that do not interfere with the direct translation system, which is trusted more. Furthermore, the round-trip training scenario has been proposed to take advantage of readily available generated bilingual sentence pairs to build a high-quality SMT system in an iterative manner: selecting a high-quality subset of the generated sentence pairs on the target side, preparing their corresponding source sentences, and using them together with the original sentence pairs to retrain the SMT system. The proposed methods are evaluated intrinsically and compared against baseline translation systems. We have also conducted experiments in the aforementioned scenarios with minimal initial bilingual data. We have demonstrated the improvement in performance gained through the proposed methods while building high-quality SMT systems over the baseline in each scenario.
APA, Harvard, Vancouver, ISO, and other styles
38

Shum, Stephen (Stephen Hin-Chung). "Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105952.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 139-149).
We live in an era with almost unlimited access to data. Yet without proper tagging and annotation, we often struggle to make effective use of most of it. And sometimes, the labels we have access to are not even the ones we really need for the task at hand. Asking human experts for input can be time-consuming and expensive, thus bringing to bear a need for better ways to handle and process unlabeled data. In particular, successful methods in unsupervised domain adaptation can automatically recognize and adapt existing algorithms to systematic changes in the input. Furthermore, methods that can organize incoming streams of information allow us to derive insights with minimal manual labeling effort; this is the notion of weakly supervised learning. In this thesis, we explore these two themes in the context of speaker and language recognition. First, we consider the problem of adapting an existing algorithm for speaker recognition to a systematic change in our input domain. Then we consider the scenario in which we start with only unlabeled data and are allowed to select a subset of examples to be labeled, with the goal of minimizing the number of actively labeled examples needed to achieve acceptable speaker recognition performance. Turning to language recognition, we aim to decrease our reliance on transcribed speech via the use of a large-scale model for discovering sub-word units from multilingual data in an unsupervised manner. In doing so, we observe the impact of even small bits of linguistic knowledge and use this as inspiration to improve our sub-word unit discovery methods via the use of weak, pronunciation-equivalent constraints.
by Stephen H. Shum.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
39

Westermark, Rolf. "Designing a generalized language resource system and localization tool for the Configura CET Designer®." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-63725.

Full text
Abstract:
This study was made at Configura Sverige AB and investigates different solutions for a generalized language resource system and a localization tool. The software application Configura CET Designer®, to which the solutions in this thesis should be applicable, is a kind of CAD tool used as a sales tool for configurable products that require space planning. Some types of products that CET Designer® is used for are kitchen, office, and industrial solutions. The purpose of the thesis is to find an overall solution for the resource system that manages text resources more efficiently than the present custom-made solution. One of the most important aspects is that translation of text resources into different human languages must be much more efficient than in the present solution. In the first part of this report, an analysis is made of Configura's present custom-made resource system used in CET Designer®. The next part briefly covers different types of solutions that exist on the market. This information provides a background on existing solutions that might be useful for a generalized language resource system and on the range of localization tools that exist. In the main part, different solutions are compared and matched to general demands applicable to many different applications that use text resources. All of the solutions are also matched to the specific demands that apply to CET Designer®. Based on the matches to the demands and the analysis of possible solutions, recommendations of suitable solutions are made for applications in general as well as for CET Designer®. There are basically two types of solutions: custom-made systems and tools already available on the market. In many cases, available tools offer a solution that is well designed and suits the needs well. Custom-made solutions are, on the other hand, very adaptable to different kinds of special requirements.
The part of resource management where the biggest difference between these types of solutions is found is the localization tool. The most advanced localization tools available on the market offer functionality for managing localization projects that would be economically indefensible to develop to the same advanced level in a custom-made tool. The advanced linguistic functions that exist in some available localization tools are the ones that make the biggest difference from a custom-made tool. Several tools are available on the market that are compatible with custom-made tools, which makes it possible to create a well-adjusted and very powerful tool.
APA, Harvard, Vancouver, ISO, and other styles
40

Tafreshi, Shabnam. "Cross-Genre, Cross-Lingual, and Low-Resource Emotion Classification." Thesis, The George Washington University, 2021. http://pqdtopen.proquest.com/#viewpdf?dispub=28088437.

Full text
Abstract:
Emotions can be defined as a natural, instinctive state of mind arising from one's circumstances, mood, and relationships with others. How and what humans feel has long been a question for psychology. Enabling computers to recognize human emotions has been of interest to researchers since the 1990s (Picard et al., 1995). Ever since, this area of research has grown significantly, and emotion detection is becoming an important component in many natural language processing tasks. Several theories exist for defining emotions and are chosen by researchers according to their needs. For instance, according to appraisal theory, a theory from psychology, emotions are produced by our evaluations (appraisals or estimates) of events that cause a specific reaction in different people. Some emotions are easy and universal, while others are complex and nuanced. Emotion classification is generally the process of labeling a piece of text with one or more corresponding emotion labels. Psychologists have developed numerous models and taxonomies of emotions. The model or taxonomy depends on the problem, and thorough study is often required to select the best model. Early studies of emotion classification focused on building computational models to classify basic emotion categories. In recent years, increasing volumes of social media and the digitization of data have opened a new horizon in this area of study, where emotion classification is a key component of applications including mood and behavioral studies, as well as disaster relief, amongst many others. Sophisticated models have been built to detect and classify emotion in text, but few analyze how well a model is able to learn emotion cues. The ability to learn emotion cues properly and to generalize this learning is very important.
This work investigates the robustness of emotion classification approaches across genres and languages, with a focus on quantifying how well state-of-the-art models are able to learn emotion cues. First, we use multi-task learning and hierarchical models to build emotion models trained on data combined from multiple genres. Our hypothesis is that a multi-genre, noisy training environment will help the classifier learn emotion cues that are prevalent across genres. Second, we explore splitting text (i.e., sentences) into clauses and testing whether the model's performance improves. Emotion analysis needs fine-grained annotation, and clause-level annotation can be beneficial for designing features that improve emotion detection performance. Intuitively, clause-level annotations may help the model focus on emotion cues while ignoring irrelevant portions of the text. Third, we adopted a transfer learning approach for cross-lingual/genre emotion classification to focus the classifier's attention on emotion cues that are consistent across languages. Fourth, we empirically show how to combine different genres to build robust models that can be used as source models for emotion transfer to low-resource target languages. Finally, this study involved curating and re-annotating popular emotion data sets in different genres, annotating a multi-genre corpus of Persian tweets and news, and generating a collection of emotional sentences for a low-resource language, Azerbaijani, a language spoken in the northwest of Iran.
APA, Harvard, Vancouver, ISO, and other styles
41

Gonzalez, Herrera Inti Yulien. "Supporting resource awareness in managed runtime environment." Thesis, Rennes 1, 2015. http://www.theses.fr/2015REN1S090/document.

Full text
Abstract:
Software systems are more pervasive than ever nowadays. Occasionally, applications run on resource-constrained devices where efficient resource management is required; hence, they must be capable of coping with such limitations. However, applications require support from the run-time environment to deal properly with resource limitations. This thesis addresses the problem of supporting resource-aware programming in execution environments. In particular, it aims at offering efficient support for collecting data about the consumption of computational resources (e.g., CPU, memory), as well as efficient mechanisms to reserve resources for specific applications. In existing solutions we find two important drawbacks. First, they impose performance overhead on the execution of applications. Second, creating resource management tools for these abstractions is still a daunting task. The outcomes of this thesis are three contributions: an optimistic resource monitoring framework that reduces the cost of collecting resource consumption data; a methodology to select components' bindings at deployment time in order to perform resource reservation; and a language to build customized memory profilers that can be used both during application development and in a production environment.
APA, Harvard, Vancouver, ISO, and other styles
42

Habte, Abrahaley. "The development of supplementary materials for English language teaching in a scarce resource environment: an action research study." University of Western Cape, 2001. http://hdl.handle.net/11394/7387.

Full text
Abstract:
Magister Philosophiae - MPhil
Task-based language instruction has generated some debate among researchers. Some argue in favour of task-based language instruction, claiming that tasks focus learners' attention on meaning and thus facilitate second language acquisition (Prabhu, 1987; Pica and Doughty, 1986; Pica, Kanagy, and Falodun, 1993). Others argue against task-based language instruction and call into question the concept of comprehensible input, the idea upon which the whole task-based approach is based (Sheen, 1994).
APA, Harvard, Vancouver, ISO, and other styles
43

Chau, Mong. "Digital media as a resource for English learners in 1-3." Thesis, Malmö högskola, Fakulteten för lärande och samhälle (LS), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-33491.

Full text
Abstract:
Digital media has a huge impact on today's society. Technologies give teachers opportunities to access different tasks, exercises, games, and videos for teaching students a new language. This also means that teachers have the opportunity to conduct more varied lessons to motivate students' learning. However, digital media comes with advantages as well as disadvantages. This project therefore discusses the use of digital media for learning English as a foreign language and explores what teachers think about using digital media for teaching. To examine the use of digital media in today's schools and teachers' views on using digital media for learning, this project carries out interviews with, and observations of, teachers from a selected school. The participants from the selected school are teachers who use digital media regularly to teach Swedish students the English language. Moreover, the participants also discussed the advantages and disadvantages of using digital media in teaching.
APA, Harvard, Vancouver, ISO, and other styles
44

FERREIRA, MERGENFEL A. VAZ. "ADVERTISING AS A RESOURCE FOR CONTEXTUALIZATION IN THE TEACHING OF GERMAN AS A FOREIGN LANGUAGE (GFL)." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2005. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=6698@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
From the view that language is a social construct inseparable from its situational and cultural context, the main objective of the present study is to describe and analyze German advertisements in printed form, focusing on the relationship between their verbal and non-verbal components. The analysis is primarily based on the principles of Halliday's (1994) systemic-functional grammar and on the categories in the visual grammar of Kress and van Leeuwen (1996). Thus, the description and analysis of the linguistic and extra-linguistic components in the advertisements are used to observe the possible relationships between language and context. In addition to the analysis of the verbal and non-verbal features of advertisements, the study made use of a questionnaire distributed to professionals who teach German as a foreign language in various institutions in Rio de Janeiro. In light of the responses to the questionnaire, the present study discusses questions related to how culture is approached in the classroom and whether visual elements are meaningfully explored in the teaching materials used by these teachers. The aim of the research is twofold: it argues for the importance of encouraging contextualized foreign language teaching, and it attempts to contribute to the discussion on the teaching of German as a foreign language from a multiple perspective that includes genre, context, culture, and visual communication.
APA, Harvard, Vancouver, ISO, and other styles
45

De, Villiers Pieter Theunis. "Lecture transcription systems in resource-scarce environments / Pieter Theunis de Villiers." Thesis, North-West University, 2014. http://hdl.handle.net/10394/10620.

Full text
Abstract:
Classroom note taking is a fundamental task performed by learners on a daily basis. These notes provide learners with valuable offline study material, especially in the case of more difficult subjects. The use of class notes has been found not only to provide students with a better learning experience, but also to lead to overall higher academic performance. In a previous study, an increase of 10.5% in student grades was observed after the students had been provided with multimedia class notes. This is not surprising, as other studies have found that the rate of successful transfer of information to humans increases when both visual and audio information are provided. Note taking might seem like an easy task; however, students with hearing impairments, visual impairments, physical impairments, or learning disabilities, and even non-native listeners, find this task very difficult or impossible. It has also been reported that even non-disabled students find note taking time-consuming and that it requires a great deal of mental effort while also trying to pay full attention to the lecturer. This is illustrated by a study which found that college students were only able to record ~40% of the data presented by the lecturer. It is thus reasonable to expect an automatic way of generating class notes to be beneficial to all learners. Lecture transcription (LT) systems are used in educational environments to assist learners by providing them with real-time in-class transcriptions, or with recordings and transcriptions for offline use. Such systems have already been successfully implemented in the developed world, where all required resources were easily obtained. These systems are typically trained on hundreds to thousands of hours of speech, while their language models are trained on millions or even hundreds of millions of words. Such amounts of data are generally not available in the developing world.
In this dissertation, a number of approaches toward the development of LT systems in resource-scarce environments are investigated. We focus on different approaches to obtaining sufficient amounts of well-transcribed data for building acoustic models, using corpora with few transcriptions and of variable quality. One approach investigates the use of a dynamic programming phone string alignment procedure to harvest as much usable data as possible from approximately transcribed speech data. We find that target-language acoustic models are optimal for this purpose, but encouraging results are also found when using models from another language for alignment. Another approach entails unsupervised training methods, where an initial low-accuracy recognizer is used to transcribe a set of untranscribed data. From this poorly transcribed data, correctly recognized portions are extracted based on a word confidence threshold. The initial system is then retrained along with the newly recognized data in order to increase its overall accuracy. The initial acoustic models are trained using as little as 11 minutes of transcribed speech. After several iterations of unsupervised training, a noticeable increase in accuracy was observed (47.79% WER to 33.44% WER). Similar results (35.97% WER) were, however, found after using a large speaker-independent corpus to train the initial system. Usable LMs were also created using as few as 17,955 words from transcribed lectures; however, this resulted in large out-of-vocabulary rates. This problem was solved by means of LM interpolation, which was found to be very beneficial in cases where subject-specific data (such as lecture slides and books) was available. We also introduce our NWU LT system, which was developed for use in learning environments and designed using a client/server architecture.
Based on the results of this study, we are confident that usable models for LT systems can be developed in resource-scarce environments.
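The two techniques described in the abstract above can be illustrated with a minimal, self-contained Python sketch: confidence-thresholded selection of automatically transcribed words for unsupervised retraining, and linear interpolation of a small in-domain language model with a larger background model. All function names, data, and thresholds here are hypothetical illustrations, not taken from the dissertation.

```python
def select_confident(hypotheses, threshold=0.9):
    """Keep only words whose recognizer confidence meets the threshold.

    `hypotheses` is a list of utterances; each utterance is a list of
    (word, confidence) pairs produced by the initial low-accuracy system.
    Utterances with no surviving words are dropped entirely.
    """
    selected = []
    for utt in hypotheses:
        kept = [word for word, conf in utt if conf >= threshold]
        if kept:  # only retrain on utterances that retain some words
            selected.append(kept)
    return selected


def interpolate(p_indomain, p_background, lam=0.5):
    """Linearly interpolate two unigram models:
    P(w) = lam * P_in(w) + (1 - lam) * P_bg(w).

    Words seen only in the background model receive nonzero probability,
    which is how interpolation reduces out-of-vocabulary effects.
    """
    vocab = set(p_indomain) | set(p_background)
    return {w: lam * p_indomain.get(w, 0.0)
               + (1 - lam) * p_background.get(w, 0.0)
            for w in vocab}


# Toy usage: low-confidence words ("lecture", "noise") are filtered out.
hyps = [[("the", 0.95), ("lecture", 0.60), ("starts", 0.92)],
        [("noise", 0.30)]]
print(select_confident(hyps))  # -> [['the', 'starts']]

# "exam" is out-of-vocabulary for the in-domain model but survives
# interpolation with the background model.
lm = interpolate({"lecture": 0.7, "notes": 0.3},
                 {"lecture": 0.1, "exam": 0.9}, lam=0.5)
```

In a real LT pipeline the interpolation weight would be tuned on held-out in-domain text, and the confidence scores would come from the recognizer's lattice posteriors rather than being fixed per word.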
MSc (Computer Science), North-West University, Vaal Triangle Campus, 2014
APA, Harvard, Vancouver, ISO, and other styles
46

Sun, Xiantang. "Domain independent generation from RDF instance data." Thesis, Available from the University of Aberdeen Library and Historic Collections Digital Resources, 2008. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?application=DIGITOOL-3&owner=resourcediscovery&custom_att_2=simple_viewer&pid=24972.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Lilla, Nanine Yvonne [Verfasser]. "Everyday Multiple Language Use as a Potential Resource for the Self : Positive Emotional and Motivational Consequences of a Language-Dependent Self-Representation / Nanine Yvonne Lilla." Berlin : Freie Universität Berlin, 2019. http://d-nb.info/1177152703/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

McAloon, Patrick O. "Chinese at Work: Evaluating Advanced Language Use in China-related Careers." The Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1218548897.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Withrow, Brandon. "Jonathan Edwards as a resource for current evangelical discussion over the language of the doctrine of justification." Theological Research Exchange Network (TREN), 1999. http://www.tren.com.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Singh, Mittul [Verfasser], and Dietrich [Akademischer Betreuer] Klakow. "Handling long-term dependencies and rare words in low-resource language modelling / Mittul Singh ; Betreuer: Dietrich Klakow." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2017. http://d-nb.info/1141677962/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles