
Dissertations / Theses on the topic 'Automated language translation'


Consult the top 47 dissertations / theses for your research on the topic 'Automated language translation.'


1

Marshall, Susan LaVonne. "Concept of Operations (CONOPS) for foreign language and speech translation technologies in a coalition military environment." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2005. http://library.nps.navy.mil/uhtbin/hyperion/05Mar%5FMarshall.pdf.

2

Doni, Pracner. "Translation and Transformation of Low Level Programs." Phd thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=110184&source=NDLTD&language=en.

Abstract:
This thesis presents an approach for working with low-level source code that enables automatic restructuring and raises the abstraction level of programs. This makes it easier to understand the logic of the program, which in turn reduces development time. The process was designed to be flexible and consists of several independent tools, making it easy to adapt as needed, while the developed tools can also be used in other processes. There are usually two basic steps. The first is translation into the WSL language, which has a great number of semantics-preserving program transformations. The second is the transformation of the translated WSL. Two tools were developed for translation: one that works with a subset of x86 assembly, and another that works with MicroJava bytecode. The result of the translation is a low-level program in WSL. The primary goal of this thesis was to fully automate the selection of the transformations, enabling users with no domain knowledge to use the process efficiently. At the same time, the flexibility of the process allows experienced users to adapt it as needed or integrate it into other processes. The automation was achieved with a hill-climbing algorithm. Experiments run on several types of input programs showed that the results can be excellent. The fitness function used was a built-in metric that gives the "weight" of structures in a program. On input samples for which the original high-level source code was available, the final metrics of the translated and transformed programs were comparable; some results were even better than the originals, while others were somewhat more complex. Compared with the original low-level source code, the end result was always significantly improved.
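The automated selection loop described in this abstract can be pictured with a minimal hill-climbing sketch in Python; the `weight` metric and the transformation callables below are hypothetical stand-ins, not the thesis's actual WSL machinery:

```python
def hill_climb(program, transformations, weight, max_iters=1000):
    """Greedy hill climbing: repeatedly apply whichever transformation
    most reduces the structural 'weight' metric of the program."""
    best, best_score = program, weight(program)
    for _ in range(max_iters):
        candidates = [t(best) for t in transformations]
        scored = [(weight(c), c) for c in candidates if c is not None]
        if not scored:
            break
        score, candidate = min(scored, key=lambda s: s[0])
        if score >= best_score:
            break  # local optimum: no single transformation improves the metric
        best, best_score = candidate, score
    return best, best_score
```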
3

Zogheib, Ali. "Automatic language translation /." Göteborg : IT-universitetet, Chalmers tekniska högskola och Göteborgs universitet, 2007. http://www.ituniv.se/w/index.php?option=com_itu_thesis&Itemid=319.

4

Davis, Paul C. "Stone Soup Translation: The Linked Automata Model." Connect to this title online, 2002. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1023806593.

Abstract:
Thesis (Ph. D.)--Ohio State University, 2002.
Title from first page of PDF file. Document formatted into pages; contains xvi, 306 p.; includes graphics. Includes abstract and vita. Advisor: Chris Brew, Dept. of Linguistics. Includes indexes. Includes bibliographical references (p. 284-293).
5

Clark, D. P. "Automatic translation of scene description languages." Thesis, Swansea University, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.636259.

Abstract:
This work presents a novel approach to translation, targeted in particular towards the translation of graphical scene description languages. A new approach to automatic language translation is proposed, based on the concept of using an Independent Stylesheet for the specification of each language concerned, in terms of lexis, syntax and semantics, and using a Generic Translator to perform translation between two languages mainly according to the corresponding stylesheets. This new approach is called Independent Stylesheet Language Translation (ISLT). The ISLT approach focuses on a class of translation problems where an accurate mapping between two languages cannot be fully accomplished; such a scenario is common among graphical scene description languages. The aim of translation is therefore to achieve a close semantic approximation of the source program in a target language, an approximation that is syntactically correct with respect to a declared stylesheet of the target language. A generic software architecture for ISLT is proposed, which consists of three main phases, namely Extraction, Transformation and Reconstruction. The Extraction phase involves the automatic generation of a parser based on the stylesheet of a source language; the parser is then used to decompose a program in the source language into an abstract program in the form of a Program Component List. The Transformation phase involves a series of iterative mapping processes, supported by a Generic Mapping Thesaurus, for transforming an abstract program related to the source language into one related to the target language. The Reconstruction phase utilises XSLT to construct a program in the target language from an abstract program. A domain-specific implementation of ISLT, called Graphical Anamorphic Language Environment (GALE), has been developed for the translation of graphical scene description languages. Three example languages have been considered, and the results have demonstrated the technical feasibility and scalability of the proposed approach. The ISLT approach does not suffer from the huge cost of direct-translation-based approaches or from the restrictions on functionality and program content imposed by the use of an intermediary language. Furthermore, the semantic approximation in translation helps retain programmatic intent. It is believed that, in the long term, the ISLT approach is more cost-effective than the traditional approaches of direct translation and intermediate translation.
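The three-phase Extraction/Transformation/Reconstruction architecture can be summarised as a toy pipeline. Everything below (the component representation, the demo scene formats) is an invented illustration of the split, not GALE itself:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Component:
    kind: str    # e.g. "sphere", "camera"
    attrs: dict  # e.g. {"r": 1.0}

def islt_translate(source: str,
                   extract: Callable[[str], List[Component]],
                   map_component: Callable[[Component], Component],
                   reconstruct: Callable[[List[Component]], str]) -> str:
    components = extract(source)                     # Extraction phase
    mapped = [map_component(c) for c in components]  # Transformation phase
    return reconstruct(mapped)                       # Reconstruction phase

# Toy demo: "translate" a one-primitive scene between two invented formats.
demo_extract = lambda s: [Component("sphere", {"r": float(s.split()[1])})]
demo_map = lambda c: Component("ball", c.attrs)  # closest target-language concept
demo_reconstruct = lambda cs: "\n".join(f"{c.kind} r={c.attrs['r']}" for c in cs)

print(islt_translate("sphere 2.0", demo_extract, demo_map, demo_reconstruct))
# -> ball r=2.0
```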
6

Dürlich, Luise. "Automatic Recognition and Classification of Translation Errors in Human Translation." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-420289.

Abstract:
Grading assignments is a time-consuming part of teaching translation. Automatic tools that facilitate this task would allow teachers of professional translation to focus more on other aspects of their job. Within Natural Language Processing, error recognition has not been studied for human translation in particular. This thesis is a first attempt at both error recognition and classification with both mono- and bilingual models. BERT, a pre-trained monolingual language model, and NuQE, a model adapted from the field of Quality Estimation for Machine Translation, are trained on a relatively small hand-annotated corpus of student translations. Due to the nature of the task, errors are quite rare in relation to correctly translated tokens in the corpus. To account for this, we train the models with both under- and oversampled data. While both models detect errors with moderate success, the NuQE model adapts very poorly to the classification setting. Overall, scores are quite low, which can be attributed to class imbalance and the small amount of training data, as well as some general concerns about the corpus annotations. However, we show that powerful monolingual language models can detect formal, lexical and translational errors with some success and that, depending on the model, simple under- and oversampling approaches can already help a great deal to avoid pure majority-class prediction.
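The oversampling the author uses against class imbalance amounts to something like the following sketch; the token/label example and the duplication factor are invented for illustration:

```python
import random

def oversample(tokens, labels, minority_label, factor):
    """Naive random oversampling: duplicate rare error tokens so the model
    does not collapse to predicting the majority ('no error') class."""
    pairs = list(zip(tokens, labels))
    minority = [p for p in pairs if p[1] == minority_label]
    augmented = pairs + minority * (factor - 1)
    random.shuffle(augmented)
    return augmented

sample = oversample(["the", "hous", "is", "red"],
                    ["OK", "ERR", "OK", "OK"], "ERR", 3)
```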
7

Chatterjee, Rajen. "Automatic Post-Editing for Machine Translation." Doctoral thesis, Università degli studi di Trento, 2019. http://hdl.handle.net/11572/242495.

Abstract:
Automatic Post-Editing (APE) aims to correct systematic errors in a machine-translated text. This is primarily useful when the machine translation (MT) system is not accessible for improvement, leaving APE as a viable option to improve translation quality as a downstream task, which is the focus of this thesis. This field has received less attention than MT for several reasons, which include: the limited availability of data to perform sound research, contrasting views reported by different researchers about the effectiveness of APE, and limited attention from industry to using APE in current production pipelines. In this thesis, we perform a thorough investigation of APE as a downstream task in order to: i) understand its potential to improve translation quality; ii) advance the core technology, from classical methods to recent deep-learning based solutions; iii) cope with limited and sparse data; iv) better leverage multiple input sources; v) mitigate the task-specific problem of over-correction; vi) enhance neural decoding to leverage external knowledge; and vii) establish an online learning framework to handle data diversity in real time. All the above contributions are discussed across several chapters, and most of them are evaluated in the APE shared task organized each year at the Conference on Machine Translation. Our efforts in improving the technology resulted in the best system at the 2017 APE shared task, and our work on online learning received a distinguished paper award at the Italian Conference on Computational Linguistics. Overall, the outcomes and findings of our work have boosted interest among researchers and attracted industry to examine this technology for solving real-world problems.
8

Huang, X. "XTRA : The design and implementation of a fully automatic machine translation system." Thesis, University of Essex, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.379393.

9

Averboch, Guillermo Andres. "A system for document analysis, translation, and automatic hypertext linking." Thesis, Virginia Tech, 1995. http://hdl.handle.net/10919/43809.

Abstract:
A digital library database is a heterogeneous collection of documents. Documents may become available in different formats (e.g., ASCII, SGML, typesetter languages) and they may have to be translated to a standard document representation scheme used by the digital library. This work focuses on the design of a framework that can be used to convert text documents in any format to equivalent documents in different formats and, in particular, to SGML (Standard Generalized Markup Language). In addition, the framework must be able to extract information about the analyzed documents, store that information in a permanent database, and construct hypertext links between documents and the information contained in that database, as well as between the documents themselves. For example, information about the author of a document could be extracted and stored in the database. A link can then be established between the document and the information about its author, and from there to other documents by the same author. These tasks must be performed without any human intervention, even at the risk of making a small number of mistakes. To accomplish these goals we developed a language called DELTO (Description Language for Textual Objects) that can be used to describe a document format. Given a description for a particular format, our system is able to extract information from documents in that format, to store part of that information in a permanent database, and to use that information in constructing an abstract representation of those documents that can be used to generate equivalent documents in different formats. The system that originated from this work is used to construct the database of Envision, a Virginia Tech digital library research project.
Master of Science
10

Saers, Markus. "Translation as Linear Transduction : Models and Algorithms for Efficient Learning in Statistical Machine Translation." Doctoral thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-135704.

Abstract:
Automatic translation has seen tremendous progress in recent years, mainly thanks to statistical methods applied to large parallel corpora. Transductions represent a principled approach to modeling translation, but existing transduction classes are either not expressive enough to capture structural regularities between natural languages or too complex to support efficient statistical induction on a large scale. A common approach is to severely prune search over a relatively unrestricted space of transduction grammars. These restrictions are often applied at different stages in a pipeline, with the obvious drawback of committing to irrevocable decisions that should not have been made. In this thesis we will instead restrict the space of transduction grammars to a space that is less expressive, but can be efficiently searched. First, the class of linear transductions is defined and characterized. They are generated by linear transduction grammars, which represent the natural bilingual case of linear grammars, as well as the natural linear case of inversion transduction grammars (and higher order syntax-directed transduction grammars). They are recognized by zipper finite-state transducers, which are equivalent to finite-state automata with four tapes. By allowing this extra dimensionality, linear transductions can represent alignments that finite-state transductions cannot, and by keeping the mechanism free of auxiliary storage, they become much more efficient than inversion transductions. Secondly, we present an algorithm for parsing with linear transduction grammars that allows pruning. The pruning scheme imposes no restrictions a priori, but guides the search to potentially interesting parts of the search space in an informed and dynamic way. Being able to parse efficiently allows learning of stochastic linear transduction grammars through expectation maximization. All the above work would be for naught if linear transductions were too poor a reflection of the actual transduction between natural languages. We test this empirically by building systems based on the alignments imposed by the learned grammars. The conclusion is that stochastic linear inversion transduction grammars learned from observed data stand up well to the state of the art.
11

Lindgren, Anna. "Semi-Automatic Translation of Medical Terms from English to Swedish : SNOMED CT in Translation." Thesis, Linköpings universitet, Medicinsk informatik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69736.

Abstract:
The Swedish National Board of Health and Welfare has been overseeing translations of the international clinical terminology SNOMED CT from English to Swedish. This study was performed to find out whether semi-automatic methods of translation could produce a satisfactory translation while requiring fewer resources than manual translation. Using the medical English-Swedish dictionary TermColl, translations of selected subsets of SNOMED CT were produced by way of a translation memory and by statistical machine translation. The resulting translations were evaluated via BLEU score, using translations provided by the Swedish National Board of Health and Welfare as references, before being compared with each other. The results showed a strong advantage for statistical translation over use of a translation memory; however, the overall translation results were far from satisfactory.
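BLEU scoring of candidate translations against references, as used here, can be reproduced with an off-the-shelf tool such as sacrebleu. The example strings below are invented, not taken from the SNOMED CT data:

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["akut hjartinfarkt ospecificerad"]       # system output
references = [["akut hjartinfarkt , ospecificerad"]]   # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```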
12

Büchse, Matthias. "Algebraic decoder specification: coupling formal-language theory and statistical machine translation." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-159266.

Abstract:
The specification of a decoder, i.e., a program that translates sentences from one natural language into another, is an intricate process, driven by the application and lacking a canonical methodology. The practical nature of decoder development inhibits the transfer of knowledge between theory and application, which is unfortunate because many contemporary decoders are in fact related to formal-language theory. This thesis proposes an algebraic framework where a decoder is specified by an expression built from a fixed set of operations. At present, this framework accommodates contemporary syntax-based decoders, spans two levels of abstraction, and, primarily, encourages mutual stimulation between the theory of weighted tree automata and the application.
13

Shi, Chunqi. "User-Centered Design of Translation Systems." 京都大学 (Kyoto University), 2013. http://hdl.handle.net/2433/180468.

14

Öktem, Alp. "Incorporating prosody into neural speech processing pipelines: applications on automatic speech transcription and spoken language machine translation." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/666222.

Abstract:
In this dissertation, I study the inclusion of prosody into two applications that involve speech understanding: automatic speech transcription and spoken language translation. In the former case, I propose a method that uses an attention mechanism over parallel sequences of prosodic and morphosyntactic features. Results indicate an F1 score of 70.3% in terms of overall punctuation generation accuracy. In the latter problem I deal with enhancing spoken language translation with prosody. A neural machine translation system trained with movie-domain data is adapted with pause features using a prosodically annotated bilingual dataset. Results show that prosodic punctuation generation as a preliminary step to translation increases translation accuracy by 1% in terms of BLEU scores. Encoding pauses as an extra encoding feature gives an additional 1% increase to this number. The system is further extended to jointly predict pause features in order to be used as an input to a text-to-speech system.
15

Pereira, José Casimiro. "Natural language generation in the context of multimodal interaction in Portuguese : Data-to-text based in automatic translation." Doctoral thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/21767.

Abstract:
Doctorate in Informatics
Abstract in Portuguese not available.
To enable interaction by text and/or speech it is essential that we devise systems capable of translating internal data into sentences or texts that can be shown on screen or heard by users. In this context, it is essential that these natural language generation (NLG) systems provide sentences in the native languages of the users (in our case European Portuguese) and enable an easy development and integration process while providing output that is perceived as natural. The creation of high-quality NLG systems is not an easy task, even for a small domain. The main difficulties arise from: classic approaches being very demanding in know-how and development time; a lack of variability in the generated sentences of most generation methods; the difficulty of easily accessing complete tools; a shortage of resources, such as large corpora; and support being available in only a limited number of languages. The main goal of this work was to propose, develop and test a method to convert Data-to-Portuguese, which can be developed with the smallest possible amount of time and resources, but is capable of generating utterances with variability and quality. The thesis defended argues that this goal can be achieved by adopting data-driven language generation, more precisely generation based on language translation, and by following an Engineering Research Methodology. In this thesis, two Data2Text NLG systems are presented. They were designed to provide a way to quickly develop an NLG system which can generate sentences of good quality. The proposed systems use tools that are freely available and can be developed by people with low linguistic skills. One important characteristic is the use of statistical machine translation techniques; this approach requires only a small natural language corpus, resulting in easier and cheaper development when compared to more common approaches. The main result of this thesis is the demonstration that, by following the proposed approach, it is possible to create systems capable of translating information/data into good-quality sentences in Portuguese. This is done without major effort in resource creation and with the common knowledge of an experienced application developer. The systems created, particularly the hybrid system, are capable of providing a good solution for problems in data-to-text conversion.
16

Mostofian, Nasrin. "A Study on Manual and Automatic Evaluation Procedures and Production of Automatic Post-editing Rules for Persian Machine Translation." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-325818.

Abstract:
Evaluation of machine translation is an important step towards improving MT. One way to evaluate the output of MT is to focus on the different types of errors occurring in the translation hypotheses, and to think of possible solutions to fix those errors. An error categorization is a helpful tool that makes it easy to analyze translation errors and can also be utilized to manually generate post-editing rules to be applied automatically to the product of machine translation. In this work, we define a categorization for the errors occurring in Swedish-Persian machine translation by analyzing the errors that occur in three data sets from two websites: 1177.se and the Linköping municipality website. We define three types of monolingual reference-free evaluation (MRF), and use two automatic metrics, BLEU and TER, to conduct a bilingual evaluation of Swedish-Persian translation. Later on, based on the experience of working with the errors that occur in the corpora, we manually generate automatic post-editing (APE) rules and apply them to the product of machine translation. Three different sets of results are obtained: (1) The results of analyzing MT errors show that the three most common types of errors in the translation hypotheses are mistranslated words, wrong word order, and extra prepositions. These types of errors belong to the semantic and syntactic categories respectively. (2) The results of comparing the automatic and manual evaluations show a low correlation between the two. (3) Lastly, applying the APE rules to the product of machine translation gives an increase in BLEU score on the largest data set while remaining almost unchanged on the other two data sets. The results for TER show a better score on one data set, while the scores on the two other data sets remain unchanged.
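Manually written APE rules of this kind are typically implemented as pattern-replacement pairs. A minimal sketch with invented rules, not the thesis's actual rule set:

```python
import re

# Hypothetical rules of the kind derived by hand from an error analysis;
# each pattern repairs one recurring error class in the MT output.
APE_RULES = [
    (re.compile(r"\bاز از\b"), "از"),        # collapse a duplicated preposition
    (re.compile(r"\s+([,.;!?])"), r"\1"),    # remove stray space before punctuation
]

def post_edit(sentence: str) -> str:
    for pattern, replacement in APE_RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence
```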
17

Quernheim, Daniel. "Bimorphism Machine Translation." Doctoral thesis, Universitätsbibliothek Leipzig, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-223667.

Abstract:
The field of machine translation has made tremendous progress due to the rise of statistical methods, making it possible to obtain a translation system automatically from a bilingual collection of text. Some approaches do not even need any kind of linguistic annotation, and can infer translation rules from raw, unannotated data. However, most state-of-the-art systems do linguistic structure little justice, and moreover many approaches that have been put forward use ad-hoc formalisms and algorithms. This inevitably leads to duplication of effort, and to a separation between theoretical researchers and practitioners. In order to remedy the lack of motivation and rigor, the contributions of this dissertation are threefold: 1. After laying out the historical background and context, as well as the mathematical and linguistic foundations, a rigorous algebraic model of machine translation is put forward. We use regular tree grammars and bimorphisms as the backbone, introducing a modular architecture that allows different input and output formalisms. 2. The challenges of implementing this bimorphism-based model in a machine translation toolkit are then described, explaining in detail the algorithms used for the core components. 3. Finally, experiments where the toolkit is applied to real-world data and used for diagnostic purposes are described. We discuss how we use exact decoding to reason about search errors and model errors in a popular machine translation toolkit, and we compare output formalisms of different generative capacity.
18

Kučera, Jiří. "A Combination of Automata and Grammars." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236222.

Abstract:
This thesis introduces and studies new systems of formal models, called state-synchronized automata systems of degree n. In the presented systems, computation is controlled by words belonging to a finite control language, where each word of this language is composed of states of the system's components. The thesis also studies the computational power of the introduced systems. Their practical use is demonstrated on an example from the field of natural language translation, and further on an example from the field of parallel translation.
19

Papadopoulou, Anthi. "Automatic Error Detection and Correction in Neural Machine Translation : A comparative study of Swedish to English and Greek to English." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385085.

Abstract:
Automatic detection and automatic correction of machine translation output are important steps towards ensuring an optimal quality of the final output. In this work, we compared the output of neural machine translation for two different language pairs, Swedish to English and Greek to English. This comparison was made using common machine translation metrics (BLEU, METEOR, TER) and syntax-related ones (POSBLEU, WPF, WER on POS classes). It was found that neither the common metrics nor the purely syntax-related ones were able to capture the quality of the machine translation output accurately; the decomposition of WER over POS classes was the most informative one. A sample of each language was taken to aid the comparison between manual and automatic error categorization for five error categories, namely reordering errors, inflectional errors, missing words, extra words, and incorrect lexical choices. Both Spearman's ρ and Pearson's r showed a good correlation with human judgment, with values above 0.9. Finally, based on the results of this error categorization, automatic post-editing rules were implemented and applied, and their performance was checked against the sample and the rest of the data set, showing varying results. The impact on the sample was greater, showing improvement in all metrics, while the impact on the rest of the data set was negative. An investigation of that, alongside the fact that correction was not possible for Greek due to extremely free reference translations and a lack of error patterns in spoken speech, reinforced the belief that automatic post-editing is tightly connected to consistency in the reference translation, while also showing that in handling machine translation output, more than one reference translation would potentially be needed to ensure better results.
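The WER-style metrics used in this comparison reduce to token-level edit distance; a small self-contained sketch follows. Applying it to POS-tag sequences instead of word tokens gives the per-POS-class decomposition the author found most informative:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over token lists,
    normalised by reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the house is red", "the house red"))  # one deletion -> 0.25
```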
20

Zhou, Mingjie. "Deep networks for sign language video caption." HKBU Institutional Repository, 2020. https://repository.hkbu.edu.hk/etd_oa/848.

Abstract:
In the hearing-loss community, sign language is a primary tool for communication, while there is a communication gap between hearing-impaired people and people with normal hearing. Sign language is different from spoken language: it has its own vocabulary and grammar. Recent works concentrate on sign language video captioning, which consists of sign language recognition and sign language translation. Continuous sign language recognition, which can bridge the communication gap, is a challenging task because of the weakly supervised ordered annotations, where no frame-level label is provided. To overcome this problem, connectionist temporal classification (CTC) is the most widely used method. However, CTC learning can perform badly if the extracted features are not good. For better feature extraction, this thesis presents novel self-attention-based fully-inception (SAFI) networks for vision-based end-to-end continuous sign language recognition. Considering that the length of sign words differs from one to another, we introduce the fully-inception network with different receptive fields to extract dynamic clip-level features. To further boost the performance, the fully-inception network with an auxiliary classifier is trained with an aggregation cross-entropy (ACE) loss. The encoder of self-attention networks, as the global sequential feature extractor, is then used to model the clip-level features with CTC. The proposed model is optimized by jointly training with ACE on clip-level feature learning and CTC on global sequential feature learning in an end-to-end fashion. The best method among the baselines achieves 35.6% WER on the validation set and 34.5% WER on the test set; it employs a better decoding algorithm for generating pseudo-labels to do EM-like optimization to fine-tune the CNN module. In contrast, our approach focuses on better feature extraction for end-to-end learning. To alleviate overfitting on the limited dataset, we employ temporal elastic deformation to triple the real-world dataset RWTH-PHOENIX-Weather 2014. Experimental results on this dataset demonstrate the effectiveness of our approach, which achieves 31.7% WER on the validation set and 31.2% WER on the test set. Even though sign language recognition can, to some extent, help bridge the communication gap, it is still organized in sign language grammar, which is different from spoken language. Unlike sign language recognition, which recognizes sign gestures, sign language translation (SLT) converts sign language to a target spoken-language text of the kind normal-hearing people commonly use in their daily life. To achieve this goal, this thesis provides an effective sign language translation approach which attains state-of-the-art performance on the largest real-life German sign language translation database, RWTH-PHOENIX-Weather 2014T. Besides, a direct end-to-end sign language translation approach gives promising results (an impressive gain from 9.94 to 13.75 BLEU on the validation set and from 9.58 to 14.07 BLEU on the test set) without intermediate recognition annotations. The comparative and promising experimental results show the feasibility of direct end-to-end SLT.
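The CTC objective used throughout this line of work is available off the shelf. A minimal PyTorch sketch with made-up tensor shapes (this is not the SAFI network itself):

```python
import torch
import torch.nn as nn

# Shapes follow torch.nn.CTCLoss: log_probs is (T, N, C) with the blank
# label at index 0; targets may be given padded as (N, S).
T, N, C, S = 50, 4, 1000, 12  # frames, batch, classes (incl. blank), label length
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients flow back to the (here random) frame scores
```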
21

Silvestre, Cerdà Joan Albert. "Different Contributions to Cost-Effective Transcription and Translation of Video Lectures." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/62194.

Abstract:
[EN] In recent years, on-line multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions that give accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of exploiting the information that we know a priori about these repositories, that is, lecture-specific knowledge such as the speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main outcome derived from this thesis, The transLectures-UPV Platform, has been publicly released as open-source software, and, at the time of writing, it is serving automatic transcriptions and translations for several thousand video lectures in many Spanish and European universities and institutions.
Silvestre Cerdà, JA. (2016). Different Contributions to Cost-Effective Transcription and Translation of Video Lectures [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62194
22

Salimi, Jonni. "Machine Translation Of Fictional And Non-fictional Texts : An examination of Google Translate's accuracy on translation of fictional versus non-fictional texts." Thesis, Stockholms universitet, Engelska institutionen, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-106670.

Abstract:
This study focuses on identifying areas where machine translation can be useful by examining translated fictional and non-fictional texts, and the extent to which these different text types are better or worse suited for machine translation. It additionally evaluates the performance of the free online translation tool Google Translate (GT). The BLEU automatic evaluation metric for machine translation was used for this study, giving a score of 27.75 for the fictional texts and 32.16 for the non-fictional texts. The non-fictional texts are samples of law documents, (commercial) company reports, social science texts (religion, welfare, astronomy) and medicine. These texts were selected because of their degree of difficulty. The non-fictional sentences are longer than those of the fictional texts, and in this regard MT systems have struggled. In spite of having longer sentences, the non-fictional texts got a higher BLEU score than the fictional ones. It is speculated that one reason for the higher score of the non-fictional texts might be that more specific terminology is used in these texts, leaving less room for subjective interpretation than in the fictional texts. There are other levels of meaning at work in the fictional texts that the human translator needs to capture.
23

Le, Hai Son. "Continuous space models with neural networks in natural language processing." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00776704.

Abstract:
The purpose of language models is in general to capture and model regularities of language, thereby capturing morphological, syntactic and distributional properties of word sequences in a given language. They play an important role in many successful applications of Natural Language Processing, such as Automatic Speech Recognition, Machine Translation and Information Extraction. The most successful approaches to date are based on the n-gram assumption and on adjusting statistics from the training data by applying smoothing and back-off techniques, notably the Kneser-Ney technique, introduced twenty years ago. In this way, language models predict a word based on its n-1 previous words. In spite of their prevalence, conventional n-gram based language models still suffer from several limitations that could intuitively be overcome by consulting human expert knowledge. One critical limitation is that, ignoring all linguistic properties, they treat each word as one discrete symbol with no relation to the others. Another point is that, even with a huge amount of data, the data sparsity issue always has an important impact, so the optimal value of n in the n-gram assumption is often 4 or 5, which is insufficient in practice. This kind of model is constructed from the counts of n-grams in the training data; therefore, the pertinence of these models is conditioned only on the characteristics of the training text (its quantity, its representation of the content in terms of theme and date). Recently, one of the most successful attempts to directly learn word similarities has been the use of distributed word representations in language modeling, where words with semantic and syntactic similarities are expected to be represented as neighbors in a continuous space. These representations and the associated objective function (the likelihood of the training data) are jointly learned using a multi-layer neural network architecture. In this way, word similarities are learned automatically. This approach has shown significant and consistent improvements when applied to automatic speech recognition and statistical machine translation tasks. A major difficulty with the continuous-space neural network based approach remains the computational burden, which does not scale well to the massive corpora that are nowadays available. For this reason, the first contribution of this dissertation is the definition of a neural architecture based on a tree representation of the output vocabulary, namely the Structured OUtput Layer (SOUL), which makes it well suited for large-scale frameworks. The SOUL model combines the neural network approach with the class-based approach. It achieves significant improvements on both state-of-the-art large-scale automatic speech recognition and statistical machine translation tasks. The second contribution is to provide several insightful analyses of their performance, their pros and cons, and their induced word-space representation. Finally, the third contribution is the successful adoption of the continuous-space neural network into a machine translation framework. New translation models are proposed and reported to achieve significant improvements over state-of-the-art baseline systems.
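The interpolated Kneser-Ney smoothing mentioned here can be sketched for the bigram case as follows; this is a simplified illustration (single discount, no modified-KN refinements) rather than the full recipe used in practice:

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(tokens, d=0.75):
    """Interpolated Kneser-Ney for bigrams: absolute discounting on the
    bigram counts plus a continuation-probability backoff term."""
    bigrams = list(zip(tokens, tokens[1:]))
    c_bi = Counter(bigrams)
    c_uni = Counter(tokens[:-1])
    followers = defaultdict(set)     # distinct words following v
    predecessors = defaultdict(set)  # distinct words preceding w
    for v, w in c_bi:
        followers[v].add(w)
        predecessors[w].add(v)
    n_bigram_types = len(c_bi)

    def prob(w, v):
        p_cont = len(predecessors[w]) / n_bigram_types  # continuation prob.
        if c_uni[v] == 0:
            return p_cont
        discounted = max(c_bi[(v, w)] - d, 0) / c_uni[v]
        backoff_weight = d * len(followers[v]) / c_uni[v]
        return discounted + backoff_weight * p_cont

    return prob

p = kneser_ney_bigram("the cat sat on the mat the cat ran".split())
print(p("cat", "the"))  # P(cat | the)
```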
24

Kervajan, LoÏc. "Contribution à la traduction automatique français/langue des signes française (LSF) au moyen de personnages virtuels : Contribution à la génération automatique de la LSF." Thesis, Aix-Marseille 1, 2011. http://www.theses.fr/2011AIX10172.

Abstract:
Since the law of 11 February 2005 on equal rights and opportunities, places open to the public (shops, the internet, etc.) should be able to welcome the Deaf in French Sign Language (FSL). We have worked on the development of technological tools to promote FSL, especially machine translation from written French to FSL. Our thesis begins with a presentation of knowledge on FSL (theoretical resources and ways to edit FSL) and follows with further concepts of descriptive grammar. Our working hypothesis is: FSL is a language and, therefore, machine translation is relevant. We describe the language specifications for automatic processing, based on scientific knowledge and on the proposals of our native FSL informants. We also present our methodology and the advancement of our work on the formalization of linguistic data based on the specificities of FSL, some of which (the verb scheme, adjective and adverb modification, the organization of nouns, agreement patterns) required further analysis. We present the application framework in which we worked: the machine translation system and the virtual-character animation system of France Telecom R&D. After a short presentation of avatar technology, we explain how we control the gesture synthesis engine through the exchange format that we developed. Finally, we conclude with an evaluation and with the research and development perspectives that could follow this thesis. Our approach has produced its first results, since we have achieved our goal of running the full translation chain: from the input of a sentence in French to the realization of the corresponding sentence in FSL by a synthetic character.
25

Ngo, Ho Anh Khoa. "Generative Probabilistic Alignment Models for Words and Subwords : a Systematic Exploration of the Limits and Potentials of Neural Parametrizations." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG014.

Abstract:
Alignment consists of establishing a mapping between units in a bitext, combining a text in a source language and its translation in a target language. Alignments can be computed at several levels: between documents, between sentences, between phrases, between words, or even between smaller units when one of the languages is morphologically complex, which implies aligning fragments of words (morphemes). Alignments can also be considered between more complex linguistic structures such as trees or graphs. This is a complex, under-specified task that humans accomplish with difficulty. Its automation is a notoriously difficult problem in natural language processing, historically associated with the first probabilistic word-based translation models. The design of new models for natural language processing, based on distributed representations computed by neural networks, allows us to question and revisit the computation of these alignments. This research project, therefore, aims to comprehensively understand the limitations of existing statistical alignment models and to design neural models that can be learned without supervision to overcome these drawbacks and to improve the state of the art in terms of alignment accuracy.
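The classic probabilistic word-alignment baseline this work revisits, IBM Model 1, can be trained with a few lines of EM. A toy sketch with an invented two-sentence bitext:

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """EM training of IBM Model 1 lexical translation probabilities t(f|e)
    from (source_tokens, target_tokens) sentence pairs."""
    t = defaultdict(lambda: 1.0)  # flat initialisation; normalised by E-step
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in bitext:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)  # E-step normalisation
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():             # M-step
            t[(f, e)] = c / total[e]
    return t

bitext = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split())]
t = ibm_model1(bitext)
print(t[("haus", "house")])  # converges towards 1.0
```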
26

Zamora, Martínez Francisco Julián. "Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática." Doctoral thesis, Universitat Politècnica de València, 2012. http://hdl.handle.net/10251/18066.

Abstract:
Natural language processing is an application area of artificial intelligence, in particular of pattern recognition, which studies, among other things, how to incorporate syntactic information (a language model) about how the words of a given language should be put together, so as to allow recognition/translation systems to decide which hypothesis makes the most "common sense". It is a very broad area, and this work focuses only on the part related to language modelling and its application to various tasks: sequence recognition with hidden Markov models and statistical machine translation. Specifically, this thesis centres on so-called connectionist language models, that is, language models based on neural networks. The good results of these models in several areas of natural language processing motivated this study. Owing to certain computational problems that connectionist language models suffer from, the systems reported in the literature are built in two fully decoupled stages. In the first stage, a set of feasible hypotheses is found with a standard language model, under the assumption that this set is representative of the search space containing the best hypothesis. In the second stage, the connectionist language model is applied to this set and the highest-scoring hypothesis is extracted. This procedure is called "rescoring". This scenario motivates the main objectives of this thesis: to propose a technique that drastically reduces this computational cost while degrading the quality of the found solution as little as possible; to study the effect of integrating connectionist language models into the search process of the proposed tasks; and to propose modifications of the original model that improve its quality.
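The two-stage "rescoring" setup described in this abstract reduces to re-ranking an n-best list. A minimal sketch, with the neural LM scorer left as a hypothetical callable and invented log-domain scores:

```python
def rescore(nbest, neural_lm_score, lm_weight=0.5):
    """Second-pass rescoring: re-rank the n-best list of a first-pass system
    by interpolating its baseline score with a (costly) neural LM score."""
    scored = [(hyp, base + lm_weight * neural_lm_score(hyp))
              for hyp, base in nbest]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical usage:
# best, score = rescore([("the house is red", -12.3),
#                        ("the house is read", -11.9)],
#                       neural_lm_score=my_nnlm.logprob)
```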
Zamora Martínez, FJ. (2012). Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18066
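The decoupled two-stage setup described in this abstract can be pictured with a short sketch. The scoring functions below are stand-ins, not the thesis's models: `nnlm_logprob` is a hypothetical connectionist language model that returns a log-probability.

```python
def rescore(nbest, nnlm_logprob, weight=0.5):
    """Second stage: rerank first-pass hypotheses with a neural LM score."""
    rescored = [(hyp, score + weight * nnlm_logprob(hyp)) for hyp, score in nbest]
    return max(rescored, key=lambda pair: pair[1])

# Toy first-pass n-best list (hypothesis, first-pass log-score) and a dummy LM
# that simply prefers shorter hypotheses.
nbest = [("the house".split(), -4.2), ("the house is".split(), -4.0)]
best, score = rescore(nbest, lambda hyp: -0.5 * len(hyp))
```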
APA, Harvard, Vancouver, ISO, and other styles
27

Fränne, Ellen. "Google Traduction et le texte idéologique : dans quelle mesure une traduction automatique transmet-elle le contenu idéologique d'un texte?" Thesis, Linnéuniversitetet, Institutionen för språk (SPR), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-64460.

Full text
Abstract:
Automatic translations, or machine translations, are becoming more and more advanced and common. This paper aims to examine how well Google Traduction works for translating an ideological text. To what extent can a computer program interpret such a text and render the meaning of complex thoughts and ideas into another language? In order to study this, UNESCO's World Report Investing in Cultural Diversity and Intercultural Dialogue has been translated from French to Swedish, first automatically and then manually. Focusing on denotations, connotations, grammar and style, the two versions have been analysed and compared. The conclusion drawn is that while Google Traduction impresses by its speed and possibilities, editing the automatically translated text in order to correctly transmit the meaning and the message of the text to the target-language reader would probably be a more time-consuming process than writing a direct translation manually.
APA, Harvard, Vancouver, ISO, and other styles
28

Ahlert, Hubert. "Um modelo não procedural de especificação e implementação voltado a sistemas transacionais em banco de dados." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 1994. http://hdl.handle.net/10183/9006.

Full text
Abstract:
Esta tese de doutorado apresenta um modelo de especificação, textual e grafico, para sistemas transacionais em banco de dados (ER/T+) e, também, um modelo de implementação desta especificação. Sugere uma técnica de proceduralização de especificações declarativas, usando um grafo de dependencia de fluxos de dados para estabelecer a relação de precedecia entre os fluxos do diagrama da linguagem gráfica de especificação. Apresenta, também, os mecanismos de execução da linguagem de especificação proposta e as regras de mapeamento da linguagem de especificação, em seus aspectos estruturais (dados) e comportamentais (transações), para correspondentes construções na linguagem de implementação (C e SQL). Adicionalmente, são discutidos aspectos de otimização de consultas no âmbito da linguagem de especificação de transações e, também, aspectos de aninhamento de consultas para combinar diversos fluxos do diagrama ER/T+ em expressões complexas de consultas SQL.
This Ph.D thesis presents a graphical and textual specification model for database transaction systems (ER/T+) and also an implementation model for this specification. It suggests a proceduralization technique for declarative specifications, using a data-flow dependency graph to establish a precedence relation between the diagram flows of the graphical specification language. Furthermore, it presents the execution mechanism of the proposed specification language and the structural (data) and behavioural (transaction) rules for mapping the specification language into corresponding implementation-language (C and SQL) constructions. Additionally, query optimization aspects of the transaction specification language are discussed, as well as the nesting of queries to combine various ER/T+ diagram flows into complex SQL query expressions.
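The proceduralization step, ordering declarative flows by their dependencies, amounts to a topological sort of the data-flow dependency graph. A minimal sketch with Python's standard library follows; the flow names are invented for illustration and are not from the thesis.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical ER/T+ flows: each flow maps to the flows it depends on.
dependencies = {
    "project_result": {"join_flow"},
    "join_flow": {"read_customers", "read_orders"},
    "read_customers": set(),
    "read_orders": set(),
}

# static_order() emits every flow after all of its dependencies,
# i.e. a procedural execution order for the declarative diagram.
order = list(TopologicalSorter(dependencies).static_order())
# e.g. ['read_customers', 'read_orders', 'join_flow', 'project_result']
```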
APA, Harvard, Vancouver, ISO, and other styles
29

Kouniali, Samy Habib. "Désambigüisation de groupes nominaux complexes en conformité avec les connaissances du domaine : application a la traduction automatique." Vandoeuvre-les-Nancy, INPL, 1993. http://www.theses.fr/1993INPL096N.

Full text
Abstract:
One of the major problems encountered in machine translation (and more generally in multi-domain natural language processing) is the overgeneration of deep interpretations of utterances during analysis. The system whose design and implementation we present, which we have named SYVAC (system for coherence validation), is intended to reduce the overgeneration of semantic interpretations produced by the analyzers used in machine translation, by eliminating those that are incoherent with respect to the available background knowledge, both general knowledge and knowledge about the domain covered by the text. To this end, we endow words and concepts with a number of potential conceptual relations that describe the ability of the concepts involved to combine within a given interpretation. Finally, we show that by checking the coherence of several interpretations of an analyzed utterance against the available domain information (potential relations and a few simple reasoning rules), SYVAC manages to determine which interpretations are most likely to be meaningful, without necessarily giving a fine-grained description of that meaning. The implementation was carried out in Nexpert Object, which allows an object-oriented representation of knowledge with multiple inheritance of properties on the one hand, and the use of a rule-based inference engine on the other.
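The core filtering idea, discarding interpretations whose conceptual relations are not licensed by the domain knowledge, can be sketched in a few lines. The knowledge base and the interpretations below are invented for illustration; the real system also layers reasoning rules on top of such checks.

```python
# Potential conceptual relations: which concept pairs may combine, and how.
ALLOWED = {
    ("drill", "rock", "acts_on"),
    ("engine", "drill", "powers"),
}

def coherent(interpretation):
    """Keep an interpretation only if every relation it asserts is licensed."""
    return all(triple in ALLOWED for triple in interpretation)

interpretations = [
    [("drill", "rock", "acts_on")],   # kept: licensed by the knowledge base
    [("rock", "engine", "acts_on")],  # discarded: no such potential relation
]
survivors = [i for i in interpretations if coherent(i)]
```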
APA, Harvard, Vancouver, ISO, and other styles
30

Shibata, Danilo Picagli. "Tradução grafema-fonema para a língua portuguesa baseada em autômatos adaptativos." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-30052008-101100/.

Full text
Abstract:
Este trabalho apresenta um estudo sobre a utilização de dispositivos adaptativos para realizar tradução texto-voz. O foco do trabalho é a criação de um método para a tradução grafema-fonema para a língua portuguesa baseado em autômatos adaptativos e seu uso em um software de tradução texto-voz. O método apresentado busca mimetizar o comportamento humano no tratamento de regras de tonicidade, separação de sílabas e as influências que as sílabas exercem sobre suas vizinhas. Essa característica torna o método facilmente utilizável para outras variações da língua portuguesa, considerando que essas características são invariantes em relação à localidade e a época da variedade escolhida. A variação contemporânea da língua falada na cidade de São Paulo foi escolhida como alvo de análise e testes neste trabalho. Para essa variação, o modelo apresenta resultados satisfatórios superando 95% de acerto na tradução grafema-fonema de palavras, chegando a 90% de acerto levando em consideração a resolução de dúvidas geradas por palavras que podem possuir duas representações sonoras e gerando uma saída sonora inteligível aos nativos da língua por meio da síntese por concatenação baseada em sílabas. Como resultado do trabalho, além do modelo para tradução grafema-fonema de palavras baseado em autômatos adaptativos, foi criado um método para escolha da representação fonética correta em caso de ambigüidade e foram criados dois softwares, um para simulação de autômatos adaptativos e outro para a tradução grafema-fonema de palavras utilizando o modelo de tradução criado e o método de escolha da representação correta. Esse último software foi unificado ao sintetizador desenvolvido por Koike et al. (2007) para a criação de um tradutor texto-voz para a língua portuguesa. O trabalho mostra a viabilidade da utilização de autômatos adaptativos como base ou como um elemento auxiliar para o processo de tradução texto-voz na língua portuguesa.
This work presents a study on the use of adaptive devices for text-to-speech translation. The work focuses on the development of a grapheme-phoneme translation method for Portuguese based on Adaptive Automata and the use of this method in a text-to-speech translation software system. The presented method resembles human behavior when handling syllable separation rules, syllable stress definition and the influences syllables have on each other. This feature makes the method easy to use with different variations of Portuguese, since these characteristics are invariants of the language. Portuguese as spoken nowadays in São Paulo, Brazil has been chosen as the target for analysis and tests in this work. The method has good results for this variation of Portuguese, reaching a 95% accuracy rate for grapheme-phoneme translation, clearing the 90% mark after resolution of ambiguous cases in which different representations are accepted for a grapheme, and generating phonetic output intelligible to native speakers based on concatenation synthesis using syllables as concatenation units. As final results of this work, a model is presented for grapheme-phoneme translation of Portuguese words based on Adaptive Automata, together with a methodology to choose the correct phonetic representation for a grapheme in ambiguous cases, a software tool for Adaptive Automata simulation and a software tool for grapheme-phoneme translation of texts using both the translation model and the disambiguation methodology. The latter tool was unified with the speech synthesizer developed by Koike et al. (2007) to create a text-to-speech translator for Portuguese. This work evidences the feasibility of text-to-speech translation for Portuguese using Adaptive Automata as the main instrument for such a task.
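The thesis builds its grapheme-phoneme mapping with adaptive automata; as a much simpler stand-in for the idea of context-sensitive grapheme rules, here is a toy Python sketch with a handful of invented Portuguese-flavoured rules and SAMPA-like symbols. It is not the thesis's method, only an illustration of rule-driven G2P.

```python
def g2p(word):
    """Context-sensitive grapheme-to-phoneme mapping (toy rules, pt-BR flavour)."""
    phones, i = [], 0
    while i < len(word):
        # Digraphs first: 'ch' -> /S/, 'lh' -> /L/ (SAMPA-like symbols).
        if word[i:i+2] == "ch":
            phones.append("S"); i += 2
        elif word[i:i+2] == "lh":
            phones.append("L"); i += 2
        # 'c' is /s/ before front vowels, /k/ otherwise.
        elif word[i] == "c":
            phones.append("s" if i + 1 < len(word) and word[i+1] in "ei" else "k")
            i += 1
        else:
            phones.append(word[i]); i += 1
    return phones

print(g2p("chave"))  # ['S', 'a', 'v', 'e']
```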
APA, Harvard, Vancouver, ISO, and other styles
31

Fehri, Héla. "Reconnaissance automatique des entités nommées arabes et leur traduction vers le français." Thesis, Besançon, 2012. http://www.theses.fr/2012BESA1031/document.

Full text
Abstract:
La traduction des Entités Nommées (EN) est un axe de recherche d'actualité vu la multitude des documents électroniques échangés à travers Internet. Ainsi, le besoin de traiter ces documents par des outils de TALN est devenu nécessaire et intéressant. La modélisation formelle ou semi formelle de ces EN peut intervenir dans les processus de reconnaissance et de traduction. En effet, elle permet de rendre plus fiable la constitution des ressources linquistiques, de limiter l'impact des spécificités linguistiques ct de faciliter les transformations d'une représentation à une autre. Dans ce contexte, nous proposons un outil de reconnaissance ct de traduction vers le français des EN arabes basé essentiellement sur une représentation formelle et sur un ensemble de transducteurs. L'outil prend en compte l'intégration d'un module de translittération. L'implémentation de cet outil a été effectuée en utilisant la plateforme NooJ. Les résultats obtenus sont satisfaisants
The translation of named entities (NEs) is a current research topic, given the proliferation of electronic documents exchanged through the Internet, so the need to process these documents with NLP tools has become necessary and interesting. Formal or semi-formal modeling of these NEs may intervene in both the recognition and the translation processes. Indeed, it makes the construction of linguistic resources more reliable, limits the impact of linguistic specificities and facilitates the transformation from one representation to another. In this context, we propose a tool for the recognition and translation of Arabic NEs into French, based primarily on a formal representation and a set of transducers. This tool takes into account the integration of a transliteration module. Its implementation was performed using the NooJ platform, and the results obtained proved to be satisfactory.
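The transducer cascade with a transliteration fallback can be illustrated by a toy sketch: dictionary lookup first, character-level transliteration when the entity is unknown. The dictionary and transliteration table below are tiny invented samples, not the thesis's NooJ resources.

```python
# Tiny bilingual NE dictionary and Arabic->Latin transliteration table (illustrative).
NE_DICT = {"مصر": "Égypte", "تونس": "Tunisie"}
TRANSLIT = {"م": "m", "ح": "h", "د": "d", "ا": "a", "ص": "s", "ر": "r"}

def translate_ne(token):
    """Dictionary lookup, then character-level transliteration as fallback."""
    if token in NE_DICT:
        return NE_DICT[token]
    return "".join(TRANSLIT.get(ch, ch) for ch in token)

print(translate_ne("مصر"))   # Égypte (dictionary hit)
print(translate_ne("محمد"))  # mhmd   (transliterated fallback)
```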
APA, Harvard, Vancouver, ISO, and other styles
32

Menacer, Mohamed Amine. "Reconnaissance et traduction automatique de la parole de vidéos arabes et dialectales." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0157.

Full text
Abstract:
Les travaux de recherche ont été développés dans le cadre du projet AMIS (Access to Multilingual Information and opinionS) dont l'objectif principal est de développer un système d’aide à la compréhension de vidéos dans des langues étrangères en générant un résumé automatique de ces dernières dans une langue compréhensible par l'utilisateur. Dans le cadre de cette thèse, nous nous sommes concentrés sur la reconnaissance et la traduction automatique de la parole de vidéos arabes et dialectales. Les approches statistiques proposées dans la littérature pour la reconnaissance automatique de la parole (RAP) sont indépendantes de la langue et elles sont applicables à l'arabe standard. Cependant, cette dernière présente quelques caractéristiques que nous devons prendre en considération afin de booster les performances du système de RAP. Parmi ces caractéristiques on peut citer l'absence de l'indication des voyelles dans le texte ce qui rend difficile leur apprentissage par le modèle acoustique. Nous avons proposé plusieurs approches de modélisation acoustique et/ou de langage afin de mieux reconnaître la parole arabe. L'arabe standard n'est pas la langue maternelle, c'est pourquoi dans les conversations quotidiennes, on utilise le dialecte, un arabe inspiré de l'arabe standard, mais pas seulement. Nous avons travaillé sur l'adaptation du système développé pour l'arabe standard au dialecte algérien qui est l'une des variantes de la langue arabe les plus difficiles à reconnaître par les systèmes de RAP. Cela est dû aux mots empruntés d'autres langues, au code-switching et au manque de ressources. Notre proposition pour remédier à ces problèmes est de tirer profit des données orales et textuelles d'autres langues impactant le dialecte. Le texte résultant de la RAP arabe a été utilisé pour la traduction automatique (TA). Nous avons réalisé dans un premier temps une étude comparative entre l'approche statistique à base de segments et l'approche neuronale utilisées dans le cadre de la TA. Ensuite, nous nous sommes intéressés à l’adaptation de ces deux approches pour traduire le texte code-switché. Notre étude portait sur le mélange de l'arabe et de l'anglais dans des documents officiels des nations unies. Pour pallier les différents problèmes dus à la propagation des erreurs dans le système séquentiel, nous avons travaillé sur l'adaptation du vocabulaire du système de RAP et sur la proposition d'une nouvelle modélisation permettant la traduction directe de la parole
This research was developed in the framework of the project AMIS (Access to Multilingual Information and opinionS), a European project which aims to help people understand the main idea of a video in a foreign language by generating an automatic summary of it. In this thesis, we focus on the automatic recognition and translation of the speech of Arabic and dialectal videos. The statistical approaches proposed in the literature for automatic speech recognition are language-independent and applicable to modern standard Arabic. However, this language presents some characteristics that must be taken into consideration in order to boost the performance of the speech recognition system, among them the absence of short vowels in the text, which makes them difficult for the acoustic model to learn. We proposed several approaches to acoustic and/or language modeling in order to better recognize Arabic speech. In the Arab world, modern standard Arabic is not the mother tongue, which is why daily conversations are carried out in dialect, a form of Arabic inspired by modern standard Arabic but not only by it. We worked on adapting the speech recognition system developed for modern standard Arabic to the Algerian dialect, one of the variants of Arabic that is most difficult for automatic speech recognition systems to recognize, mainly because of words borrowed from other languages, code-switching and the lack of resources. Our approach to overcoming these problems is to take advantage of oral and textual data from other languages that have an impact on the dialect, in order to train the models required for dialect speech recognition. The text produced by the Arabic speech recognition system was then used for machine translation. As a starting point, we conducted a comparative study between the phrase-based approach and the neural approach used in machine translation. Then, we adapted these two approaches to translate code-switched text; our study focused on the mix of Arabic and English in a parallel corpus extracted from official documents of the United Nations. In order to prevent error propagation in the pipeline system, we worked on adapting the vocabulary of the automatic speech recognition system and on proposing a new model that directly transforms a speech signal in language A into a sequence of words in another language B.
APA, Harvard, Vancouver, ISO, and other styles
33

Potet, Marion. "Vers l'intégration de post-éditions d'utilisateurs pour améliorer les systèmes de traduction automatiques probabilistes." Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00995104.

Full text
Abstract:
Existing machine translation technologies are now seen as a promising approach to help produce translations efficiently and at reduced cost. However, the current state of the art does not yet allow full automation of the process, and human/machine cooperation remains essential for producing quality results. A common practice is to post-edit the results provided by the system, that is, to check them manually and, where necessary, correct the system's erroneous outputs. This post-editing work performed by users on machine translation results is a valuable source of data for analyzing and adapting systems. Our work addresses the problem of developing an approach capable of taking advantage of this user feedback (post-edits) to improve, in turn, the machine translation systems. The experiments aim to exploit a corpus of about 10,000 translation hypotheses from a reference probabilistic system, post-edited by volunteers through an online platform. The results of the first experiments integrating the post-edits, into the translation model on the one hand and through statistical automatic post-editing on the other, allowed us to assess the complexity of the task. A more in-depth study of statistical post-editing systems allowed us to evaluate their usability as well as the benefits and limits of the approach. We also show that the collected post-edits can be used successfully to estimate the confidence to be placed in a machine translation result. The results of our work show the difficulty, but also the potential, of using post-edits of machine translation hypotheses as a source of information for improving the quality of current probabilistic systems.
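One way post-edits serve as a quality signal, as in the confidence estimation mentioned above, is through the edit distance between a hypothesis and its post-edited version (the idea behind HTER). A minimal sketch with the standard dynamic program, not the thesis's implementation:

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between hypothesis and post-edit."""
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i-1] == ref[j-1] else 1
            d[i][j] = min(d[i-1][j] + 1,      # deletion
                          d[i][j-1] + 1,      # insertion
                          d[i-1][j-1] + cost) # substitution / match
    return d[-1][-1]

def hter(hyp, post_edit):
    """Edit rate: fewer edits -> higher confidence in the MT output."""
    return edit_distance(hyp.split(), post_edit.split()) / max(1, len(post_edit.split()))
```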
APA, Harvard, Vancouver, ISO, and other styles
34

Ngô, Van Chan. "Formal verification of a synchronous data-flow compiler : from Signal to C." Phd thesis, Université Rennes 1, 2014. http://tel.archives-ouvertes.fr/tel-01067477.

Full text
Abstract:
Synchronous languages such as Signal, Lustre and Esterel are dedicated to designing safety-critical systems. Their compilers are large and complicated programs that may be incorrect in some contexts and may then silently produce bad compiled code from source programs. Such bad compiled code can invalidate safety properties that were guaranteed on the source programs by applying formal methods. Adopting the translation validation approach, this thesis aims at formally proving the correctness of the highly optimizing, industrial Signal compiler. The correctness proof represents both the source program and the compiled code in a common semantic framework, then formalizes a relation between the source program and its compiled code expressing that the semantics of the source program are preserved in the compiled code.
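Translation validation, as formalized in the thesis, proves semantic preservation; a lightweight, testing-flavoured cousin of the same relation can be sketched by comparing source semantics and compiled code on sampled inputs. Everything below (the toy evaluators, the sampling range) is illustrative, not the thesis's proof technique.

```python
import random

def validate(source_step, compiled_step, trials=1000):
    """Differential check that compiled code preserves source semantics
    on sampled inputs -- a testing stand-in for the formal proof."""
    for _ in range(trials):
        x = random.randint(-10**6, 10**6)
        if source_step(x) != compiled_step(x):
            return f"counterexample: {x}"
    return "no divergence found"

# Toy 'source' semantics and a (correctly) optimised 'compiled' version.
source = lambda x: x * 2 + x * 2   # 4*x, written naively
compiled = lambda x: x << 2        # 4*x, via a shift
print(validate(source, compiled))
```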
APA, Harvard, Vancouver, ISO, and other styles
35

Faab, Gertrud. "A morphosyntactic description of Northern Sotho as a basis for an automated translation from Northern Sotho into English." Thesis, 2010. http://hdl.handle.net/2263/28569.

Full text
Abstract:
This PhD thesis provides a morpho-syntactic description of Northern Sotho from a computational perspective. While a number of publications describe morphological and syntactic aspects of this language, be it in the form of prescriptive study books (inter alia Lombard (1985); Van Wyk et al. (1992); Poulos and Louwrens (1994)) or of descriptive articles in linguistic journals or conference proceedings (inter alia Anderson and Kotzé (2006); Kosch (2006); De Schryver and Taljard (2006)), so far no comprehensive description has been available that would provide a basis for developing a rule-based parser to analyse Northern Sotho at sentence level. This study attempts to fill the gap by describing a substantial grammar fragment. To this end, Northern Sotho morpho-syntactic phenomena are explored, resulting in the following descriptions:
  • language units of Northern Sotho are identified, i.e. the tokens and words that form the language. These are sorted into word class categories (parts of speech), using the descriptions of Taljard et al. (2008) as a basis;
  • the formal relationships between these units, wherever possible on the level of parts of speech, are described in the form of productive morpho-syntactic phrase grammar rules. These rules are defined within the framework of generative grammar.
Additionally, an attempt is made to find generalisations on the contextual distribution of the many items contained in verbs which are polysemous in terms of their parts of speech. The grammar rules described above are then explored in order to find patterns in the co-occurrence of parts of speech, leading towards a future, more general linguistic modelling of Northern Sotho verbs. It is also shown how a parser could work step by step through the analysis of a complete sentence, making use of a lexicon and the rules developed here; a sketch in this spirit follows below. We have also implemented some relevant phrase grammar rules as a constraint-based grammar fragment, in line with the theory of Lexical-Functional Grammar (Kaplan and Bresnan, 1982). Here, we utilized the Xerox Linguistic Environment (XLE) with the friendly permission of the Xerox Palo Alto Research Centre (PARC). Lastly, the study contains some basic definitions for a proposed machine translation (MT) system into English, intended to support the development of MT rules. An introduction to MT and a first contrastive description of phenomena of both languages is provided.
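A minimal sketch of such a step-by-step parse, with a toy lexicon and three invented phrase rules (the Northern Sotho sentence means roughly "the man buys cattle"; the rules are deliberately simplistic and are not the thesis's grammar fragment):

```python
# Toy lexicon (parts of speech) and phrase rules, purely illustrative.
LEXICON = {"monna": "N", "o": "SC", "reka": "V", "dikgomo": "N"}
RULES = [("S", ["NP", "VP"]), ("NP", ["N"]), ("VP", ["SC", "V", "NP"])]

def parse(tokens):
    """Bottom-up, step-by-step reduction using the lexicon and rules."""
    stack = [LEXICON[t] for t in tokens]        # step 1: tag every token
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            for i in range(len(stack) - len(rhs) + 1):
                if stack[i:i+len(rhs)] == rhs:  # step 2: reduce a matching span
                    stack[i:i+len(rhs)] = [lhs]
                    changed = True
    return stack

print(parse("monna o reka dikgomo".split()))  # ['S'] if the sentence is covered
```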
Thesis (PhD)--University of Pretoria, 2010.
African Languages
APA, Harvard, Vancouver, ISO, and other styles
36

Nemutamvuni, Mulalo Edward. "Investigating the effectiveness of available tools for translating into tshiVenda." Diss., 2018. http://hdl.handle.net/10500/25563.

Full text
Abstract:
Text in English
Abstracts in English and Venda
This study investigated the effectiveness of the tools available for translating from English into Tshivenḓa and vice versa. It dealt with the problem of the lack of effective tools for translating between English and Tshivenḓa. Tshivenḓa is one of South Africa's minority languages, and the lack of effective translation tools for it negatively affects language practitioners' work and endangers translation quality assurance. Translation tools, both computer-based and non-computer-based, abound for developed languages such as English, French and others. Based on the results of this research project, the researcher makes recommendations that could remedy the situation. South Africa is a democratic country with a number of language-related policies, which creates a conducive context for stakeholders with a passion for language to fully develop the Tshivenḓa language in all dimensions. All languages have evolved from an underdeveloped state; this vividly shows that the development of Tshivenḓa is also possible, just as it was for Afrikaans, which did not exist before 1652 and has since evolved and overtaken all indigenous South African languages. The study reviewed the literature on translation and translation tools, obtained from both published and unpublished sources. It used mixed-methods research, i.e. quantitative and qualitative methods, which successfully complemented each other throughout the entire research. Data were gathered through questionnaires and interviews employing both open and closed-ended questions. Both purposive/judgemental and snowball (chain) sampling were applied. Owing to the nature of mixed-methods research, data analysis combined several methods: guided by an analytic comparison approach when grouping related data, both statistical and textual analyses were vital to this study. Themes were constructed to present the gathered data lucidly. In the final chapters, the researcher discusses the findings and evaluates the entire research before making recommendations and drawing conclusions.
Iyi ṱhoḓisiso yo ita tsedzuluso nga ha kushumele kwa zwishumiswa zwi re hone zwine zwa shumiswa u pindulela u bva kha luambo lwa English u ya kha Tshivenḓa na u bva kha Tshivenḓa u ya kha English ndivho I ya u sedzulusa na u lavhelesa kushumele kwa izwi zwishumiswa uri zwi a thusa naa. Ino ṱhoḓisiso yo shumana na thaidzo ya ṱhahelelo ya zwishumiswa zwa u pindulela zwine zwa shumiswa musi hu tshi pindulelwa vhukati ha English na Tshivenḓa. Tshivenḓa ndi luṅwe lwa nyambo dza Afrika Tshipembe dzine dza ambiwa nga vhathu vha si vhanzhi. U shaea ha zwishumiswa zwa u pindulela zwine zwa shuma nga nḓila I thusaho zwi kwama mushumo wa vhashumi vha zwa nyambo nga nḓila I si yavhuḓi. Iyi nyimele I na mulingo u kwamaho khwaḽithi ya zwo pindulelwaho. Zwishumiswa zwa u pindulela, zwa thekhnoḽodzhi ya khomphiyutha na zwi sa shumisi thekhnoḽodzhi ya khomphiyutha zwo ḓalesa kha nyambo dzo bvelelaho u tou fana na kha English, French na dziṅwe. Zwo sendeka kha mvelelo dza ino thandela ya ṱhoḓisiso, muṱoḓisisi o ita themendelo dzine dza nga fhelisa thaidzo ya nyimele. Afrika Tshipembe ndi shango ḽa demokirasi ḽine ḽa vha na mbekanyamaitele dzo vhalaho nga ha dzinyambo. Izwi zwi ita uri hu vhe na nyimele ine vhafaramikovhe vhane vha funesa nyambo vha kone u bveledza Tshivenḓa kha masia oṱhe. Zwavhukuma ndi zwa uri nyambo dzoṱhe dzi na mathomo nahone dzoṱhe dzo vha dzi songo bvelela. Izwi zwi ita uri zwi vhe khagala uri luambo lwa Tshivenḓa na lwone lu nga bveledzwa u tou fana na luambo lwa Afrikaans lwe lwa vha lu si ho ḽifhasini phanḓa ha ṅwaha wa 1652. Ulu luambo (Afrikaans) lwo vha hone shangoni lwa mbo bveledzwa lwa fhira nyambo dzoṱhe dza fhano hayani Afrika Tshipembe. Kha ino ṱhoḓisiso ho vhaliwa maṅwalwa ane a amba nga ha u pindulela na nga ha zwishumiswa zwa u pindulela. Maṅwalwa e a vhalwa o wanala kha zwiko zwo kanḓiswaho na zwiko zwi songo kanḓiswaho. Ino ṱhoḓisiso yo shumisa ngona dza ṱhoḓisiso dzo ṱanganyiswaho, idzo ngona ndi khwanthithethivi na khwaḽithethivi. Idzi ngona dzo shumisana zwavhuḓisa kha ṱhoḓisiso yoṱhe. Data yo kuvhanganywa hu tshi khou shumiswa dzimbudziso na u tou vhudzisa hune afho ho shumiswa mbudziso dzo vuleaho na dzo valeaho. Ngona dza u nanga sambula muṱoḓisisi o shumisa khaṱulo yawe uri ndi nnyi ane a nga vha a na data yo teaho na u humbela vhavhudziswa uri vha bule vhaṅwe vhathu vha re na data yo teaho ino ṱhoḓisiso. viii Tsenguluso ya data ho ṱanganyiswa ngona dza u sengulusa zwo itiswa ngauri ṱhoḓisiso ino yo ṱanganyisa ngona dza u ita ṱhoḓisiso. Sumbanḓila ho shumiswa tsenguluso ya mbambedzo kha u sengulusa data. Data ine ya fana yo vhewa fhethu huthihi musi hu tshi khou senguluswa na u vhiga. Tsenguluso I shumisaho mbalo/tshivhalo (khwanthithethivi) na I shumisaho maipfi kha ino ngudo dzo shumiswa. Ho vhumbiwa dziṱhoho u itela u ṱana data ye ya kuvhanganywa. Ngei kha ndima dza u fhedza, muṱodisisi o rera nga ha mawanwa, o ṱhaṱhuvha ṱhoḓisiso yoṱhe phanḓa ha u ita themendelo na u vhina.
African Languages
M.A. (African Languages)
APA, Harvard, Vancouver, ISO, and other styles
37

Wang, Lih-der, and 王立德. "The Implementation of Automated Translator from Programming Language to Timed Automata." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/y8d82v.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Electrical Engineering
92 (ROC academic year)
General-purpose programming languages are rich in functionality and therefore have complicated structures, so verifying software written in such languages can be very difficult. Even if the technical specification has been verified to be error-free, there may still be errors introduced by the actual implementation; it is therefore important that verification is performed on the final program. Based on its workflow, a program can be considered as a real-time system with many processes, where each process is represented by a fragment of the original program. According to the grammar of the specific programming language, an automated translator can be used to translate the program into corresponding formal verification models, and the whole program can then be verified by applying standard verification techniques to each model individually. Automating the translation simplifies the task of creating a formal model and helps identify potential errors in the implementation. We present our implementation of a program that translates basic C programs into timed automata described in Red.
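The shape of the translation can be sketched as follows: a C loop becomes locations and guarded transitions of an automaton. The encoding below is invented for illustration and is not Red's actual input format.

```python
from dataclasses import dataclass, field

@dataclass
class TimedAutomaton:
    locations: set = field(default_factory=set)
    transitions: list = field(default_factory=list)  # (src, guard, updates, dst)

# Source fragment (conceptually):  while (x < 3) { x++; }
ta = TimedAutomaton()
ta.locations |= {"loop_head", "loop_body", "exit"}
ta.transitions += [
    ("loop_head", "x < 3",  [],        "loop_body"),  # loop test true
    ("loop_body", "true",   ["x := x+1"], "loop_head"),  # body, then back to the test
    ("loop_head", "x >= 3", [],        "exit"),       # loop test false
]
```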
APA, Harvard, Vancouver, ISO, and other styles
38

"Semi-automatic grammar induction for bidirectional machine translation." 2002. http://library.cuhk.edu.hk/record=b5895983.

Full text
Abstract:
Wong, Chin Chung.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (leaves 137-143).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Objectives --- p.3
Chapter 1.2 --- Thesis Outline --- p.5
Chapter 2 --- Background in Natural Language Understanding --- p.6
Chapter 2.1 --- Rule-based Approaches --- p.7
Chapter 2.2 --- Corpus-based Approaches --- p.8
Chapter 2.2.1 --- Stochastic Approaches --- p.8
Chapter 2.2.2 --- Phrase-spotting Approaches --- p.9
Chapter 2.3 --- The ATIS Domain --- p.10
Chapter 2.3.1 --- Chinese Corpus Preparation --- p.11
Chapter 3 --- Semi-automatic Grammar Induction - Baseline Approach --- p.13
Chapter 3.1 --- Background in Grammar Induction --- p.13
Chapter 3.1.1 --- Simulated Annealing --- p.14
Chapter 3.1.2 --- Bayesian Grammar Induction --- p.14
Chapter 3.1.3 --- Probabilistic Grammar Acquisition --- p.15
Chapter 3.2 --- Semi-automatic Grammar Induction - Baseline Approach --- p.16
Chapter 3.2.1 --- Spatial Clustering --- p.16
Chapter 3.2.2 --- Temporal Clustering --- p.18
Chapter 3.2.3 --- Post-processing --- p.19
Chapter 3.2.4 --- Four Aspects for Enhancements --- p.20
Chapter 3.3 --- Chapter Summary --- p.22
Chapter 4 --- Semi-automatic Grammar Induction - Enhanced Approach --- p.23
Chapter 4.1 --- Evaluating Induced Grammars --- p.24
Chapter 4.2 --- Stopping Criterion --- p.26
Chapter 4.2.1 --- Cross-checking with Recall Values --- p.29
Chapter 4.3 --- Improvements on Temporal Clustering --- p.32
Chapter 4.3.1 --- Evaluation --- p.39
Chapter 4.4 --- Improvements on Spatial Clustering --- p.46
Chapter 4.4.1 --- Distance Measures --- p.48
Chapter 4.4.2 --- Evaluation --- p.57
Chapter 4.5 --- Enhancements based on Intelligent Selection --- p.62
Chapter 4.5.1 --- Informed Selection between Spatial Clustering and Temporal Clustering --- p.62
Chapter 4.5.2 --- Selecting the Number of Clusters Per Iteration --- p.64
Chapter 4.5.3 --- An Example for Intelligent Selection --- p.64
Chapter 4.5.4 --- Evaluation --- p.68
Chapter 4.6 --- Chapter Summary --- p.71
Chapter 5 --- Bidirectional Machine Translation using Induced Grammars - Baseline Approach --- p.73
Chapter 5.1 --- Background in Machine Translation --- p.75
Chapter 5.1.1 --- Rule-based Machine Translation --- p.75
Chapter 5.1.2 --- Statistical Machine Translation --- p.76
Chapter 5.1.3 --- Knowledge-based Machine Translation --- p.77
Chapter 5.1.4 --- Example-based Machine Translation --- p.78
Chapter 5.1.5 --- Evaluation --- p.79
Chapter 5.2 --- Baseline Configuration on Bidirectional Machine Translation System --- p.84
Chapter 5.2.1 --- Bilingual Dictionary --- p.84
Chapter 5.2.2 --- Concept Alignments --- p.85
Chapter 5.2.3 --- Translation Process --- p.89
Chapter 5.2.4 --- Two Aspects for Enhancements --- p.90
Chapter 5.3 --- Chapter Summary --- p.91
Chapter 6 --- Bidirectional Machine Translation - Enhanced Approach --- p.92
Chapter 6.1 --- Concept Alignments --- p.93
Chapter 6.1.1 --- Enhanced Alignment Scheme --- p.95
Chapter 6.1.2 --- Experiment --- p.97
Chapter 6.2 --- Grammar Checker --- p.100
Chapter 6.2.1 --- Components for Grammar Checking --- p.101
Chapter 6.3 --- Evaluation --- p.117
Chapter 6.3.1 --- Bleu Score Performance --- p.118
Chapter 6.3.2 --- Modified Bleu Score --- p.122
Chapter 6.4 --- Chapter Summary --- p.130
Chapter 7 --- Conclusions --- p.131
Chapter 7.1 --- Summary --- p.131
Chapter 7.2 --- Contributions --- p.134
Chapter 7.3 --- Future work --- p.136
Bibliography --- p.137
Chapter A --- Original SQL Queries --- p.144
Chapter B --- Seeded Categories --- p.146
Chapter C --- 3 Alignment Categories --- p.147
Chapter D --- Labels of Syntactic Structures in Grammar Checker --- p.148
APA, Harvard, Vancouver, ISO, and other styles
39

"Automatic construction of English/Chinese parallel corpus." 2001. http://library.cuhk.edu.hk/record=b5890676.

Full text
Abstract:
Li Kar Wing.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 88-96).
Abstracts in English and Chinese.
ABSTRACT --- p.i
ACKNOWLEDGEMENTS --- p.v
LIST OF TABLES --- p.viii
LIST OF FIGURES --- p.ix
CHAPTERS
Chapter 1. --- INTRODUCTION --- p.1
Chapter 1.1 --- Application of corpus-based techniques --- p.2
Chapter 1.1.1 --- Machine Translation (MT) --- p.2
Chapter 1.1.1.1 --- Linguistic --- p.3
Chapter 1.1.1.2 --- Statistical --- p.4
Chapter 1.1.1.3 --- Lexicon construction --- p.4
Chapter 1.1.2 --- Cross-lingual Information Retrieval (CLIR) --- p.6
Chapter 1.1.2.1 --- Controlled vocabulary --- p.6
Chapter 1.1.2.2 --- Free text --- p.7
Chapter 1.1.2.3 --- Application corpus-based approach in CLIR --- p.9
Chapter 1.2 --- Overview of linguistic resources --- p.10
Chapter 1.3 --- Written language corpora --- p.12
Chapter 1.3.1 --- Types of corpora --- p.13
Chapter 1.3.2 --- Limitation of comparable corpora --- p.16
Chapter 1.4 --- Outline of the dissertation --- p.17
Chapter 2. --- LITERATURE REVIEW --- p.19
Chapter 2.1 --- Research in automatic corpus construction --- p.20
Chapter 2.2 --- Research in translation alignment --- p.25
Chapter 2.2.1 --- Sentence alignment --- p.27
Chapter 2.2.2 --- Word alignment --- p.28
Chapter 2.3 --- Research in alignment of sequences --- p.33
Chapter 3. --- ALIGNMENT AT WORD LEVEL AND CHARACTER LEVEL --- p.35
Chapter 3.1 --- Title alignment --- p.35
Chapter 3.1.1 --- Lexical features --- p.37
Chapter 3.1.2 --- Grammatical features --- p.40
Chapter 3.1.3 --- The English/Chinese alignment model --- p.41
Chapter 3.2 --- Alignment at word level and character level --- p.42
Chapter 3.2.1 --- Alignment at word level --- p.42
Chapter 3.2.2 --- Alignment at character level: Longest matching --- p.44
Chapter 3.2.3 --- Longest common subsequence (LCS) --- p.46
Chapter 3.2.4 --- Applying LCS in the English/Chinese alignment model --- p.48
Chapter 3.3 --- Reduce overlapping ambiguity --- p.52
Chapter 3.3.1 --- Edit distance --- p.52
Chapter 3.3.2 --- Overlapping in the algorithm model --- p.54
Chapter 4. --- ALIGNMENT AT TITLE LEVEL --- p.59
Chapter 4.1 --- Review of score functions --- p.59
Chapter 4.2 --- The Score function --- p.60
Chapter 4.2.1 --- (C matches E) and (E matches C) --- p.60
Chapter 4.2.2 --- Length similarity --- p.63
Chapter 5. --- EXPERIMENTAL RESULTS --- p.69
Chapter 5.1 --- Hong Kong government press release articles --- p.69
Chapter 5.2 --- Hang Seng Bank economic monthly reports --- p.76
Chapter 5.3 --- Hang Seng Bank press release articles --- p.78
Chapter 5.4 --- Hang Seng Bank speech articles --- p.81
Chapter 5.5 --- Quality of the collections and future work --- p.84
Chapter 6. --- CONCLUSION --- p.87
Bibliography
APA, Harvard, Vancouver, ISO, and other styles
40

Büchse, Matthias. "Algebraic decoder specification: coupling formal-language theory and statistical machine translation." Doctoral thesis, 2014. https://tud.qucosa.de/id/qucosa%3A28493.

Full text
Abstract:
The specification of a decoder, i.e., a program that translates sentences from one natural language into another, is an intricate process, driven by the application and lacking a canonical methodology. The practical nature of decoder development inhibits the transfer of knowledge between theory and application, which is unfortunate because many contemporary decoders are in fact related to formal-language theory. This thesis proposes an algebraic framework where a decoder is specified by an expression built from a fixed set of operations. As it stands, this framework accommodates contemporary syntax-based decoders, spans two levels of abstraction, and, primarily, encourages mutual stimulation between the theory of weighted tree automata and the application.
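The backbone objects, weighted regular tree grammars, can be made concrete with a toy scorer that runs bottom-up over a tree and multiplies rule weights. The grammar, states and weights below are invented for illustration.

```python
# Weighted RTG rules: (state, symbol, child_states) -> weight (illustrative).
RULES = {
    ("S",  "s",  ("NP", "VP")): 1.0,
    ("NP", "np", ()):            0.6,
    ("VP", "vp", ("NP",)):       0.5,
}

def score(tree, state):
    """Best product of rule weights along a bottom-up run; 0.0 if no run exists."""
    symbol, children = tree
    best = 0.0
    for (q, sym, qs), w in RULES.items():
        if q == state and sym == symbol and len(qs) == len(children):
            child_score = 1.0
            for sub, q_i in zip(children, qs):
                child_score *= score(sub, q_i)
            best = max(best, w * child_score)
    return best

tree = ("s", [("np", []), ("vp", [("np", [])])])
print(score(tree, "S"))  # 1.0 * 0.6 * (0.5 * 0.6) = 0.18
```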
APA, Harvard, Vancouver, ISO, and other styles
41

Polák, Peter. "Strojový překlad mluvené řeči přes fonetickou reprezentaci zdrojové řeči." Master's thesis, 2020. http://www.nusl.cz/ntk/nusl-416019.

Full text
Abstract:
We refactor the traditional two-step approach of automatic speech recognition for spoken language translation. Instead of conventional graphemes, we use phonemes as an intermediate speech representation. Starting with the acoustic model, we revise the cross-lingual transfer and propose a coarse-to-fine method providing further speed-up and performance boost. Further, we review the translation model. We experiment with source and target encoding, boosting the robustness by utilizing fine-tuning and transfer across ASR and SLT. We empirically document that this conventional setup with an alternative representation not only performs well on standard test sets but also provides robust transcripts and translations on challenging (e.g., non-native) test sets. Notably, our ASR system outperforms commercial ASR systems.
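The refactored pipeline is a straightforward composition, audio to phonemes to target text; the sketch below uses stub functions (the strings and the confusion example are invented) just to show where the intermediate representation sits.

```python
def recognize_phonemes(audio):
    """Acoustic model stub: audio -> phoneme string (stand-in)."""
    return "g u t e n   t a k"  # note 'tak': a plausible ASR confusion

def translate_phonemes(phonemes):
    """Translation model stub trained on phoneme input (stand-in).
    Training on phonemes lets the MT model absorb such confusions,
    one motivation for the intermediate representation."""
    return "good day"

def speech_translate(audio):
    return translate_phonemes(recognize_phonemes(audio))

print(speech_translate(b"..."))  # 'good day'
```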
APA, Harvard, Vancouver, ISO, and other styles
42

"Semi-automatic acquisition of domain-specific semantic structures." 2000. http://library.cuhk.edu.hk/record=b5890445.

Full text
Abstract:
Siu, Kai-Chung.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 99-106).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Thesis Outline --- p.5
Chapter 2 --- Background --- p.6
Chapter 2.1 --- Natural Language Understanding --- p.6
Chapter 2.1.1 --- Rule-based Approaches --- p.7
Chapter 2.1.2 --- Stochastic Approaches --- p.8
Chapter 2.1.3 --- Phrase-Spotting Approaches --- p.9
Chapter 2.2 --- Grammar Induction --- p.10
Chapter 2.2.1 --- Semantic Classification Trees --- p.11
Chapter 2.2.2 --- Simulated Annealing --- p.12
Chapter 2.2.3 --- Bayesian Grammar Induction --- p.12
Chapter 2.2.4 --- Statistical Grammar Induction --- p.13
Chapter 2.3 --- Machine Translation --- p.14
Chapter 2.3.1 --- Rule-based Approach --- p.15
Chapter 2.3.2 --- Statistical Approach --- p.15
Chapter 2.3.3 --- Example-based Approach --- p.16
Chapter 2.3.4 --- Knowledge-based Approach --- p.16
Chapter 2.3.5 --- Evaluation Method --- p.19
Chapter 3 --- Semi-Automatic Grammar Induction --- p.20
Chapter 3.1 --- Agglomerative Clustering --- p.20
Chapter 3.1.1 --- Spatial Clustering --- p.21
Chapter 3.1.2 --- Temporal Clustering --- p.24
Chapter 3.1.3 --- Free Parameters --- p.26
Chapter 3.2 --- Post-processing --- p.27
Chapter 3.3 --- Chapter Summary --- p.29
Chapter 4 --- Application to the ATIS Domain --- p.30
Chapter 4.1 --- The ATIS Domain --- p.30
Chapter 4.2 --- Parameters Selection --- p.32
Chapter 4.3 --- Unsupervised Grammar Induction --- p.35
Chapter 4.4 --- Prior Knowledge Injection --- p.40
Chapter 4.5 --- Evaluation --- p.43
Chapter 4.5.1 --- Parse Coverage in Understanding --- p.45
Chapter 4.5.2 --- Parse Errors --- p.46
Chapter 4.5.3 --- Analysis --- p.47
Chapter 4.6 --- Chapter Summary --- p.49
Chapter 5 --- Portability to Chinese --- p.50
Chapter 5.1 --- Corpus Preparation --- p.50
Chapter 5.1.1 --- Tokenization --- p.51
Chapter 5.2 --- Experiments --- p.52
Chapter 5.2.1 --- Unsupervised Grammar Induction --- p.52
Chapter 5.2.2 --- Prior Knowledge Injection --- p.56
Chapter 5.3 --- Evaluation --- p.58
Chapter 5.3.1 --- Parse Coverage in Understanding --- p.59
Chapter 5.3.2 --- Parse Errors --- p.60
Chapter 5.4 --- Grammar Comparison Across Languages --- p.60
Chapter 5.5 --- Chapter Summary --- p.64
Chapter 6 --- Bi-directional Machine Translation --- p.65
Chapter 6.1 --- Bilingual Dictionary --- p.67
Chapter 6.2 --- Concept Alignments --- p.68
Chapter 6.3 --- Translation Procedures --- p.73
Chapter 6.3.1 --- The Matching Process --- p.74
Chapter 6.3.2 --- The Searching Process --- p.76
Chapter 6.3.3 --- Heuristics to Aid Translation --- p.81
Chapter 6.4 --- Evaluation --- p.82
Chapter 6.4.1 --- Coverage --- p.83
Chapter 6.4.2 --- Performance --- p.86
Chapter 6.5 --- Chapter Summary --- p.89
Chapter 7 --- Conclusions --- p.90
Chapter 7.1 --- Summary --- p.90
Chapter 7.2 --- Future Work --- p.92
Chapter 7.2.1 --- Suggested Improvements on Grammar Induction Process --- p.92
Chapter 7.2.2 --- Suggested Improvements on Bi-directional Machine Translation --- p.96
Chapter 7.2.3 --- Domain Portability --- p.97
Chapter 7.3 --- Contributions --- p.97
Bibliography --- p.99
Chapter A --- Original SQL Queries --- p.107
Chapter B --- Induced Grammar --- p.109
Chapter C --- Seeded Categories --- p.111
APA, Harvard, Vancouver, ISO, and other styles
43

Sefara, Tshephisho Joseph. "The development of an automatic pronunciation assistant." Thesis, 2019. http://hdl.handle.net/10386/2906.

Full text
Abstract:
Thesis (M. Sc. (Computer Science)) -- University of Limpopo, 2019
The pronunciation of words and phrases in any language involves careful manipulation of linguistic features. Factors such as age, motivation, accent, phonetics, stress and intonation sometimes cause inappropriate or incorrect pronunciation of words from non-native languages, and pronouncing words under different phonological rules tends to change their meaning. This study presents the development of an automatic pronunciation assistant system for under-resourced languages of Limpopo Province, namely Sepedi, Xitsonga, Tshivenda and isiNdebele. The aim of the proposed system is to help non-native speakers learn appropriate and correct pronunciation of words/phrases in these under-resourced languages. The system is composed of a language identification module on the front-end side and a speech synthesis module on the back-end side. A support vector machine was compared to the baseline multinomial naive Bayes classifier to build the language identification module. The language identification phase performs supervised multiclass text classification to predict a person's first language based on input text, before the speech synthesis phase handles pronunciation using the identified language. The back-end speech synthesis phase is composed of four baseline text-to-speech synthesis systems in the selected target languages, built with the hidden Markov model method. Subjective listening tests were conducted to evaluate the quality of the synthesised speech using a mean opinion score test, which obtained good results on all targeted languages for naturalness, pronunciation, pleasantness, understandability, intelligibility, overall quality of the system and user acceptance. The developed system has been implemented on a live production web server for performance evaluation and stability testing using live data.
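The front-end comparison of a support vector machine against the multinomial naive Bayes baseline can be sketched with scikit-learn; the four training snippets below are merely illustrative thank-you phrases, not the study's corpora.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative training set; the real system used text corpora of
# Sepedi, Xitsonga, Tshivenda and isiNdebele.
texts = ["ke a leboga", "ndza khensa", "ndo livhuwa", "ngiyabonga"]
labels = ["Sepedi", "Xitsonga", "Tshivenda", "isiNdebele"]

for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),  # char n-grams
        clf,
    )
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["ke leboga kudu"]))  # toy prediction
```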
APA, Harvard, Vancouver, ISO, and other styles
44

Ouyang, Jessica Jin. "Adapting Automatic Summarization to New Sources of Information." Thesis, 2019. https://doi.org/10.7916/d8-5nar-6b61.

Full text
Abstract:
English-language news articles are no longer necessarily the best source of information. The Web allows information to spread more quickly and travel farther: first-person accounts of breaking news events pop up on social media, and foreign-language news articles are accessible to, if not immediately understandable by, English-speaking users. This thesis focuses on developing automatic summarization techniques for these new sources of information. We focus on summarizing two specific new sources of information: personal narratives, first-person accounts of exciting or unusual events that are readily found in blog entries and other social media posts, and non-English documents, which must first be translated into English, often introducing translation errors that complicate the summarization process. Personal narratives are a very new area of interest in natural language processing research, and they present two key challenges for summarization. First, unlike many news articles, whose lead sentences serve as summaries of the most important ideas in the articles, personal narratives provide no such shortcuts for determining where important information occurs within them; second, personal narratives are written informally and colloquially, and unlike news articles, they are rarely edited, so they require heavier editing and rewriting during the summarization process. Non-English documents, whether news or narrative, present yet another source of difficulty on top of any challenges inherent to their genre: they must be translated into English, potentially introducing translation errors and disfluencies that must be identified and corrected during summarization. The bulk of this thesis is dedicated to addressing the challenges of summarizing personal narratives found on the Web. We develop a two-stage summarization system for personal narrative that first extracts sentences containing important content and then rewrites those sentences into summary-appropriate forms. Our content extraction system is inspired by contextualist narrative theory, using changes in writing style throughout a narrative to detect sentences containing important information; it outperforms both graph-based and neural network approaches to sentence extraction for this genre. Our paraphrasing system rewrites the extracted sentences into shorter, standalone summary sentences, learning to mimic the paraphrasing choices of human summarizers more closely than can traditional lexicon- or translation-based paraphrasing approaches. We conclude with a chapter dedicated to summarizing non-English documents written in low-resource languages – documents that would otherwise be unreadable for English-speaking users. We develop a cross-lingual summarization system that performs even heavier editing and rewriting than does our personal narrative paraphrasing system; we create and train on large amounts of synthetic errorful translations of foreign-language documents. Our approach produces fluent English summaries from disfluent translations of non-English documents, and it generalizes across languages.
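The two-stage design, extract by a stylistic signal, then rewrite, can be caricatured in a few lines; the style features and the rewrite step below are simplistic stand-ins for the thesis's learned models.

```python
import re

def style(sentence):
    """Crude style features: sentence length and exclamation density."""
    words = sentence.split()
    return (len(words), sentence.count("!") / max(1, len(words)))

def extract(sentences, k=2):
    """Stage 1: pick sentences where the style shifts most from the previous
    one -- a toy stand-in for the contextualist signal."""
    shifts = [0.0]
    for prev, cur in zip(sentences, sentences[1:]):
        (l1, e1), (l2, e2) = style(prev), style(cur)
        shifts.append(abs(l2 - l1) + 10 * abs(e2 - e1))
    ranked = sorted(range(len(sentences)), key=lambda i: -shifts[i])[:k]
    return [sentences[i] for i in sorted(ranked)]

def rewrite(sentence):
    """Stage 2 stub: light cleanup standing in for learned paraphrasing."""
    return re.sub(r"\s+", " ", sentence).strip().rstrip("!.") + "."
```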
APA, Harvard, Vancouver, ISO, and other styles
45

Quernheim, Daniel. "Bimorphism Machine Translation." Doctoral thesis, 2016. https://ul.qucosa.de/id/qucosa%3A15589.

Full text
Abstract:
The field of statistical machine translation has made tremendous progress due to the rise of statistical methods, making it possible to obtain a translation system automatically from a bilingual collection of text. Some approaches do not even need any kind of linguistic annotation, and can infer translation rules from raw, unannotated data. However, most state-of-the-art systems do linguistic structure little justice, and moreover many approaches that have been put forward use ad-hoc formalisms and algorithms. This inevitably leads to duplication of effort, and a separation between theoretical researchers and practitioners. In order to remedy the lack of motivation and rigor, the contributions of this dissertation are threefold: 1. After laying out the historical background and context, as well as the mathematical and linguistic foundations, a rigorous algebraic model of machine translation is put forward. We use regular tree grammars and bimorphisms as the backbone, introducing a modular architecture that allows different input and output formalisms. 2. The challenges of implementing this bimorphism-based model in a machine translation toolkit are then described, explaining in detail the algorithms used for the core components. 3. Finally, experiments where the toolkit is applied on real-world data and used for diagnostic purposes are described. We discuss how we use exact decoding to reason about search errors and model errors in a popular machine translation toolkit, and we compare output formalisms of different generative capacity.
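The bimorphism backbone pairs one derivation with two tree homomorphisms, one reading off the source side and one the target side. A toy sketch (rules and phrases invented for this example):

```python
# A derivation tree: (rule_name, children). Two homomorphisms read it off
# as source and target phrases -- the essence of a bimorphism (illustrative).
H_IN  = {"R_S": "{0} {1}", "R_left": "das haus", "R_right": "ist klein"}
H_OUT = {"R_S": "{0} {1}", "R_left": "the house", "R_right": "is small"}

def apply_hom(hom, tree):
    rule, children = tree
    return hom[rule].format(*(apply_hom(hom, c) for c in children))

derivation = ("R_S", [("R_left", []), ("R_right", [])])
print(apply_hom(H_IN, derivation))   # das haus ist klein
print(apply_hom(H_OUT, derivation))  # the house is small
```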
APA, Harvard, Vancouver, ISO, and other styles
46

van, Merriënboer Bart. "Sequence-to-sequence learning for machine translation and automatic differentiation for machine learning software tools." Thèse, 2018. http://hdl.handle.net/1866/21743.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Variš, Dušan. "Automatická korektura chyb ve výstupu strojového překladu." Master's thesis, 2016. http://www.nusl.cz/ntk/nusl-352609.

Full text
Abstract:
We present MLFix, an automatic statistical post-editing system, which is a spiritual successor to the rule-based system Depfix. The aim of this thesis was to investigate possible approaches to automatic identification of the most common morphological errors produced by state-of-the-art machine translation systems and to train sufficient statistical models built on the acquired knowledge. We performed both automatic and manual evaluation of the system and compared the results with Depfix. The system was mainly developed on English-to-Czech machine translation output; however, the aim was to generalize the post-editing process so it can be applied to other language pairs. We modified the original pipeline to post-edit English-German machine translation output and performed additional evaluation of this modification.
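The statistical core of such a post-editor can be caricatured as learning token-level substitution rules from (MT output, post-edit) pairs; the naive one-to-one alignment and the Czech toy data below are illustrative only, not MLFix's models.

```python
from collections import Counter, defaultdict

def learn_rules(pairs, min_count=2):
    """Count word substitutions between aligned MT output and post-edits."""
    subs = defaultdict(Counter)
    for mt, pe in pairs:
        for m, p in zip(mt.split(), pe.split()):  # naive 1:1 alignment
            if m != p:
                subs[m][p] += 1
    return {m: c.most_common(1)[0][0]
            for m, c in subs.items() if sum(c.values()) >= min_count}

def post_edit(sentence, rules):
    return " ".join(rules.get(w, w) for w in sentence.split())

pairs = [("velky dům stoji", "velký dům stojí"),
         ("velky pes stoji", "velký pes stojí")]
rules = learn_rules(pairs)
print(post_edit("velky strom stoji", rules))  # velký strom stojí
```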
APA, Harvard, Vancouver, ISO, and other styles
