Dissertations / Theses: 'Parallel corpus'

1

Adesam, Yvonne. "The Multilingual Forest : Investigating High-quality Parallel Corpus Development." Doctoral thesis, Stockholms universitet, Institutionen för lingvistik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-79076.

Abstract:

This thesis explores the development of parallel treebanks: collections of texts and their translations that are syntactically annotated and aligned, with links between words, phrases, and sentences showing translation equivalence. We describe the semi-manual annotation of the SMULTRON parallel treebank, consisting of 1,000 sentences in English, German and Swedish. This description is the starting point for answering the first of the thesis's two questions: what issues need to be considered to achieve a high-quality, consistent parallel treebank? The units of annotation and the choice of annotation schemes are crucial for quality, and some automated processing is necessary to increase the treebank's size. Automatic quality checks and evaluation are essential, but manual quality control is still needed to achieve high quality. Additionally, we explore improving the automatically created annotation for one language using information available from the annotation of the other languages. This leads to the second question: can we improve automatic annotation by projecting information available in the other languages? Experiments in which automatic alignment is projected from two language pairs, L1–L2 and L1–L3, onto the third pair, L2–L3, show improved precision, particularly when the projected alignment is intersected with the system alignment. We also construct a test collection for experiments on annotation projection to resolve prepositional phrase attachment ambiguities. While majority-vote projection improves on the basic automatic annotation, using linguistic clues to correct the annotation before majority-vote projection is even better, although more laborious. However, some structural errors cannot be corrected by projection at all, because different languages use different wording and thus different structures.
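The alignment projection described in this abstract can be sketched as relational composition: a link between an L1 token and an L2 token, combined with a link between the same L1 token and an L3 token, suggests an L2–L3 link. The sketch below is a minimal illustration under stated assumptions: word alignments are represented as sets of (source index, target index) pairs, the toy data is invented, and the function names `project` and `precision` are hypothetical. It is not the SMULTRON tooling or the thesis's exact procedure.

```python
# Assumed representation: a word alignment is a set of (source_index, target_index) pairs.
Alignment = set[tuple[int, int]]


def project(l1_l2: Alignment, l1_l3: Alignment) -> Alignment:
    """Project L2-L3 links by composing L1-L2 and L1-L3 links through each shared L1 token."""
    l3_by_l1: dict[int, list[int]] = {}
    for k, j in l1_l3:
        l3_by_l1.setdefault(k, []).append(j)
    # A link (k, i) in L1-L2 and a link (k, j) in L1-L3 yield the projected L2-L3 link (i, j).
    return {(i, j) for k, i in l1_l2 for j in l3_by_l1.get(k, [])}


def precision(predicted: Alignment, gold: Alignment) -> float:
    """Share of predicted links that also appear in the gold alignment."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


# Hypothetical toy data for a three-token sentence in each language.
l1_l2 = {(0, 0), (1, 1), (2, 2)}
l1_l3 = {(0, 0), (1, 2), (2, 1), (2, 0)}  # (2, 0) is a deliberately wrong link
system_l2_l3 = {(0, 0), (1, 2), (2, 2), (1, 1)}  # hypothetical output of an automatic aligner
gold_l2_l3 = {(0, 0), (1, 2), (2, 1)}

projected = project(l1_l2, l1_l3)
combined = projected & system_l2_l3  # keep only links that both sources agree on

print(precision(system_l2_l3, gold_l2_l3))  # 0.5
print(precision(projected, gold_l2_l3))     # 0.75
print(precision(combined, gold_l2_l3))      # 1.0
```

On this toy data, intersecting the projected links with the system output discards links that only one source proposes, which raises precision at the possible cost of recall. The numbers illustrate the mechanism only and say nothing about the thesis's actual results.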

2

Cho, Joon-Hyung. "Analyse textométrique des corpus parallèles français-coréens." [Textometric analysis of French-Korean parallel corpora.] Thesis, Paris 3, 2010. http://www.theses.fr/2010PA030012.

Abstract:

Translation equivalents extracted from a parallel corpus can be a valuable resource for studying the various translational contexts between two distinct languages. The use of translated texts is now a central topic in translation studies and contrastive linguistics. Textometric methods perform a series of statistical calculations on the textual units of a parallel corpus segmented into tokens, providing quantitative evidence of the translational relations between these units. By examining bilingual word forms in French-Korean parallel corpora, we tested the usefulness of this methodology for French-Korean translated texts. Throughout our work, these methods produced both positive and negative results. Nevertheless, they allowed us to study various translational relations between French and Korean textual units. Most automated methods designed for parallel corpora of heterogeneous language pairs have not produced acceptable results. For this reason, textometry, which aims at the quantitative observation of the lexical elements of a corpus, could be particularly useful for parallel corpora of unrelated languages.

3

Gao, Z. M. "Automatic extraction of translation equivalents from a parallel Chinese - English corpus." Thesis, University of Manchester, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488455.

4

Jilani, Aisha. "Parallel corpus multi stream question answering with applications to the Qu'ran." Thesis, University of Huddersfield, 2013. http://eprints.hud.ac.uk/id/eprint/23852/.

Abstract:

Question answering (QA) is an important research area concerned with developing automated processes that answer questions posed by humans in natural language. QA is a shared task for the Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP) communities. A technical review of different QA system models and methodologies reveals that a typical QA system consists of several components that accept a natural language question from a user and deliver its answer(s) back to the user. Existing systems have usually targeted structured or unstructured data drawn from everyday English text, i.e. text collected from television programmes, newswires, conversations, novels and similar genres. Despite extensive recent research in the area, none of the existing QA systems has been tested on a parallel corpus of religious text for question answering. Religious text has peculiar characteristics that make it more challenging for traditional QA methods than other kinds of text. This thesis proposes the PARMS (Parallel Corpus Multi Stream) methodology, a novel method that combines existing advanced IR techniques with NLP methods and additional semantic knowledge to implement QA over a parallel corpus. A parallel corpus here consists of multiple forms of the same corpus, each differing from the others in some respect, e.g. translations of a scripture from one language into another by different translators. Each source of additional semantic knowledge related to the corpus is referred to as a stream. PARMS uses multiple streams of semantic knowledge, including a general ontology (WordNet) and domain-specific ontologies (QurTerms, QurAna, QurSim). This additional knowledge is embedded for query expansion, corpus enrichment and answer ranking.
The PARMS methodology has wider applications; as a first case study, this thesis applies it to the Quran, the core text of Islam. The PARMS method uses a parallel corpus comprising ten different English translations of the Quran, and an individual Quranic verse is treated as the answer to a question asked in natural language (English). The thesis also implements the PARMS QA application as a proof of concept for the methodology. The methodology is designed to evaluate the semantic knowledge streams separately and in combination, and also to evaluate alternative subsets of the data source: QA from one stream vs. the parallel corpus. Results show that using a parallel corpus and multiple streams of semantic knowledge has clear advantages. To the best of my knowledge, this is the first such method, and it is expected to serve as a benchmark for further research in this area.

5

Ribas Bruguer, Marta. "Alineació de textos jurídics paral·lels (català-castellà): alguns problemes." [Alignment of parallel legal texts (Catalan-Spanish): some problems.] Doctoral thesis, Universitat Pompeu Fabra, 2006. http://hdl.handle.net/10803/7502.

Abstract:

Recent developments in bilingual corpus alignment programs open new perspectives for the study of specialized texts. These programs make it possible to contrast and highlight discursive differences between parallel specialized texts in different languages, which is beneficial for handling comparative knowledge about the two discourses. Nevertheless, formalizing this knowledge is a complex task, as the cases of noise in the programs' output show.

Starting from a corpus of parallel Catalan and Spanish case-law texts and using the ALINEA program, we present a detailed descriptive study of the discursive differences between Catalan and Spanish case-law texts in order to formalize comparative knowledge of Catalan and Spanish legal (case-law) discourse. We establish a typology of the linguistic phenomena specific to this discourse that can produce unsatisfactory alignments, study their causes, and propose a lexicographic treatment and complementary strategies (linguistic rules) to improve the alignment of this type of text.

6

Silva, Carlos Eduardo da. "Developing online parallel corpus-based processing tools for translation research and pedagogy." Master's thesis, Universidade Federal de Santa Catarina (Repositório Institucional da UFSC), 2013. https://repositorio.ufsc.br/xmlui/handle/123456789/130880.

Abstract:

Master's thesis, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente, Florianópolis, 2013.
This study describes the key steps in developing online parallel corpus-based tools for processing COPA-TRAD (copa-trad.ufsc.br), a parallel corpus compiled for translation research and pedagogy. The study draws on Fernandes's (2009) proposal for corpus compilation, which divides the compilation process into three main parts: corpus design, corpus building and corpus processing. This process also drew on good development practices from software engineering, especially those advocated by Pressman (2011). The tools developed can, for example, assist in investigating certain text types and translation practices related to linguistic patterns such as collocations and semantic prosody. As a result, COPA-TRAD becomes a useful tool for the empirical investigation of translation phenomena with a view to translation research and pedagogy.

7

Piao, Scott. "Sentence and word alignment between Chinese and English." Thesis, Lancaster University, 2000. http://eprints.lancs.ac.uk/52143/.

8

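Al-Qaisi, Fu'ad. "Apport de la linguistique de corpus à la lexicographie bilingue (français-arabe) : macrostructure et microstructure d'un dictionnaire de collocations." [Contribution of corpus linguistics to bilingual (French-Arabic) lexicography: macrostructure and microstructure of a dictionary of collocations.] Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO20115.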

Abstract:

The aim of this study is to examine the contribution of corpus linguistics to bilingual French-Arabic lexicography. We focus particularly on collocations: our research runs from the compilation of a bilingual corpus to the integration of collocations into the lexicon. Fundamental notions such as corpus linguistics, corpus and collocation are examined. The research then takes an empirical, corpus-based turn. To overcome the unavailability of corpus processing tools for Arabic, we developed an approach that we call the footbridge strategy. The idea is to start from a French-Arabic (translated) parallel corpus consisting of the French edition of Le Monde Diplomatique and its Arabic translation. Using a parallel corpus facilitates the identification of contrastive phenomena. The results obtained from the Arabic component of the translated corpus are then checked against a monolingual (comparable) Arabic corpus consisting of three newspapers: Alrai, Alayyam and Algouhouria. Throughout the corpus analysis, results are compared first between corpora and dictionaries, second between corpus types (parallel and comparable), and third between the newspapers of the comparable corpus (Alrai, Alayyam, Algouhouria). A number of collocations are then subjected to structural and semantic analysis. This analysis sheds light not only on the collocational environment between language and discourse but also on a possible approach to integrating collocations into the dictionary. Legitimate questions gradually arise regarding the resemblance between French and Arabic collocations. The results highlight phenomena such as collocational chains (clusters) and collocational synonymy. The study culminates in the design of an electronic dictionary of collocations: an active bilingual dictionary aimed at Arabic language specialists and translators.

9

Abdulhay, Authoul. "Constitution d'une ressource sémantique arabe à partir d'un corpus multilingue aligné." [Building an Arabic semantic resource from an aligned multilingual corpus.] PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00836764.

Abstract:

This thesis aims to implement and evaluate techniques for extracting semantic relations from an aligned multilingual corpus. These relations are extracted through the transitivity of translation equivalence: two lexemes that have the same equivalents in a target language are likely to share a meaning. First, our observations focus on the semantic comparison of translation equivalents in aligned multilingual corpora. From these equivalences, we attempt to extract "cliques", i.e. maximal complete connected subgraphs, all of whose units are interrelated because of a probable semantic intersection. These cliques are of interest because they provide information on both the synonymy and the polysemy of the units, and offer a form of semantic disambiguation. They are built from the automatic extraction of lexical correspondences, based on the observation of occurrences and co-occurrences in the corpus; the use of lemmatization techniques is also considered. Next, we attempt to link these cliques to a semantic lexicon (of the WordNet type) in order to assess whether semantic relations defined for English or French units can be recovered for Arabic units. These relations would make it possible to automatically build a network useful for certain Arabic language processing applications, such as question-answering engines, machine translation, alignment systems, information retrieval, etc.
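The clique idea can be illustrated with a small sketch. It assumes a simplified linking criterion (two source lexemes are connected if they share at least one translation equivalent) and enumerates maximal cliques with the basic Bron-Kerbosch recursion. The word pairs are hypothetical, written as English source lexemes with French equivalents for readability, and the function names are illustrative; the thesis may build its graph and cliques differently.

```python
from collections import defaultdict
from itertools import combinations


def equivalence_graph(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Link two source lexemes whenever they share at least one translation equivalent."""
    lexemes_by_equivalent: dict[str, set[str]] = defaultdict(set)
    for lexeme, equivalent in pairs:
        lexemes_by_equivalent[equivalent].add(lexeme)
    graph: dict[str, set[str]] = defaultdict(set)
    for lexemes in lexemes_by_equivalent.values():
        for a, b in combinations(sorted(lexemes), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph: dict[str, set[str]]) -> list[list[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques: list[list[str]] = []

    def expand(current: set[str], candidates: set[str], excluded: set[str]) -> None:
        if not candidates and not excluded:
            cliques.append(sorted(current))  # nothing can extend it, so it is maximal
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical (source lexeme, target-language equivalent) pairs from an aligned corpus.
pairs = [
    ("bank", "banque"), ("bank", "rive"),
    ("shore", "rive"), ("shore", "rivage"),
    ("riverside", "rive"),
    ("depository", "banque"),
]
print(maximal_cliques(equivalence_graph(pairs)))
# [['bank', 'depository'], ['bank', 'riverside', 'shore']]
```

Here "bank" falls into two cliques, hinting at polysemy, while "riverside" and "shore" share a clique, hinting at near-synonymy. A lexeme with no shared equivalent (none in this toy set) would simply not appear as a node.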

10

Teixeira, Lílian Figueiró. "A semântica dos compostos nominais – um estudo de corpus paralelo inglês/português." [The semantics of noun compounds: a study of an English/Portuguese parallel corpus.] Universidade do Vale do Rio dos Sinos, 2009. http://www.repositorio.jesuita.org.br/handle/UNISINOS/2574.

Abstract:

Noun compounds are productive constructions in many languages: new combinations are easily created in context. However, they are idiosyncratic, which makes this linguistic phenomenon a challenge for linguistics and for Natural Language Processing research. This study investigates how the constituents of English noun compounds formed by two nouns (NN compounds) relate semantically, and identifies their Portuguese translation equivalents in ten editions of National Geographic magazine. The final product is a proposed typology that expresses the semantic compositionality of NN compounds, based on the most frequent relations found in the corpus. The study has three parts: the theoretical framework; the Corpus Linguistics methods adopted; and the analysis and discussion of the data. Concepts relating to the semantics of noun compounds, such as productivity, semantic transparency, headedness, lexicalization and nominalization, are discussed. Two theories were used […]

11

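Noseda, Valentina. "Corpora paralleli e linguistica contrastiva: ampliamento e applicazioni del corpus italiano-russo nel Nacional'nyj Korpus Russkogo Jazyka." [Parallel corpora and contrastive linguistics: expansion and applications of the Italian-Russian corpus in the Nacional'nyj Korpus Russkogo Jazyka.] Doctoral thesis, Università Cattolica del Sacro Cuore, 2017. http://hdl.handle.net/10280/24613.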

Abstract:

Corpus Linguistics, which uses annotated electronic corpora to study languages, is a widespread and well-established approach. Parallel corpora in particular, in which texts in one language are aligned with their translations into a second language, are an extremely useful tool for contrastive analysis. The lack of good parallel corpora for the languages of interest here, Russian and Italian, led us to expand and improve the Italian-Russian parallel corpus available as a pilot corpus in the Russian National Corpus. This work therefore had a twofold aim, practical and theoretical. On the one hand, after studying the key issues in designing a high-quality corpus, we established criteria for expanding the corpus and added new texts, enlarging the Italian-Russian parallel corpus from 700,000 words to more than 4 million words. As a result, it is now possible to conduct scientifically valid research based on this corpus. On the other hand, three corpus-based analyses were carried out to highlight the potential of the expanded corpus: a study of prefixed Russian verbs of memory and their rendering in Italian; a comparison between the Italian analytic causative "fare + infinitive" and Russian causatives; and a comparative analysis of fifteen Italian versions of The Overcoat by N. Gogol'. First, these analyses yielded methodological observations relevant to further expanding and improving the Italian-Russian parallel corpus. Second, the corpus-based approach proved useful for studying these topics in greater theoretical depth.

12

NOSEDA, VALENTINA. "CORPORA PARALLELI E LINGUISTICA CONTRASTIVA: AMPLIAMENTO E APPLICAZIONI DEL CORPUS ITALIANO - RUSSO NEL NACIONAL'NYJ KORPUS RUSSKOGO JAZYKA." Doctoral thesis, Università Cattolica del Sacro Cuore, 2017. http://hdl.handle.net/10280/24613.

Full text

Abstract:

La Linguistica dei corpora - che fa uso di corpora elettronici annotati per lo studio delle lingue - è un approccio ormai diffuso e consolidato. I corpora paralleli, in particolare, in cui i testi in una lingua A sono allineati con la traduzione in lingua B, sono uno strumento molto utile nell’analisi contrastiva. La mancata disponibilità di corpora paralleli di qualità per le lingue di nostro interesse - russo e italiano - ci ha portati a volere ampliare e migliorare il corpus parallelo italiano-russo presente come corpus pilota nel Nacional’nyj Korpus Russkogo Jazyka (Corpus Nazionale della Lingua Russa). Il presente lavoro ha avuto pertanto uno scopo applicativo e uno teorico. Da un lato, dopo aver studiato le questioni imprescindibili per la progettazione di un corpus di qualità, sono stati stabiliti i criteri per l’ampliamento e inseriti nuovi testi, consentendo così al corpus parallelo di passare da 700.000 a più di 4 milioni di parole, entità che consente ora di condurre ricerche scientificamente valide. In seguito, sono state proposte tre analisi corpus-based così da mettere in luce le potenzialità del corpus ampliato: lo studio dei verbi prefissali di memoria russi e la loro resa in italiano; il confronto tra il causativo analitico italiano “fare + infinito” e il causativo russo; l’analisi comparata di quindici versioni italiane de Il Cappotto di N. Gogol’. Le tre analisi hanno consentito di avanzare innanzitutto osservazioni di carattere metodologico in vista di un ulteriore ampliamento e miglioramento del corpus parallelo italiano-russo. In secondo luogo, la prospettiva corpus-based si è dimostrata utile per approfondire lo studio di questi temi dal punto di vista teorico.
Corpus Linguistics - which exploits electronic annotated corpora in the study of languages - is a widespread and consolidated approach. In particular, parallel corpora, where texts in a language are aligned with their translation in a second language, are an extremely useful tool in contrastive analysis. The lack of good parallel corpora for the languages of our interest - Russian and Italian - has led us to work for improving the Italian-Russian parallel corpus available as a pilot corpus in the Russian National Corpus. Therefore, this work had a twofold aim: practical and theoretical. On the one hand, after studying the essential issues for designing a high-quality corpus, all the criteria for expanding the corpus were established and the number of texts was increased, allowing the Italian-Russian parallel corpus, which counted 700.000 words, to reach more than 4 million words. As a result, it is now possible to conduct scientifically valid research based on this corpus. On the other hand, three corpus-based analyses were proposed in order to highlight the potential of the corpus: the study of prefixed Russian memory verbs and their translation into Italian; the comparison between the Italian analytic causative "fare + infinitive" and Russian causative verbs; The comparative analysis of fifteen Italian versions of The Overcoat by N. Gogol'. These analyses first of all allowed to advance some methodological remarks considering a further enlargement and improvement of the Italian-Russian parallel corpus. Secondly, the corpus-based approach has proved to be useful in deepening the study of these topics from a theoretical point of view.

13

Knobloch, Nina. "The encoding of bad and evil : A cross-linguistic study using a parallel Bible corpus." Thesis, Stockholms universitet, Institutionen för lingvistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-196656.

Abstract:

This study investigates the cross-linguistic encoding of expressions for bad and evil. Using parallel data from a Bible corpus of New Testament translations into 30 languages, probabilistic semantic maps were created with multidimensional scaling (MDS). Special attention was paid to the presence of morphological and syntactic negation within the domain. The results show that most languages either have one broad expression used across the entire domain, or have at least two expressions: a broader one, expressing a bad state, action or character flaw, and a narrower one, restricted to the most evil actions or characters, which require a moral agent. Languages with several expressions vary widely in how broad or restricted these expressions are within the domain. A scalar view of the domain is therefore proposed, rather than a division into discrete semantic categories. In the languages where negation marking was present within the domain, it occurred only in the broader expressions.
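
Probabilistic semantic maps are commonly built in two steps: first, a dissimilarity is computed for every pair of contexts (here, verses) from how often the languages use different expressions in them; second, that matrix is projected into two dimensions with MDS. A minimal sketch of the first step, assuming each verse has already been mapped to the bad/evil expression each language uses. The function name and the toy data are hypothetical and not taken from the thesis, and the MDS step itself is left to any standard implementation.

```python
from itertools import combinations

def context_dissimilarities(contexts):
    """Pairwise dissimilarities between contexts for a probabilistic semantic map.

    contexts maps a context id (e.g. a verse) to {language: expression}.
    The dissimilarity of two contexts is the share of languages, among those
    attested in both, that use different expressions in them.
    """
    dist = {}
    for a, b in combinations(sorted(contexts), 2):
        shared = contexts[a].keys() & contexts[b].keys()
        if not shared:
            continue  # no language attested in both contexts: leave undefined
        differ = sum(contexts[a][lang] != contexts[b][lang] for lang in shared)
        dist[(a, b)] = differ / len(shared)
    return dist


# Hypothetical toy data: three verses, two languages.
toy = {
    "v1": {"eng": "bad", "swe": "dålig"},
    "v2": {"eng": "evil", "swe": "ond"},
    "v3": {"eng": "bad", "swe": "ond"},
}
print(context_dissimilarities(toy))
# {('v1', 'v2'): 1.0, ('v1', 'v3'): 0.5, ('v2', 'v3'): 0.5}
```

The resulting pairwise values form the dissimilarity matrix that an MDS routine would then turn into a two-dimensional map.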

14

Bouamor, Dhouha. "Constitution de ressources linguistiques multilingues à partir de corpus de textes parallèles et comparables." PhD thesis, Université Paris Sud - Paris XI, 2014. http://tel.archives-ouvertes.fr/tel-00994222.

Abstract:

Bilingual lexicons are particularly useful resources for machine translation and cross-language information retrieval. Building them manually requires strong expertise in both languages and is a costly process. Several automatic methods have been proposed as an alternative, but they are available only for a limited number of languages, and their performance still lags well behind the quality of manual translations. Our work focuses on extracting such bilingual lexicons from parallel and comparable corpora, that is, recognizing and aligning the common multilingual vocabulary present in these corpora.
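
The abstract does not detail the extraction methods. As a point of reference, a classic baseline for the parallel-corpus case scores candidate translation pairs by how often they co-occur in aligned sentence pairs, for instance with the Dice coefficient. The sketch below assumes a sentence-aligned corpus and whitespace tokenization; it is not the thesis's method, and the function name, threshold and toy corpus are hypothetical.

```python
from collections import Counter
from itertools import product

def dice_lexicon(pairs, threshold=0.5):
    """Extract a toy bilingual lexicon from sentence-aligned pairs.

    For each source word, keep the target word with the highest Dice score
    2 * c(s, t) / (c(s) + c(t)), where counts are numbers of sentence pairs.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.lower().split()), set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))  # every co-occurring word pair
    lexicon = {}
    for (s, t), c in joint.items():
        dice = 2 * c / (src_freq[s] + tgt_freq[t])
        if dice >= threshold and dice > lexicon.get(s, ("", 0.0))[1]:
            lexicon[s] = (t, dice)
    return lexicon


corpus = [("the house", "la maison"), ("the car", "la voiture"), ("a house", "une maison")]
for word, (translation, score) in sorted(dice_lexicon(corpus).items()):
    print(word, translation, round(score, 2))
# a une 1.0 / car voiture 1.0 / house maison 1.0 / the la 1.0
```

Function words pair up cleanly here only because the toy corpus is tiny; practical systems add frequency cut-offs or full alignment models, and comparable corpora need context-based methods instead of sentence co-occurrence.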

15

Zennaki, Othman. "Construction automatique d'outils et de ressources linguistiques à partir de corpus parallèles." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM006/document.

Abstract:

This thesis focuses on the automatic construction of linguistic tools and resources for analyzing texts in low-resource languages. We propose an approach based on recurrent neural networks (RNNs) that requires only a parallel or multi-parallel corpus between a well-resourced source language and one or more low-resource target languages. This corpus is used to build a multilingual representation of the words of the source and target languages, which we then use to train our neural models; we investigated two architectures, simple (unidirectional) RNNs and bidirectional RNNs. We also proposed several RNN variants that incorporate low-level linguistic information (for instance, part-of-speech tags) when training higher-level annotators (for instance, SuperSense taggers and syntactic dependency parsers). We demonstrated the validity and genericity of the approach on several languages and several NLP tasks: part-of-speech tagging, SuperSense tagging and dependency parsing, with very satisfactory results. The approach has the following advantages: (a) it uses no word-alignment information; (b) it assumes no prior knowledge of the target languages (the only requirement is that source and target are not too syntactically divergent), which makes it applicable to a wide range of low-resource languages; (c) it yields genuinely multilingual taggers (one tagger for N languages).

16

Neifar, Wafa. "Méthodes d'acquisition terminologique en arabe : Application au domaine médical." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS085/document.

Abstract:

The goal of this thesis is to address the lack of NLP resources and tools for Arabic in specialized domains by proposing methods for extracting terms from texts in Modern Standard Arabic. To this end, we first built an English-Arabic parallel corpus in a specialized domain: a set of medical texts produced by the US National Library of Medicine (NLM). We then proposed terminology acquisition methods for Arabic, which extract terms or acquire relations between terms, based on: i) adapting an existing term extractor for French or English, ii) transliterating English terms into Arabic script, and iii) cross-lingual transfer. Applied at the terminological level, transfer consists of running term extraction or term-relation acquisition on texts in a source language (here, French or English) and then transferring the extracted information to texts in a target language (here, Modern Standard Arabic), thereby identifying the same type of terminological information. We evaluated the monolingual and bilingual term lists obtained in our experiments using a transparent, direct and semi-automatic method: the extracted term candidates are compared against a reference terminology before being validated manually. This evaluation follows a protocol that we proposed.
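
The abstract describes transfer as running extraction on the source side and carrying the results over to the target side, without saying how. One simple way to do this, assuming word-aligned sentence pairs are available, is to project each source term onto the span of target tokens its words are aligned to. A minimal sketch; the function name, the alignment format and the Arabic example are hypothetical illustrations, not the thesis's data or implementation.

```python
def project_terms(src_terms, alignment, tgt_tokens):
    """Project source-side term spans onto the target sentence.

    src_terms: list of (start, end) token spans, end exclusive, marked as terms.
    alignment: set of (source_index, target_index) word-alignment links.
    Returns one target string per term that has at least one aligned token,
    covering the contiguous range from the first to the last aligned token.
    """
    projected = []
    for start, end in src_terms:
        tgt_idx = sorted({t for s, t in alignment if start <= s < end})
        if not tgt_idx:
            continue  # unaligned term: nothing to transfer
        projected.append(" ".join(tgt_tokens[tgt_idx[0]:tgt_idx[-1] + 1]))
    return projected


# Hypothetical example. Source: "chronic kidney disease is common" (term = tokens 0-2).
tgt = ["مرض", "الكلى", "المزمن", "شائع"]
links = {(0, 2), (1, 1), (2, 0), (4, 3)}  # chronic-المزمن, kidney-الكلى, disease-مرض, common-شائع
print(project_terms([(0, 3)], links, tgt))
```

Taking the contiguous range of aligned target tokens absorbs the reordering between English modifier-head order and Arabic head-modifier order, at the cost of occasionally including unaligned words that fall in between.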

17

Wang, Lixum. "The use of parallel texts in language learning : computer software and teaching materials for English and Chinese." Thesis, University of Birmingham, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368990.

18

Teixeira, Luiz Gustavo [UNESP]. "Colocações criativas presentes no corpus literário paralelo Memórias póstumas de Brás Cubas sob a perspectiva de um novo olhar." Universidade Estadual Paulista (UNESP), 2016. http://hdl.handle.net/11449/143885.

Abstract:

Issue date: 2016-07-29.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
This study analyzes the translation of creative collocations in a literary parallel corpus made up of the Portuguese original Memórias Póstumas de Brás Cubas (TO), by Machado de Assis (1881), and its three English translations: Epitaph of a Small Winner (TT¹), by Grossman (1953), Posthumous Reminiscences of Braz Cubas (TT²), by Ellis (1955), and The Posthumous Memoirs of Brás Cubas (TT³), by Rabassa (1997). The theoretical and methodological framework draws on Corpus Linguistics and its interface with Corpus-based Translation Studies and Literature, on the concept of creative collocations, and on studies of Machado de Assis by Alfredo Bosi (1999, 2006) and Schwarz (1990), showing how the gaze of the deceased author-narrator is portrayed through the eyes of the characters in the selected passages. To extract the words with the highest keyness, we used WordSmith Tools (SCOTT, 2012), which allowed a broader and more dynamic analysis of the data. The Brown Corpus and the Lácio-Ref corpus served as reference corpora for English and Portuguese, respectively. Keyword extraction showed significant keyness for the node "olhos" in the original text (TO) and "eyes" in the translations (TT¹, TT², TT³), and the creative collocations of these nodes were extracted and analyzed. Both the keyword extraction and the analysis of the translated creative collocations in the selected passages show that, although all the translators repeat the translations of some creative collocations, Grossman (TT¹) does so most often and thus does not explore the creativity of Machado's style. The analysis also suggests that in some passages the translators do not capture the meaning of the original creative collocations, revealing how difficult it is to translate Machado's style.
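
Keyness compares a word's frequency in the study corpus with its frequency in a reference corpus (here, the Brown Corpus or Lácio-Ref). The abstract does not say which statistic was used; the sketch below assumes the log-likelihood (G²) statistic, a common choice for keyness, and the function name and counts are hypothetical.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of a word in a study corpus vs a reference corpus.

    freq_*: occurrences of the word; size_*: total tokens in each corpus.
    Expected frequencies assume the word is equally common in both corpora.
    """
    total = freq_study + freq_ref
    exp_study = size_study * total / (size_study + size_ref)
    exp_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_study, exp_study), (freq_ref, exp_ref)):
        if observed > 0:  # a zero count contributes nothing (0 * log 0 -> 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: 50 hits in a 1,000-token text vs 10 in a 1,000-token reference.
score = log_likelihood(50, 1000, 10, 1000)
print(round(score, 2))  # 29.11, well above 3.84, the chi-square critical value for p < 0.05 (1 d.f.)
```

G² itself is unsigned, so a keyword tool also has to compare relative frequencies to tell words that are unusually frequent in the study text from those that are unusually rare.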

19

Granlund, Ann-Louise. "Comparing Emotional Intensity Between Languages: A parallel corpus Investigation on the Swedish word Njuta and its English equivalents." Thesis, Malmö högskola, Institutionen för globala politiska studier (GPS), 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-22490.

Abstract:

This paper investigates the emotional semantic differences between the Swedish word "njuta" and its English equivalents. For a Swede, the first natural description of "njuta" involves feelings of lust, or experiencing something with passion. The most common English translation is "enjoy", which for me first of all means liking something or taking pleasure in it. The words investigated here have a wider meaning than simply experiencing pleasure to different degrees: they are also used in connection with having, possessing, valuing or consuming something. Through an English-Swedish parallel corpus investigation, I try to show the range of semantic definitions in the word's usage. The aim is to demonstrate, in line with my hypothesis, that the English equivalents of the Swedish word "njuta" carry less emotional value, and that the Swedish word is more intense and semantically stronger than English "enjoy".

20

Finnveden, Gustav. "Finding case through personal names in parallel texts." Thesis, Stockholms universitet, Institutionen för lingvistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-174831.

Abstract:

The aim of this study is to evaluate whether the 'richness' of the marking on personal names is an adequate indirect measure of a language's case usage. The method uses parallel texts to identify personal names in over a thousand languages and group them by lemma. These groupings are compared with case data from a typological database for the languages where such data is available, and this material is then used to test a method for assessing whether a language uses case. The results indicate that the maximum number of word types a proper-name lemma is attested with in a text is a useful tool for inferring case usage, although it gave clear results only for a subset of the languages tested and was not particularly useful for inferring the absence of case. The number of case categories was also estimated, using an entropy measure based on the word types a personal-name lemma is attested with and the frequencies of those word types; it proved a fair, if somewhat inaccurate, indicator of the number of case categories. Markings on personal names in languages without case were also investigated and fall into several types: pragmatic markers, non-case grammatical markers and case-like markers. Finally, two languages that have case but few markings on personal names were investigated. They turned out not to use case marking on personal names at all, while still using it on common nouns. This contradicts the tentative generalization on which the study is based: 'No languages have case marking exclusively in the domain of [personal names] or [common nouns]' (Handschuh, 2017).
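
The two measures in the abstract can be made concrete for a single personal-name lemma: the number of distinct word types (forms) it is attested with, and an entropy over how its occurrences are spread across those types. The abstract does not give the exact entropy formula, so the sketch assumes Shannon entropy in bits over the word-type distribution; the Finnish and English toy data and the function names are hypothetical.

```python
import math
from collections import Counter

def lemma_profile(forms):
    """Return (number of word types, Shannon entropy in bits) for one name lemma.

    forms: every attested token of the lemma (must be non-empty),
    e.g. all inflected forms of "Peter" found in the text.
    """
    counts = Counter(forms)
    n = len(forms)
    # sum of p * log2(1/p) over word types
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return len(counts), entropy


def language_profile(lemmas):
    """Maximum type count over all name lemmas of a language, plus per-lemma profiles."""
    profiles = {lemma: lemma_profile(forms) for lemma, forms in lemmas.items()}
    max_types = max(types for types, _ in profiles.values())
    return max_types, profiles


# Hypothetical toy data: a case-marking language vs one without case on names.
finnish = {"Pietari": ["Pietari", "Pietari", "Pietarin", "Pietarille"]}
english = {"Peter": ["Peter", "Peter", "Peter", "Peter"]}
print(language_profile(finnish))  # (3, {'Pietari': (3, 1.5)})
print(language_profile(english))  # (1, {'Peter': (1, 0.0)})
```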

21

Mörn, Anna. "The Modal Auxiliaries Can and Could - A contrastive investigation of the modal auxiliaries can and could in descriptions in materials aimed for English tuition and the English-Swedish Parallel Corpus." Thesis, Halmstad University, School of Humanities (HUM), 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-2515.

Abstract:

The two modal auxiliaries can and could are investigated in this essay. Focus is on the correspondence between descriptions in grammar books and real-life data.

First, four English learner grammar books aimed at Swedish high schools were analyzed. The uses and translations of can and could found in these grammar books were then compared with real-life examples from an English-Swedish parallel corpus.

Three of the grammar books were found to categorize the uses of can and could in fairly general terms as ability, possibility and permission, and these uses matched the majority of the corpus examples. The fourth book did not mention the possibility use and listed very specific uses of the modal auxiliaries; it corresponded less closely to the corpus data than the other three.

It can be concluded that the grammar books' assumptions about use matched the corpus more closely than their assumptions about translation.

22

Matsubara, Shigeki, and Yoshihide Kato. "Correcting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar." Institute of Electronics, Information and Communication Engineers, 2010. http://hdl.handle.net/2237/15002.

23

Lecuit, Émeline. "Les tribulations d'un nom propre en traduction : étude contrastive du nom propre et de sa traduction à partir d'un corpus aligné de dix langues européennes." Thesis, Tours, 2012. http://www.theses.fr/2012TOUR2017/document.

Abstract:

Proper names are omnipresent and have interested philosophers and linguists for centuries. This work, divided into four parts, is a contrastive study of proper names in translation. The first two parts are theoretical: the first examines the notion of proper name in English and French linguistics, and the second presents the different translation procedures that proper names can undergo, illustrated with examples. The last two parts are experimental: the third details the stages in building our aligned and annotated multilingual parallel corpus, composed of eleven versions of Jules Verne's novel Le Tour du monde en quatre-vingts jours in ten European languages, and the fourth presents the results obtained from observing the behaviour of proper names in translation. These results often contradict the widespread assumption that proper names are untranslatable.

24

Mydliar, Ján. "Překladač z češtiny do slovenštiny." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236398.

Abstract:

This master's thesis deals with machine translation from Czech to Slovak. The first chapter motivates the work, the second discusses various approaches to machine translation and the third details the evaluation of the methods. Chapter 4 introduces the design and implementation of my system, paying special attention to a new parallel corpus that was created. Chapter 5 summarizes the testing and evaluation of the developed system.

25

Do, Thi Ngoc Diep. "Extraction de corpus parallèle pour la traduction automatique depuis et vers une langue peu dotée." PhD thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00680046.

Abstract:

Machine translation systems now achieve good results for some language pairs, such as English-French, English-Chinese and English-Spanish. Empirical translation approaches, in particular statistical machine translation, make it possible to build a translation system quickly if suitable data are available, since statistical machine translation relies on learning models from large bilingual parallel corpora in the source and target languages. However, research on machine translation for so-called under-resourced language pairs faces a lack of data. We therefore addressed the problem of acquiring a large corpus of parallel bilingual texts to build a statistical machine translation system. The originality of our work lies in its focus on under-resourced languages, for which parallel bilingual corpora are nonexistent in most cases. This manuscript presents our methodology for extracting a parallel training corpus from a comparable corpus, a richer and more diverse data resource available on the Internet. We propose three extraction methods. The first follows the classic research approach, using general document features together with lexical information to extract both comparable documents and parallel sentences; however, it requires additional data about the language pair. The second is a fully unsupervised method that requires no additional input data and can be applied to any language pair, even under-resourced ones.
The last method extends the second by using a third language to improve the extraction processes for two language pairs. The proposed methods are validated by experiments on the under-resourced language Vietnamese and on French and English.
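
The first method relies on lexical information and extra bilingual resources. A toy illustration of that idea, assuming a small seed dictionary is the extra resource: score each candidate sentence pair by the share of source words whose dictionary translations appear in the target sentence, filter by length ratio, and keep the best-scoring pair above a threshold. The scoring, filter, threshold, names and French-Vietnamese data are hypothetical simplifications, not the features used in the thesis.

```python
def overlap_score(src, tgt, dictionary):
    """Share of source words with at least one dictionary translation in tgt."""
    src_words = src.lower().split()
    tgt_words = set(tgt.lower().split())
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if dictionary.get(w, set()) & tgt_words)
    return hits / len(src_words)


def extract_parallel(src_sents, tgt_sents, dictionary, threshold=0.5, max_ratio=2.0):
    """Pick, for each source sentence, the best-scoring target sentence (if any)."""
    pairs = []
    for s in src_sents:
        best, best_score = None, 0.0
        for t in tgt_sents:
            ls, lt = len(s.split()), len(t.split())
            if max(ls, lt) > max_ratio * min(ls, lt):
                continue  # lengths too different to be translations
            score = overlap_score(s, t, dictionary)
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            pairs.append((s, best, round(best_score, 2)))
    return pairs


# Hypothetical French-Vietnamese toy data from a comparable corpus.
seed = {"chat": {"mèo"}, "mange": {"ăn"}, "poisson": {"cá"}}
fr = ["le chat mange du poisson", "il pleut"]
vi = ["trời mưa", "con mèo ăn cá"]
print(extract_parallel(fr, vi, seed))
# [('le chat mange du poisson', 'con mèo ăn cá', 0.6)]
```

Real Vietnamese text would also need Unicode normalization and word segmentation, since Vietnamese words can span several space-separated syllables.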

26

Julin, Hanna. "“What you NEED to know”, “Was man wissen muss” and “Vad man behöver veta” : A contrastive corpus study of NEED to and its German and Swedish correspondences in non-fiction." Thesis, Linnéuniversitetet, Institutionen för språk (SPR), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-91202.

Abstract:

This study investigates how the semi-modal need to is translated into German and Swedish, and which German and Swedish correspondences are translated into need to. To this end, the Linnaeus University English–German–Swedish Corpus (LEGS) is used. Nida's (1964: 159-162) concepts of formal and dynamic equivalence are used to perform the qualitative analysis and to discuss the results of the quantitative part of the study. The use of semi-modals such as be going to, have to and want to increased during the second half of the 20th century (Leech et al. 2009: 99). Need to presents the obligation as being in the best interest of the subject and is associated with objectivity (Kastrone 2008: 829; Aijmer 2017: 28). Need to is thus used to distance the speaker and avoid an authoritarian stance, a trend that is a sign of ongoing democratization (Leech et al. 2009: 270). The results showed that the preferred German translation is müssen ('must') (55%) and the preferred Swedish translation is behöva ('need') (47%). The category 'other' is the second most frequent German translation and the third most frequent Swedish one. These results are mirrored in the structures translated from German and Swedish. The results indicate that the semantic category of the co-occurring main verb and of the co-occurring subject affects translation. Based on these results, it could be said that English, followed by Swedish, is leading the process of democratization; however, further studies are needed to confirm this hypothesis.

27

Fernández Quiroz, Ariel Marcelo. "Análise da perda de comicidade na tradução de piadas do seriado 'El Chavo del 8' em um corpus paralelo da sua dublagem do espanhol do México para o português do Brasil." Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/154495.

Abstract:

Issue date: 2018-04-20.
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
The main problem in dubbing most humorous audiovisual products is the laugh track (artificial sounds of an audience laughing): every time there is laughter on the soundtrack, it must coincide with a joke so as not to puzzle the target audience. In this work we use a parallel corpus to analyze the translation problems in the dubbing of one such product, the series "El Chavo del 8" ("Chaves" in Brazil), from Mexican Spanish into Brazilian Portuguese, drawing on theories of dubbing by Hurtado Albir (1996), theories of humor by Raskin (1987), Bergson (1983), Posada (1995) and others, and the translation techniques proposed by Hurtado Albir (2001). The analysis was carried out in three stages: first, we built a table with the timestamps of the jokes in each of the 18 episodes analyzed, with a "joke / no joke" column; second, 12 participants indicated whether or not there was a joke in each selected excerpt; finally, we built tables for each joke that participants judged to be absent and explained the reason for the loss. Based on the definition of the problems and on translation techniques, we aim to present possible solutions that audiovisual translators could use to translate the jokes where humor was lost.
The main problem in dubbing most humorous audiovisual products is the laugh track (artificial sounds of an audience laughing): every time laughter is heard, it must coincide with a joke, or the target audience will find it strange. In this research, we analyze, through a parallel corpus, the translation problems in the dubbing of the series "El Chavo del 8" ("Chaves" in Brazil) from Mexican Spanish into Brazilian Portuguese, based on theories of dubbing by Hurtado Albir (1996), of humor by Raskin (1987), Bergson (1983) and Posada (1995), among others, and of translation techniques by Hurtado Albir (2001). We present an analysis performed in three stages: in the first, we created tables with the timecodes of the jokes in 21 episodes and a "joke / no joke" column; in the second, 14 participants answered whether or not there was a joke in each selected segment; finally, in the third, we created tables for each joke that participants judged not to be a joke and explained the reason for this loss. Based on the definition of the problems and on translation techniques, we aim to present possible solutions that audiovisual translators could use to translate jokes when their comic effect is lost.
CNPq:190394/2015-3

28

Miao, Jun. "Approches textométriques de la notion de style du traducteur : Analyses d'un corpus parallèle Français-Chinois : Jean-Christophe de Romain Rolland et ses trois traductions chinoises." PhD thesis, Université de la Sorbonne nouvelle - Paris III, 2012. http://tel.archives-ouvertes.fr/tel-00846619.

Abstract:

We have attempted to explore the notion of translator's style by combining translation-studies analyses with the methods of multilingual textometry (quantitative textual analysis methods applied to corpora of aligned texts). Our study corpus consists of three Chinese translations of a French literary work, Jean-Christophe by Romain Rolland (1904-1917), produced respectively by Fu Lei (1952-1953), Han Hulin (2000) and Xu Yuanchong (2000). After describing the difficulties inherent in building a French-Chinese parallel corpus, we carry out a series of textometric measurements on this corpus in order to bring out the lexical and syntactic usages specific to each translator. Placing the statistical differences in linguistic phenomena between the translations back into the context of the parallel corpus, and examining the sociocultural factors of each period, reveals indicators of each translator's style. A detailed translation-studies investigation of Chinese particles, supported by textometric comparisons, provides a series of clues to the specific approach each translator took. The results of this investigation, conducted by comparing the three Chinese versions with one another and then with the original French text, lay the foundations for a proposed analytical model centered on translator's style. We believe that our work opens the way to a scientific and systematic exploration of the notion of translator's style within translation studies.
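
The textometric comparison of the three translators can be illustrated with a minimal Python sketch that measures how often selected Chinese particles occur in each translation. The particle list, the per-10,000-character normalization and the function names are assumptions made for illustration; the abstract does not specify the thesis's measures or tools.

```python
from collections import Counter

# Hypothetical particle inventory; the thesis's actual list is not given in the abstract.
PARTICLES = ("了", "吧", "呢", "啊", "嘛")


def particle_rates(text: str, per: int = 10_000) -> dict[str, float]:
    """Return occurrences of each particle per `per` Chinese characters.

    Counting single characters sidesteps word segmentation, but it also
    counts particles that are part of longer words (e.g. 了 in 了解).
    """
    # Keep only CJK Unified Ideographs so punctuation does not inflate the base.
    han = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    counts = Counter(ch for ch in han if ch in PARTICLES)
    total = len(han) or 1  # avoid division by zero on empty input
    return {p: counts[p] * per / total for p in PARTICLES}


def compare_translators(versions: dict[str, str]) -> dict[str, dict[str, float]]:
    """Map each translator's name to the particle-rate profile of their translation."""
    return {name: particle_rates(text) for name, text in versions.items()}
```

For example, `compare_translators({"Fu Lei": fu_text, "Han Hulin": han_text, "Xu Yuanchong": xu_text})` would return one rate profile per translator, where `fu_text` and the other variables are hypothetical strings holding each translation. A real study would work on segmented and aligned text, so that differences can be traced back to specific passages of the original.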

29

Yahiaoui, Abdelghani. "Conception et développement d'un outil d'aide à la traduction anglais/arabe basé sur des corpus parallèles." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE2042.

Abstract:

Dans cette thèse, nous abordons la réalisation d’un outil innovant d’aide à la traduction anglais/arabe pour répondre au besoin croissant en termes d’outils en ligne d’aide à la traduction centrés sur la langue arabe. Cet outil combine des dictionnaires adaptés aux spécificités de la langue arabe et un concordancier bilingue issu des corpus parallèles. Compte tenu de sa nature agglutinante et non voyellée, le mot arabe nécessite un traitement spécifique. C’est pourquoi, et pour construire nos ressources lexicales, nous nous sommes basés sur l’analyseur morphologique de Buckwalter qui, d’une part, permet une analyse morphologique en tenant compte de la composition complexe du mot arabe (proclitique, préfixe, radical, suffixe, enclitique), et qui, d’autre part, fournit des ressources traductionnelles permettant une réadaptation au sein d’un système de traduction. Par ailleurs, cet analyseur morphologique est compatible avec l’approche définie autour de la base de données DIINAR (DIctionnaire Informatisé de l’Arabe), qui a été construite, entre autres, par des membres de notre équipe de recherche. Pour répondre à la problématique du contexte dans la traduction, un concordancier bilingue a été développé à partir des corpus parallèles. Ces derniers représentent une ressource linguistique très intéressante et ayant des usages multiples, en l’occurrence l’aide à la traduction. Nous avons donc étudié de près ces corpus, leurs méthodes d’alignement, et nous avons proposé une approche mixte qui améliore significativement la qualité d’alignement sous-phrastique des corpus parallèles anglais-arabes. Plusieurs technologies informatiques ont été utilisées pour la mise en œuvre de cet outil d’aide à la traduction qui est disponible en ligne (tarjamaan.com), et qui permet à l’utilisateur de chercher la traduction de millions de mots et d’expressions tout en visualisant leurs contextes originaux.
Une évaluation de cet outil a été faite en vue de son optimisation et de son élargissement pour prendre en charge d’autres paires de langues.
We created an innovative English/Arabic translation aid tool to meet the growing need for online translation tools centered on the Arabic language. This tool combines dictionaries adapted to the specific features of Arabic with a bilingual concordancer derived from parallel corpora. Because Arabic is agglutinative and usually written without short vowels, Arabic words require specific processing. For this reason, to build our lexical resources, we relied on Buckwalter's morphological analyzer. On the one hand, it performs a morphological analysis that takes into account the complex composition of the Arabic word (proclitic, prefix, stem, suffix, enclitic). On the other hand, it provides translation resources that can be re-adapted within a translation system. Furthermore, this morphological analyzer is compatible with the approach defined around the DIINAR database (DIctionnaire Informatisé de l’Arabe, Computerized Dictionary of Arabic), which was built in part by members of our research team. To address the issue of context in translation, a bilingual concordancer was developed from parallel corpora, which are a very valuable linguistic resource with multiple uses, in this case translation aid. We therefore studied these corpora and their alignment methods closely, and proposed a mixed approach that significantly improves the quality of sub-sentential alignment of English-Arabic parallel corpora. Several technologies were used to implement this translation aid tool, which is available online (tarjamaan.com) and allows users to search for the translations of millions of words and expressions while viewing their original contexts. The tool has been evaluated with a view to optimizing it and extending it to other language pairs.
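
A bilingual concordancer of the kind described can be sketched as keyword-in-context search over sentence-aligned pairs. This minimal Python sketch assumes the corpus is already aligned at sentence level and searches the English side only. The names `Hit` and `concordance` are hypothetical, and the sketch reproduces neither tarjamaan.com nor its Buckwalter-based morphology.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    left: str     # source-side context before the match
    keyword: str  # the matched word or expression
    right: str    # source-side context after the match
    target: str   # the whole aligned target-language sentence


def concordance(pairs, query, width=30):
    """Keyword-in-context search over sentence-aligned (source, target) pairs.

    Matches `query` as a whole word or phrase, case-insensitively, on the
    source side only, and returns each hit with its aligned translation.
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for source, target in pairs:
        for m in pattern.finditer(source):
            hits.append(Hit(
                left=source[max(0, m.start() - width):m.start()],
                keyword=m.group(0),
                right=source[m.end():m.end() + width],
                target=target,
            ))
    return hits
```

For example, `concordance([("The meeting starts tomorrow.", "يبدأ الاجتماع غدا.")], "meeting")` returns one hit with left context `"The "`, right context `" starts tomorrow."` and the Arabic sentence as its translation. Searching on the Arabic side would also require morphological segmentation, because attached elements such as the definite article ال are written as part of the word.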

30

Shen, Lionel. "Méthodes de veille textométrique multilingue appliquées à des corpus de l’environnement et de l’énergie : « Restitution, prévision et anticipation d’événements par poly-résonances croisées »." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCA085/document.

Abstract:

Cette thèse propose une série de méthodes de veille textométrique multilingue appliquées à des corpus thématiques. Pour constituer ce travail, deux types de corpus sont mobilisés : un corpus comparable et un corpus parallèle, composés de données textuelles extraites des discours de presse, ainsi que ceux des ONG. Les informations récupérées proviennent de trois mondes en trois langues différentes : français, anglais et chinois. La construction de ces deux corpus s’effectue autour de deux thèmes d’actualité ayant pour objet, l’environnement et l’énergie, avec une attention particulière sur trois notions : les énergies, le nucléaire et l’EPR. Après un bref rappel de l’état de l’art en intelligence économique, veille et textométrie, nous avons exposé les deux sujets retenus, les technicités morphosyntaxiques des trois langues dans les contextes nationaux et internationaux. Successivement, les caractéristiques globales, les convergences et les particularités de ces corpus ont été mises en évidence. Les dépouillements et les analyses qualitatives et quantitatives des résultats obtenus sont réalisés à l’aide des outils de la textométrie, notamment grâce aux analyses factorielles des correspondances, réseaux cooccurrentiels et poly-cooccurrentiels, spécificités du modèle hypergéométrique, segments répétés ou encore à la carte des sections. Ensuite, la veille bi-textuelle bilingue a été appliquée sur les trois mêmes concepts dans l’objectif de mettre en évidence les modes selon lesquels les corpus multilingues à caractère comparé et parallèle se complètent dans un processus de veille plurilingue, de restitution, de prévision et d’anticipation. Nous concluons notre recherche en proposant une méthode analytique par Objets-Traits-Entrées (OTE)
This thesis proposes a series of multilingual textometric information monitoring methods applied to thematic corpora (textometry is also called textual statistics or text data analysis). Two types of corpora are used: a comparable corpus and a parallel corpus, in which the textual data are extracted from the press and from the discourse of NGOs. The information was retrieved from three countries in three different languages: English, French and Chinese. The two corpora were built around two topical issues, the environment and energy, with a focus on three concepts: energy, nuclear power and the EPR (European Pressurized Reactor or Evolutionary Power Reactor). After a brief review of the state of the art in business intelligence, information monitoring and textometry, we first present the two chosen subjects, and then the morphosyntactic features of the three languages in national and international contexts. The overall characteristics, similarities and peculiarities of these corpora are then highlighted in turn. The data processing and the qualitative and quantitative analyses of the results were carried out with textometric tools. These include correspondence analysis, co-occurrence and poly-co-occurrence networks, specificities based on the hypergeometric model, repeated segments and section maps. Thereafter, bilingual bitextual information monitoring was applied to the same three concepts in order to show how the comparable corpus and the parallel corpus complement each other in a process of multilingual information monitoring, restitution, forecasting and anticipation. We conclude by proposing an analytical method called Objects-Features-Opening (OFO).
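
Among the textometric tools listed, specificities based on the hypergeometric model can be sketched directly. In one common formulation, associated with Lafon, a word occurring F times in a corpus of T tokens is expected to occur about F·t/T times in a part of t tokens. The score then measures how improbable the observed count f is under random sampling without replacement. The signed log10 convention and the function names below are assumptions, not necessarily the exact variant used in the thesis.

```python
from math import comb, log10


def hypergeom_pmf(k: int, total: int, word_total: int, part_size: int) -> float:
    """P(X = k) for a word occurring `word_total` times in a corpus of `total`
    tokens, when a part of `part_size` tokens is drawn without replacement."""
    return comb(word_total, k) * comb(total - word_total, part_size - k) / comb(total, part_size)


def specificity(f: int, part_size: int, word_total: int, total: int) -> float:
    """Signed specificity score of a word observed `f` times in one part.

    Positive when the word is over-represented in the part: -log10 P(X >= f).
    Negative when it is under-represented: log10 P(X <= f).
    """
    expected = word_total * part_size / total
    if f >= expected:
        upper = min(word_total, part_size)
        p = sum(hypergeom_pmf(k, total, word_total, part_size) for k in range(f, upper + 1))
        return -log10(max(p, 1e-300))  # clamp to avoid log10(0) after float underflow
    lower = max(0, part_size - (total - word_total))
    p = sum(hypergeom_pmf(k, total, word_total, part_size) for k in range(lower, f + 1))
    return log10(max(p, 1e-300))
```

For instance, `specificity(25, 2_000, 40, 10_000)` scores a word seen 25 times in a 2,000-token part when only about 8 occurrences would be expected, and returns a clearly positive value. Exact integer binomials become slow on large corpora; `scipy.stats.hypergeom` provides survival and cumulative distribution functions that are better suited there.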

31

Znotina, Inga. "Parodomieji įvardžiai lietuvių – latvių lygiagrečiajame tekstyne." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120626_145312-68508.

Abstract:

Šio darbo tyrimo objektas yra lietuvių ir latvių parodomieji įvardžiai, o tikslas – nustatyti jų santykius lietuvių – latvių kalbų vertimuose ir jų keliamas problemas vertėjams. Iškeliama tokia hipotezė: parodomieji įvardžiai vertimuose iš lietuvių kalbos į latvių kalbą dažniausiai, bet ne visada, yra keičiami latvių kalbos parodomaisiais įvardžiais. Darbo medžiaga yra dabar (nuo 2011 metų sausio iki 2012 metų gruodžio) Vytauto Didžiojo universiteto ir Latvijos universiteto bendrai rengiamas lietuvių – latvių lygiagretusis tekstynas. Į šį tekstyną įtraukiami lietuviškų tekstų vertimai į latvių kalbą. Iš šio tekstyno lygiagrečiųjų tekstynų konkordavimo programa ParaConc buvo sudaryti konkordansai pagal lietuvių parodomuosius įvardžius ir tirta, kaip jie verčiami į latvių kalba. Darbas susideda iš penkių dalių. Pirmoji dalis – įvadas, kuriame trumpai aprašytas tyrimo tikslas ir uždaviniai. Antrajame skyriuje nustatoma, kas yra parodomieji įvardžiai ir tarpusavyje lyginami šios grupės lietuviški ir latviški žodžiai. Trečias skyrius skirtas tyrimo metodikai ir panaudojamam tekstynui aprašyti. Ketvirtame skyriuje atliekama iš tekstyno išgautų duomenų analizė. Atskirai apžvelgtas lietuviškų parodomųjų įvardžių vertimas latviškais parodomaisiais įvardžiais; lietuviškų parodomųjų įvardžių vertimas kitais įvardžiais ir kitoms kalbos dalims priskiriamais žodžiais; lietuviškų parodomųjų įvardžių nevertimas. Penktoje dalyje pateikiami darbo rezultatai ir kelios rekomendacijos... [toliau žr. visą tekstą]
This paper examines Lithuanian and Latvian demonstrative pronouns. The aim is to identify their correspondences in Lithuanian-Latvian translations and the problems they may cause for translators. The hypothesis is that Lithuanian demonstrative pronouns are mostly, but not always, translated into Latvian as demonstrative pronouns. The research is based on the Lithuanian-Latvian parallel corpus currently being compiled jointly by two partner universities, Vytautas Magnus University and the University of Latvia, which collects Latvian translations of Lithuanian texts. Concordances of Lithuanian demonstrative pronouns were extracted from this corpus with the parallel concordancer ParaConc, and their Latvian translations were examined. The paper consists of five chapters. The first, the introduction, briefly describes the aim and tasks. The second presents demonstrative pronouns, the definitions of this group and the words that belong to it in Lithuanian and Latvian. The third describes the methodology and the corpus used in this research. The fourth analyses the corpus data. It discusses separately the translation of Lithuanian demonstrative pronouns as Latvian demonstrative pronouns, their translation as other Latvian pronouns or other word classes, and their omission. The fifth presents the results and some recommendations... [to full text]

32

Musil, Jakub. "Automatická tvorba slovníků z překladových textů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237245.

Abstract:

The aim of this thesis is to implement a system that translates words from a source language into a target language using a pair of parallel input texts. The thesis describes the terms and methods used in machine translation and automatic dictionary building. It also contains the design and specification of each part of the created system, including a final evaluation, and analyses options for extending an existing dictionary.
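
One simple way to propose word translations from a pair of aligned input texts is a co-occurrence association score such as the Dice coefficient. The sketch below assumes the texts are already aligned at sentence level and split on whitespace. It is a common baseline, not necessarily the method implemented in the thesis, and `dice_dictionary`, `min_count` and `threshold` are hypothetical names.

```python
from collections import Counter
from itertools import product


def dice_dictionary(pairs, min_count=2, threshold=0.5):
    """Propose a best translation for each source word from aligned sentence pairs.

    Dice(s, t) = 2 * c(s, t) / (c(s) + c(t)), where c counts the aligned
    sentence pairs that contain the word (or both words).
    Returns {source_word: (target_word, score)}.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))  # every co-occurring (s, t) pair
    dictionary = {}
    for (s, t), c in joint.items():
        if c < min_count:
            continue  # ignore pairs seen too rarely to be reliable
        score = 2 * c / (src_count[s] + tgt_count[t])
        # Keep the highest-scoring candidate; earlier candidates win ties.
        if score >= threshold and score > dictionary.get(s, (None, 0.0))[1]:
            dictionary[s] = (t, score)
    return dictionary
```

On four toy English-Czech pairs ("the house" / "dům", "the house is big" / "dům je velký", "the dog is big" / "pes je velký", "the dog" / "pes"), "house" maps to "dům" and "dog" to "pes" with a score of 1.0. However, "is" and "big" tie between "je" and "velký". Co-occurrence alone cannot separate words that always appear together, which is one reason statistical alignment models are usually preferred on larger data.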

33

Kouřil, Jan. "Paralelní korpusový manažer." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-236928.

Abstract:

The goal of this diploma project was to implement a parallel corpus manager that aligns parallel texts in different languages and inserts them into a corpus, where further processing functions are provided. The program supports automatic text alignment and interactive editing of the alignment. The aligned texts are then inserted into the corpus. The program can work with multiple corpora; each parallel corpus is always identified by a pair of languages. Within a corpus, users can:
- search by many categories
- view and edit particular selections
- lemmatize and morphologically tag texts
- sort selections
- import and export data
- edit the corpus in many ways for easy navigation
- add new expressions to managed dictionaries

The individual chapters cover an introduction to corpus issues, the theory of parallel text alignment, morphological tagging and lemmatization, the external tools used by the program, the most common subtitle formats, and the implementation of particular problems.
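
Automatic sentence alignment of the kind the manager provides is often done with a length-based method in the spirit of Gale and Church (1993). Translated sentences tend to have proportional lengths, so a dynamic program can choose the cheapest sequence of 1-1, 2-1, 1-2, 1-0 and 0-1 groupings ("beads"). The Python sketch below simplifies their probabilistic cost to a normalized length difference, with hypothetical penalties and an assumed length `ratio`. It is not the aligner implemented in the thesis.

```python
from math import sqrt

# Bead types (source sentences, target sentences) with assumed penalties:
# 1-1 is free, merges cost a little, omissions cost more.
BEADS = {(1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 4.0, (0, 1): 4.0}


def bead_cost(src_len, tgt_len, ratio=1.0):
    """Length mismatch of a bead, normalized so longer sentences tolerate more slack."""
    return abs(src_len * ratio - tgt_len) / sqrt((src_len + tgt_len) / 2 + 1)


def align(src, tgt, ratio=1.0):
    """Align two lists of sentences by character length using dynamic programming.

    Returns the lowest-cost sequence of beads as (source_indices, target_indices).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees a cell's cost is final before it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + bead_cost(ls, lt, ratio)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

Called as `align(["Hello there.", "How are you today?"], ["Ahoj.", "Jak se dnes máš?"])`, the sketch pairs the sentences one-to-one. Using lengths only makes it fast and language-independent, but it can wrongly merge or split sentences. This is one reason the manager described above also offers interactive editing of the alignment.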

34

Phan, Thi Thanh Thao. "Machine translation of proper names from english and french into vietnamese : an error analysis and some proposed solutions." Thesis, Besançon, 2014. http://www.theses.fr/2014BESA1002/document.

Abstract:

Dans l'ère de l'information et de la connaissance, la traduction automatique (TA) devient progressivement un outil indispensable pour transposer la signification d'un texte d'une langue source vers une langue cible. La TA des noms propres (NP), en particulier, joue un rôle crucial dans ce processus, puisqu'elle permet une identification précise des personnes, des lieux, des organisations et des artefacts à travers les langues. Malgré un grand nombre d'études et des résultats significatifs concernant la reconnaissance d'entités nommées (dont le nom propre fait partie) dans la communauté du TAL dans le monde, il n'existe presque aucune recherche sur la traduction automatique des noms propres (TANP) pour le vietnamien. En raison des caractéristiques différentes d'écriture des NP, de la translittération ou de la transcription et de la traduction de plusieurs langues, dont l'anglais, le français, le russe, le chinois, etc., vers le vietnamien, la TANP de ces langues vers le vietnamien est stimulante et problématique. Cette étude se concentre sur les problèmes de TANP de l'anglais vers le vietnamien et du français vers le vietnamien résultant des moteurs courants de TA et présente des solutions de prétraitement de ces problèmes pour améliorer la qualité de la TA. À travers l'analyse et la classification des erreurs de TANP faites sur deux corpus parallèles de textes avec NP (anglais-vietnamien et français-vietnamien), nous proposons des solutions concernant deux problématiques importantes : (1) l'annotation de corpus, afin de préparer des bases de données pour le prétraitement, et (2) la création d'un programme pour prétraiter automatiquement les corpus annotés, afin de réduire les erreurs de TANP et d'améliorer la qualité de traduction des systèmes de TA, tels que Google, Vietgle, Bing et EVTran.
L'efficacité de différentes méthodes d'annotation des corpus avec des NP ainsi que les taux d'erreurs de TANP avant et après l'application du programme de prétraitement sur les deux corpus annotés sont comparés et discutés dans cette thèse. Ils prouvent que le prétraitement réduit significativement le taux d'erreurs de TANP et, par là même, contribue à l'amélioration de la traduction automatique vers la langue vietnamienne.
Machine translation (MT) has increasingly become an indispensable tool for conveying the meaning of a text from a source language into a target language in our current information and knowledge era. In particular, MT of proper names (PN) plays a crucial role in providing specific and precise identification of persons, places, organizations and artefacts across languages. Despite a large number of studies and significant achievements in named entity recognition in the NLP community around the world, there has been almost no research on proper-name machine translation (PNMT) for Vietnamese. PN writing, transliteration or transcription, and translation into Vietnamese all differ across source languages, including English, French, Russian and Chinese. As a result, PNMT from those languages into Vietnamese remains a challenging and problematic issue. This study focuses on the problems of English-Vietnamese and French-Vietnamese PNMT arising from current MT engines. It first proposes a corpus-based PN classification, then a detailed PNMT error analysis, and concludes with some pre-processing solutions to improve MT quality. Through the analysis and classification of PNMT errors in two parallel corpora of texts with PNs (English-Vietnamese and French-Vietnamese), we propose solutions concerning two major issues. The first is corpus annotation for preparing the pre-processing databases. The second is the design of a pre-processing program to be used on annotated corpora to reduce PNMT errors and enhance the quality of MT systems, including Google, Vietgle, Bing and EVTran. The efficacy of different annotation methods for English and French corpora of PNs, and the PNMT error rates before and after using the pre-processing program on the two annotated corpora, are compared and discussed. The results show that the pre-processing solution significantly reduces PNMT errors and contributes to improving MT systems for Vietnamese.
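
The pre-processing program can be pictured as placeholder substitution. Proper names found in an annotated lexicon are masked before the text is sent to an MT engine, and are replaced with their preferred Vietnamese forms afterwards. This is one common technique, assumed here for illustration; the abstract does not describe the thesis's program. `protect_names`, `restore_names`, the `PNX` token format and the lexicon are hypothetical.

```python
import re

# Hypothetical placeholder format: PNX followed by a number.
PLACEHOLDER = re.compile(r"\bPNX\d+\b")


def protect_names(text, pn_lexicon):
    """Replace known proper names with numbered placeholders before MT.

    `pn_lexicon` maps a source-language proper name to its preferred
    Vietnamese form. Longer names are replaced first, so a multi-word name
    is not broken up by a shorter name it contains.
    """
    mapping = {}
    for i, name in enumerate(sorted(pn_lexicon, key=len, reverse=True)):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if pattern.search(text):
            token = f"PNX{i}"
            text = pattern.sub(token, text)
            mapping[token] = pn_lexicon[name]
    return text, mapping


def restore_names(translated, mapping):
    """Substitute the Vietnamese forms back for the placeholders in MT output."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), translated)
```

For example, `protect_names("The United Nations met in Moscow.", {"United Nations": "Liên Hợp Quốc", "Moscow": "Mát-xcơ-va"})` yields `"The PNX0 met in PNX1."`. If the MT engine returns `"PNX0 đã họp tại PNX1."`, `restore_names` produces `"Liên Hợp Quốc đã họp tại Mát-xcơ-va."`. In practice, engines sometimes alter or drop placeholders, so the restored output should be checked.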

35

Cuofano, Letizia. "As equivalências no português e no italiano de verbos suecos com prefixos de origem germânica num corpus paralelo de textos escritos." Thesis, Stockholms universitet, Avdelningen för portugisiska, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-64270.

Abstract:

Os prefixos germânicos de alguns verbos suecos serão comparados numa análise contrastiva com as relativas equivalências em português e em italiano num corpus paralelo escrito composto por um romance de língua sueca, um de língua portuguesa e um de língua italiana e pelas suas respectivas traduções. As funções desenvolvidas pelos prefixos germânicos dos verbos suecos analisados serão examinadas e depois confrontadas com as relativas equivalências, com o resultado que também nas duas línguas românicas relevam-se, de maneira bastante constante, procedimentos gramaticais parecidos aos desenvolvidos pelos prefixos germânicos.
The Germanic prefixes found in some Swedish verbs are compared, in a contrastive analysis, with their equivalents in Portuguese and Italian in a written parallel corpus. The corpus consists of a Swedish novel, a Portuguese novel and an Italian novel, together with their respective translations. The functions performed by the Germanic prefixes of the analysed Swedish verbs are examined and then compared with their equivalents. The result is that the two Romance languages also show, quite consistently, grammatical processes similar to those performed by the Germanic prefixes.
I prefissi germanici di alcuni verbi svedesi saranno comparati in un'analisi contrastiva con le relative equivalenze in portoghese e in italiano in un corpus parallelo scritto composto da un romanzo di lingua svedese, uno di lingua portoghese e uno di lingua italiana e dalle rispettive traduzioni. Le funzioni svolte dai prefissi germanici dei verbi svedesi analizzati saranno esaminate e poi confrontate con le relative equivalenze, con il risultato che anche nelle due lingue romanze si riscontrano in maniera abbastanza costante processi grammaticali simili a quelli svolti dai prefissi germanici.
De germanska prefix som återfinns i vissa svenska verb kommer att jämföras med sina motsvarigheter på portugisiska och italienska. Detta görs med hjälp av en skriven korpus bestående av en roman ursprungligen skriven på svenska, en skriven på portugisiska och en skriven på italienska samt översättningar av dessa romaner till de två andra språken. Funktionen hos de svenska verben med germanska prefix kommer att analyseras och sedan jämföras med verbens motsvarigheter. Resultatet av analysen visar att det är möjligt att finna systematiskt återkommande grammatiska processer i de romanska språken, som liknar de som förekommer i samband med de germanska prefixen på svenska.

36

Dilbaitė, Indrė. "Konceptualiųjų metaforų vertimas lygiagrečiajame anglų-lietuvių kalbų ES dokumentų tekstyne." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2010. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2010~D_20100617_112544-87348.

Abstract:

Šiame magistro darbe analizuojamos konceptualiosios metaforos, atrinktos iš Europos Sąjungos dokumentų tekstyno anglų ir lietuvių kalbomis, ir jų vertimas. Oficialiuose tekstuose paprastai nesitikima rasti daug kalbos gražinimo priemonių, tokių kaip metaforos. Tačiau jos yra sudėtinė visų kalbų dalis, o konceptualioji yra ypatinga tuo, kad gali būti prigijusi pačiose netikėčiausiose srityse, ir, jei neatliekami tam tikri tyrimai, gali likti nepastebėta. Darbe apibrėžta metaforos samprata, nurodomos jos rūšys. Išskiriama konceptualioji metafora ir kelios jos klasifikacijos – pagal konvencionalumą, atliekamą funkciją (skirstomos į struktūrines, ontologines ir erdvines) ir apibrėžtumo laipsnį. Aprašius atrankos būdus ir kriterijus, dažniniuose dvižodžių ir trižodžių junginių sąrašuose atrinktos konceptualiosios metaforos. Kiekviena pagal atliekamą funkciją priskirta struktūrinėms, ontologinėms arba erdvinėms. Lygiagrečiajame anglų-lietuvių kalbų ES dokumentų tekstyne buvo ieškoma kiekvienos konceptualiosios metaforos vertimo, siekiant nustatyti, ar junginiai išlaikė savo konceptualumą; ar jį įgijo tik vertime; ar vertime jis pranyko. Nustatyta, kad iš dažniausiai pasikartojančių ES dokumentų tekstyne konceptualiųjų metaforų anglų ir lietuvių kalbomis vertime išlieka didžioji dalis, tai yra atitinkamai 61% ir 69%. Tipiškiausios, dažniausiai vartojamos konceptualiosios metaforos lygiagrečiajame tekstyne verčiamos labai vienodai.
This research analyses conceptual metaphors manually extracted from the English-Lithuanian corpus of European Union documents, together with their translation. Official texts are not usually expected to contain many figures of speech, but metaphor is a component of all languages. Conceptual metaphor in particular can become entrenched in the most unexpected areas and may go unnoticed unless specifically investigated. The thesis defines the concept of metaphor and its types. It then presents conceptual metaphor and its possible classifications: by conventionality, by the cognitive function it performs (structural, ontological and orientational metaphors) and by degree of generality. After the identification criteria and methods are presented, conceptual metaphors were extracted from frequency lists of two-word and three-word combinations. Each conceptual metaphor was classified as structural, ontological or orientational according to the function it performs. The translation of each metaphor was located in the English-Lithuanian parallel corpus of EU documents in order to determine whether the combinations retained their conceptuality, acquired it only in translation, or lost it in translation. The majority of the most frequently used English and Lithuanian conceptual metaphors were retained in translation (61% and 69%, respectively). The most typical, most frequently used conceptual metaphors are translated in the parallel... [to full text]
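
The frequency lists of two-word and three-word combinations from which the candidate metaphors were extracted can be produced with a short script. This Python sketch assumes plain-text input, lower-casing and sentence splitting on basic punctuation. The tokenization actually used in the thesis is not stated, and the identification of metaphors among the frequent combinations remained manual.

```python
import re
from collections import Counter


def ngram_frequencies(text: str, n: int) -> Counter:
    """Count word n-grams, without letting an n-gram span a sentence boundary."""
    counts = Counter()
    for sentence in re.split(r"[.!?;]+", text.lower()):
        # \w is Unicode-aware in Python 3, so Lithuanian letters such as ą or ž are kept.
        words = re.findall(r"\w+", sentence)
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts
```

For example, `ngram_frequencies(text, 2).most_common(50)` lists the 50 most frequent two-word combinations of a document, ready for manual screening.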

37

Dalunde, Tilda. "Minnen från en parallell framtid." Thesis, Konstfack, Ädellab/Metallformgivning, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:konstfack:diva-4695.

Abstract:

Vi lever i en ömtålig vardag. Vi gör den än ömtåligare genom vårt sätt att leva. Det är ingen idé att jag säger det med ord; jag har redan sagt det så många gånger att människorna runt omkring mig har slutat lyssna. Kanske är objekt en bättre ingång till samtal. I det här arbetet har jag, genom såväl text som praktiskt arbete inom corpuskonstfältet, undersökt vad som händer med oss när vardagen faller sönder och kaos utbryter. Genom en startpunkt i klimatkatastrofen år 536, som ledde till att närmare hälften av Nordens befolkning dog, har jag spekulerat kring om samma sak skulle hända idag, eller kanske att det händer idag. Resursbrist leder alltid till våld. Trots att vi vet det fortsätter vi knapra i oss jorden en liten bit i taget. Vad är tanken att vi ska göra när den tar slut?
We live in a fragile everyday. We make it even more fragile by the way we live it. There is no point in saying it with words any more; I have already tried that so many times that people have stopped listening. Maybe objects are a better way to start a conversation. In this project, which consists of this thesis and the physical body of work "Memories from a parallel future", I have investigated what happens to us when the everyday falls apart and chaos erupts. Starting from the climate crisis of the year 536, which led to the death of almost half of the population of the Nordic region, I have speculated on whether the same thing would happen today, or whether it is in fact happening today. Depletion of resources always results in violence. We know this, yet we keep nibbling away at the earth, a little at a time. What do we plan to do when there is nothing left?

Images of works by the artists Iain Baxter&, Naoko Ito and Luiana Rondolini have been removed for copyright reasons. The titles of the works have, however, been retained.

38

Vialla, Bastien. "Contributions à l'algèbre linéaire exacte sur corps finis et au chiffrement homomorphe." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS112.

Abstract:

Cette thèse est composée de deux axes principaux, le premier portant sur le chiffrement homomorphe et le second sur l’algèbre linéaire creuse sur corps finis. Avec l’essor des technologies de communication et en particulier d’internet, de nouveaux protocoles de chiffrement sont développés. En particulier, le besoin se fait sentir de systèmes de chiffrement permettant de manipuler les données chiffrées tout en assurant leur sécurité. C’est dans ce contexte que des systèmes de chiffrement homomorphe sont développés ; ces protocoles permettent d’effectuer des calculs avec des données chiffrées. La sécurité de ce type de système repose sur l’ajout de bruit aux messages à chiffrer. Ce bruit augmente avec chaque opération effectuée, mais il ne doit pas dépasser un certain seuil. Pour contourner ce problème, une technique nommée bootstrapping est utilisée, permettant de réduire le bruit d’un chiffré. Les bootstrappings étant le goulot d’étranglement lors des calculs sur des données chiffrées, il est important d’en faire le moins possible. Or la quantité de bootstrappings à faire est déterminée par la nature des calculs à effectuer ainsi que par le protocole de chiffrement utilisé. C’est dans ce contexte que notre travail intervient : nous proposons une méthode effective pour réduire le nombre de bootstrappings, basée sur la programmation linéaire en nombres entiers. Cette méthode s’adapte à un grand nombre de protocoles de chiffrement. De plus, nous effectuons une analyse de la complexité de ce problème en montrant qu’il est APX-complet et nous fournissons un algorithme d’approximation. La résolution de systèmes linéaires sur corps finis est une brique de calcul essentielle dans de nombreux problèmes de calcul formel. En particulier, beaucoup de problèmes produisent des matrices comprenant un grand nombre de zéros ; on dit qu’elles sont creuses. Les meilleurs algorithmes permettant de résoudre ce type de système linéaire creux sont des algorithmes dits itératifs.
L’opération fondamentale de ces algorithmes itératifs est la multiplication de la matrice par un vecteur ou une matrice dense. Afin d’obtenir les meilleures performances, il est important de tenir compte des propriétés (SIMD, multicœurs, hiérarchie des caches, etc.) des processeurs modernes. C’est dans ce contexte que notre travail intervient : nous étudions la meilleure façon d’implanter efficacement cette opération sur les processeurs récents. Nous proposons un nouveau format permettant de tenir compte du grand nombre de ±1 présents dans une matrice. Nous proposons une implantation parallèle basée sur le paradigme du vol de tâche, offrant un meilleur passage à l’échelle que le parallélisme par threads. Nous montrons comment exploiter au mieux les instructions SIMD des processeurs dans les différentes opérations. Finalement, nous proposons une méthode efficace permettant d’effectuer cette opération lorsque le corps fini est multiprécision (les éléments sont stockés sur plusieurs mots machine) en ayant recours au système de représentation RNS.
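
The core operation described above multiplies a sparse matrix over a finite field by a vector while exploiting the many ±1 entries. It can be illustrated in plain Python with a minimal sketch that assumes a prime modulus `p` and a hypothetical row format (`PMOneRow`, `to_pmone`, `spmv`). The sketch shows only the arithmetic idea: ±1 entries need an addition or subtraction instead of a multiplication. It models neither the SIMD, cache and work-stealing aspects nor the actual format proposed in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class PMOneRow:
    """One sparse row: ±1 entries stored as bare column indices, others with values."""
    plus: list[int] = field(default_factory=list)               # columns holding +1
    minus: list[int] = field(default_factory=list)              # columns holding -1 (p - 1)
    other: list[tuple[int, int]] = field(default_factory=list)  # (column, value) pairs


def to_pmone(dense, p):
    """Convert a dense integer matrix to the split ±1 format over Z/pZ."""
    rows = []
    for r in dense:
        row = PMOneRow()
        for j, a in enumerate(r):
            a %= p
            if a == 0:
                continue  # zeros are not stored
            if a == 1:
                row.plus.append(j)
            elif a == p - 1:
                row.minus.append(j)
            else:
                row.other.append((j, a))
        rows.append(row)
    return rows


def spmv(rows, x, p):
    """Compute y = A x mod p; ±1 entries cost only an addition or a subtraction."""
    y = []
    for row in rows:
        acc = sum(x[j] for j in row.plus) - sum(x[j] for j in row.minus)
        acc += sum(a * x[j] for j, a in row.other)
        y.append(acc % p)
    return y
```

Such a product is the step repeated inside iterative solvers such as Wiedemann or Lanczos, so any per-product saving accumulates over many iterations.
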
This thesis is composed of two independent parts. The first is related to homomorphic encryption and the second deals with sparse linear algebra over finite fields. Homomorphic encryption extends traditional encryption in the sense that it becomes feasible to perform operations on ciphertexts without knowledge of the secret decryption key. As such, it enables someone to delegate heavy computations on his sensitive data to an untrusted third party in a secure way. More precisely, with such a system, a user can encrypt his sensitive data so that the third party can evaluate a function on the encrypted data without learning any information about the underlying plain data. On getting back the encrypted result, the user can decrypt it with his secret key and obtain, in the clear, the result of evaluating the function on his sensitive plain data. For a cloud user, the applications are numerous, and they reconcile a rich user experience with strong privacy protection. The first fully homomorphic encryption (FHE) scheme, able to handle an arbitrary number of additions and multiplications on ciphertexts, was proposed by Gentry in 2009. In homomorphic encryption schemes, the executed function is typically represented as an arithmetic circuit. In practice, any circuit can be described as a set of successive operation gates, each one being either a sum or a product performed over some ring. In Gentry's construction, which is based on lattices, each ciphertext is associated with some noise, which grows at each operation (addition or multiplication) performed throughout the evaluation of the function.
When this noise reaches a certain limit, decryption is no longer possible. To overcome this limitation, which is closely related to the number of operations that the HE.Eval procedure can handle, Gentry proposed a noise-refreshing technique called "bootstrapping". The main idea behind this bootstrapping procedure is to run the decryption procedure of the scheme homomorphically on the ciphertext, using an encrypted version of the secret key. In this context, our contribution is twofold. We first prove that the lmax-minimizing bootstrapping problem is APX-complete and NP-complete for lmax ≥ 3. We then propose a new method to determine the minimal number of bootstrappings needed for a given FHE scheme and a given circuit. We use integer linear programming to find an optimal solution to this problem. The main advantage of our method over the previous one is that it is highly flexible and can be adapted to numerous types of homomorphic encryption schemes and circuits. Computing a kernel element of a matrix is a fundamental building block in many computer algebra and cryptography algorithms. In particular, many applications produce matrices in which many entries are equal to 0. Such matrices are called sparse matrices. Sparse linear algebra relies fundamentally on iterative approaches such as Wiedemann or Lanczos. The main idea is to replace the direct manipulation of a sparse matrix with its Krylov subspace. In such an approach, the cost is therefore dominated by the computation of the Krylov subspace, which is done by successive products of the matrix by a vector or a dense matrix. Modern processor characteristics (SIMD, multiple cores, cache hierarchy, ...) greatly influence algorithm design. In this context, our work deals with the best approach to designing efficient implementations of the sparse matrix-vector product for modern processors. We propose a new sparse matrix format that handles the many ±1 matrix entries to improve performance. We propose a parallel implementation based on the work-stealing paradigm that provides good scaling on multicore architectures. We study the impact of SIMD instructions on sparse matrix operations. Finally, we provide a modular arithmetic implementation based on the residue number system (RNS) to handle the sparse matrix-vector product over multiprecision finite fields.
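
The abstract does not specify the new ±1-aware format. The following Python sketch therefore assumes one simple possibility, purely for illustration. Each row keeps its +1 and -1 columns as bare index lists, so those entries cost one addition or subtraction instead of a multiplication. Only the remaining nonzeros are stored with their values. The class `PMOneRow` and the functions `from_dense` and `spmv` are hypothetical names. The sketch also ignores the SIMD, work-stealing and RNS aspects of the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class PMOneRow:
    """One row of a hypothetical ±1-aware sparse format over Z/pZ."""
    plus: list = field(default_factory=list)    # column indices holding +1
    minus: list = field(default_factory=list)   # column indices holding -1
    other: list = field(default_factory=list)   # (column, value) pairs for all other nonzeros


def from_dense(rows, p):
    """Convert a dense integer matrix into the ±1-aware format, reducing mod p."""
    matrix = []
    for r in rows:
        row = PMOneRow()
        for j, v in enumerate(r):
            v %= p
            if v == 0:
                continue          # zeros are not stored at all
            if v == 1:
                row.plus.append(j)
            elif v == p - 1:      # -1 mod p
                row.minus.append(j)
            else:
                row.other.append((j, v))
        matrix.append(row)
    return matrix


def spmv(matrix, x, p):
    """Compute y = A x mod p; ±1 entries cost only an addition or subtraction."""
    y = []
    for row in matrix:
        acc = sum(x[j] for j in row.plus) - sum(x[j] for j in row.minus)
        acc += sum(v * x[j] for j, v in row.other)   # general entries need a product
        y.append(acc % p)
    return y


p = 65537                   # example prime modulus
A = [[1, 0, -1, 3],
     [0, -1, 1, 0],
     [2, 1, 0, -1]]
x = [5, 7, 11, 13]
print(spmv(from_dense(A, p), x, p))  # [33, 4, 4]
```

Storing ±1 entries as indices alone also shrinks the matrix in memory, since no values are kept for them. In a compiled language, a CSR-style layout with flat index arrays would be the natural next step.
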

39

Samuelsson, Thomas. "The Russian Verbal Prefix v- and Circumfix v- -sja in Space : A Contrastive Study between Russian and Swedish." Thesis, Stockholms universitet, Slaviska språk, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-155815.

Full text

Abstract:

This study investigates the Russian verbal prefix v(o)- and the circumfix v(o)- -sja in concrete physical space. The aim of the contrastive study is to investigate and describe meanings. Bilingual data from a contemporary Russian-Swedish dictionary are analysed using Krongauz's method. A list of the meanings of the verbal affixes is built up by comparing the lexical meanings and morphosyntactic constructions of the verbs in both languages. The results show that the meanings of the affixes can be divided into the following categories: spatial movements into an enclosed space, spatial movements onto a delimited surface, spatial movements towards a vicinity, adhesion, and locations in physical space.
The present study investigates the Russian verbal prefix v(o)- and circumfix v(o)- -sja in concrete physical space. The aim of the contrastive study is to explore and describe meanings. Bilingual data extracted from a contemporary Russian-Swedish dictionary are analysed using Krongauz's method. A list of the meanings of the Russian verbal affixes is built by comparing similarities and differences between the lexical meanings and morphosyntactic structures of the verbs in both languages. The results show that the meanings of the affixes can be divided into the following categories: spatial movements into an enclosed space, spatial movements onto a delimited surface, spatial movements towards a vicinity, adhesion, and locations in physical space.

40

Phan, Thi Thanh Thao. "Machine translation of proper names from English and French into Vietnamese : an error analysis and some proposed solutions." Electronic Thesis or Diss., Besançon, 2014. http://indexation.univ-fcomte.fr/nuxeo/site/esupversions/8ded02fb-eae4-4c01-8ded-ede048ac2a4d.

Full text

Abstract:

In the age of information and knowledge, machine translation (MT) is gradually becoming an indispensable tool for transferring the meaning of a text from a source language into a target language. The MT of proper names (PNs) in particular plays a crucial role in this process, since it allows persons, places, organizations and artefacts to be identified precisely across languages. Despite a large number of studies and significant results on named entity recognition (of which proper names are a part) in the NLP community worldwide, there is almost no research on proper-name machine translation (PNMT) for Vietnamese. Because of the differing characteristics of PN writing, transliteration or transcription, and translation from many languages, including English, French, Russian, Chinese, etc., into Vietnamese, PNMT from these languages into Vietnamese is challenging and problematic. This study focuses on the PNMT problems arising from current MT engines when translating from English into Vietnamese and from French into Vietnamese, and presents pre-processing solutions to these problems in order to improve MT quality. Through the analysis and classification of PNMT errors in two parallel corpora of texts containing PNs (English-Vietnamese and French-Vietnamese), we propose solutions to two important issues: (1) corpus annotation, in order to prepare databases for pre-processing, and (2) the creation of a program that automatically pre-processes the annotated corpora, in order to reduce PNMT errors and improve the translation quality of MT systems such as Google, Vietgle, Bing and EVTran.
The effectiveness of different methods for annotating corpora with PNs, as well as the PNMT error rates before and after applying the pre-processing program to the two annotated corpora, are compared and discussed in this thesis. They show that pre-processing significantly reduces the PNMT error rate and thereby contributes to improving machine translation into Vietnamese.
Machine translation (MT) has increasingly become an indispensable tool for decoding the meaning of a text from a source language into a target language in our current information and knowledge era. In particular, MT of proper names (PNs), or PNMT, plays a crucial role in providing specific and precise identification of persons, places, organizations and artefacts across languages. Despite a large number of studies and significant achievements in named entity recognition in the NLP community around the world, there has been almost no research on PNMT for the Vietnamese language. Owing to the different features of PN writing, transliteration or transcription, and translation from a variety of languages, including English, French, Russian, Chinese, etc., into Vietnamese, PNMT from those languages into Vietnamese is still a challenging and problematic issue. This study focuses on the problems of English-Vietnamese and French-Vietnamese PNMT arising from current MT engines. First, it proposes a corpus-based PN classification, then a detailed PNMT error analysis, and concludes with some pre-processing solutions to improve MT quality. Through the analysis and classification of PNMT errors in two parallel corpora of texts with PNs (English-Vietnamese and French-Vietnamese), we propose solutions to two major issues: (1) corpus annotation to prepare the pre-processing databases, and (2) design of a pre-processing program to be used on annotated corpora to reduce PNMT errors and enhance the quality of MT systems, including Google, Vietgle, Bing and EVTran. The efficacy of different methods for annotating English and French corpora with PNs, and the PNMT error rates before and after applying the pre-processing program to the two annotated corpora, are compared and discussed in this study. They show that the pre-processing solution significantly reduces PNMT errors and contributes to improving MT systems for the Vietnamese language.

41

Mellquist, Simone. "Ryska gerundier i översättning till och från svenska : Implicita och explicita betydelser." Thesis, Stockholms universitet, Slaviska språk, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-153972.

Full text

Abstract:

Swedish lacks gerunds and is therefore suitable for studying implicit meanings of Russian gerunds that are forced to become explicit in Swedish translations. The present contrastive parallel corpus study explores translation correspondences in both directions. It is shown that constructions with finite verbs of perfective aspect followed by imperfective gerunds largely correspond to Swedish absolute with-constructions (Swedish med-constructions). Another finding is the insertion of extra gerunds in the translation of Swedish locative constructions. A classification of Swedish explicit time markers corresponding to converb constructions is structured according to time relations (taxis relations): perfective converbs show a clear correspondence to anteriority markers, and imperfective converbs correlate with simultaneity markers. Contextual secondary meanings such as means, purpose, cause and consequence are analyzed, and various structures are found.

42

Stankevičius, Kęstutis. "Lygiagrečių tekstynų kūrimo interaktyvios informacinės sistemos." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120723_105613-14222.

Full text

Abstract:

The task of this master's thesis is to review the user interfaces most widely used today, which help people interact with computers and other devices. It also reviews interface architecture approaches that make application development easier. It further analyzes the methods currently most widely used to implement web services. The goal is to find a way for interactive information systems to communicate with each other without restrictions on the required functionality, choosing the best way to store and display the required program data as simply and flexibly as possible. A further aim is to create a parallel corpus prototype that lets the user view the result obtained and find and correct any automatically generated inaccuracies as easily and quickly as possible. It should do this through an interface that is more convenient and requires as little effort as possible. Using the prototype, a study is to be carried out showing trends in the use of input devices. The thesis consists of 8 parts: introduction, review of user interfaces, separation of the user interface, analysis of web services, XML databases, user interface development, conclusions, and list of references. Length: 48 pages of text without appendices, 25 figures and 2 tables. Two appendices are attached separately.
The purpose of this thesis is to review the user interfaces most commonly used today that help people interact with computers and other equipment, and to begin exploring a new user interface paradigm that allows humans to interact naturally with the computer. Furthermore, it analyzes the methods most widely used today for implementing web services. The aim is to find out how interactive information systems could communicate with each other without restrictions to obtain the required functionality, choosing the simplest and most flexible way to store and display the data the program needs. It also creates a prototype of an interactive parallel corpus development environment. The prototype makes it as easy as possible to correct any errors in the automatically generated parallel translation, with as little human labor as possible. Using the prototype, a study is carried out that shows trends in the use of different interface input devices. The work consists of 8 parts: introduction, overview of user interfaces, user interface separation, web services analysis, XML databases, user interface development, conclusions, and references. The thesis consists of 48 pages of text without appendices, 25 figures and 2 tables. Two appendices are provided separately.

43

Chrétien, Benjamin. "Optimisation semi-infinie sur GPU pour le contrôle corps-complet de robots." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT315/document.

Full text

Abstract:

A humanoid robot is a complex system with many degrees of freedom, whose behavior is governed by the nonlinear equations of motion. Consequently, motion planning for such a system is a computationally difficult task. In this thesis, we aim to develop a method that makes it possible to use the computing power of GPUs in the context of optimization-based whole-body motion planning. We first show the properties of the optimization problem and possible avenues for parallelizing it. Next, we present our approach to computing the dynamics, adapted to parallel computing architectures. This allows us to propose a GPU implementation of our motion planning problem: constraints and gradients are computed in parallel, while the problem itself is solved on the CPU. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we study the extension of our work to model predictive control.
A humanoid robot is a complex system with numerous degrees of freedom, whose behavior is subject to the nonlinear equations of motion. As a result, planning its motion is a difficult task from a computational perspective. In this thesis, we aim to develop a method that can leverage the computing power of GPUs in the context of optimization-based whole-body motion planning. We first exhibit the properties of the optimization problem, and show that several avenues can be exploited in the context of parallel computing. Then, we present our approach to the dynamics computation, suited to highly parallel processing architectures. Next, we propose a many-core GPU implementation of the motion planning problem. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. Because each constraint and its gradient can be evaluated independently for each time interval, we end up with a highly parallelizable problem that can take advantage of GPUs. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we investigate the extension of our work to model predictive control.

44

Fernández Sánchez, Francesc. "El Folleto de cursos de idiomas para extranjeros: análisis contrastivo (alemán-español) por tipos de emisor y subtextos." Doctoral thesis, Universitat Pompeu Fabra, 2005. http://hdl.handle.net/10803/7581.

Full text

Abstract:

The translation-relevant aim of this PhD thesis is to account for the genre conventions of the language course leaflet for foreigners (LCLF), mainly those related to the persuasive and directive functions, by analyzing a bilingual corpus of parallel texts according to the method of contrastive textology. Genre conventions are considered here by sender type (public vs. private) and by subtext (text constituents defined functionally, semantically and formally), on the hypothesis that they will vary more with the sender type than with the language.
The intralinguistic and interlinguistic analysis of the macrostructure and the recurrent textual segments, as well as of the functions (persuasive, referential and directive) characterizing both the LCLF as a persuasive leaflet and its three subtexts does not confirm the hypothesis. It does reflect, however, that the directive and persuasive functions prevail respectively in the public and private sender leaflets, as well as in those belonging to the Spanish and German subcorpora.
This thesis sets itself the translation-relevant objective of accounting for the conventions of the language course leaflet for foreigners (FCIE), which are linked mainly to the persuasive and directive functions, by analyzing a bilingual corpus of parallel texts according to the method of contrastive textology. These conventions are considered by sender type (public and private) and by subtext (constituent units of the text, defined functionally, semantically and formally), on the hypothesis that they will differ more depending on the sender type than on the language.
The intralinguistic and interlinguistic analysis of the macrostructure and the recurrent textual segments, as well as of the functions (persuasive, referential and directive) that characterize both the FCIE, as a persuasive leaflet, and its three subtexts, does not allow this hypothesis to be confirmed. It does, however, show that the directive and persuasive functions predominate, respectively, in the public-sender and private-sender leaflets, as well as in those of the Spanish and German subcorpora.

45

Orenha, Adriane [UNESP]. "Unidades fraseológicas especializadas: colocações e colocações estendidas em contratos sociais e estatutos sociais traduzidos no modo juramentado e não-juramentado." Universidade Estadual Paulista (UNESP), 2009. http://hdl.handle.net/11449/103524.

Full text

Abstract:

Issue date: 2009-05-26.
This research aims to study the terms, collocations and extended specialized collocations present in articles of association and bylaws, which make up the research corpora. We will also observe the similarities and differences between the corpora of legal and sworn translations with regard to the use of these terms and lexical patterns. In addition, we will point out those most frequently used in documents such as articles of association and bylaws. The investigation is based on the interdisciplinary approach of Corpus-Based Translation Studies, Corpus Linguistics and Phraseology, more specifically collocations, specialized collocations and specialized phraseological units. Terminology, through its theoretical assumptions, also contributes to the research, as do works on sworn translation. One of the motivations behind this study is that sworn translation is considered highly relevant to commercial, social and legal relations between nations. To carry out this study, we compiled a study corpus (CE1) of articles of association and bylaws translated in sworn mode, in the English-Portuguese and Portuguese-English translation directions. These were extracted from Translation Record Books belonging to sworn translators accredited by the Commercial Registry (Junta Comercial) of two Brazilian states. We also compiled a study corpus (CE2) made up of documents of the same nature translated without the sworn process, in the same translation directions. In addition to these corpora, we built two comparable corpora made up of the same kinds of documents originally written in Portuguese and in English. The results of this research showed several similarities regarding the terms used in translated documents...
This investigation aims at carrying out a study of the terms, collocations and extended specialized collocations present in articles of incorporation/articles of organization/articles of association and bylaws, which represent our research corpora. We will also observe similarities and differences between the sworn and legal translation corpora concerning the use of such terms and lexical patterns, and point out those most frequently used in the documents in focus. This research derives its theoretical and methodological sources from Corpus-Based Translation Studies, Corpus Linguistics and Phraseology, more specifically from collocations, specialized collocations and specialized phraseological units (SPUs). Terminology, from its theoretical standpoint, also contributes to this study, as do studies of sworn translation. One of the aspects that motivate this study is the fact that sworn translation is considered to be of great relevance to commercial, social and legal relations among nations. To conduct this research, we compiled a study corpus (CE1) composed of articles of incorporation/articles of organization/articles of association and bylaws submitted to sworn translation in the English-Portuguese and Portuguese-English directions. These documents were excerpted from the Books of Sworn Translation Records made available by five Brazilian sworn translators, duly sworn by the Board of Trade of two Brazilian states. We also compiled a study corpus (CE2) made up of documents of the same nature not submitted to sworn translation, in the same translation directions. Besides these corpora, we also built two comparable corpora formed by documents of the same types originally written in Portuguese and in English. The results obtained in this research showed some similarities regarding the terms used in documents submitted to sworn translation...

46

Fawi, Fathi Hassan Ahmed <1982>. "Le variazioni terminologiche in un corpus giuridico parallelo italiano-arabo: studio linguistico-computazionale." Doctoral thesis, Università Ca' Foscari Venezia, 2016. http://hdl.handle.net/10579/10274.

Full text

Abstract:

This thesis aims to study terminological variation in an Italian-Arabic parallel legal corpus, adopting a computational-linguistic approach. It starts from the assumption that specialized languages, like the common language, also exhibit variation at the lexical level. This runs counter to the General Theory of Terminology, which is based on the principle of monosemy and univocity, according to which terms must not vary in order to avoid any communicative ambiguity. In this work, the linguistic component is integrated with statistical methods. The linguistic part is prominent in the chapters on variation in specialized discourse, on word-formation processes in Italian and Arabic, and on the analysis of the variants extracted from the parallel corpus. Statistical measures are used in building and annotating the parallel corpus, in monolingual and bilingual term extraction from the corpus, and in identifying terminological variants.
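
The abstract does not name the statistical measures used for bilingual term extraction. As one common choice, and only as an assumption, the sketch below scores candidate translation equivalents with the Dice coefficient over sentence-aligned segments, Dice(s, t) = 2·f(s, t) / (f(s) + f(t)), where f counts the segments containing a term or a term pair. The segments are invented toy data (Italian lemmas, with Arabic in simplified transliteration) rather than material from the corpus, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(aligned_pairs):
    """Dice association between source and target word types.

    aligned_pairs: list of (source_tokens, target_tokens), one per aligned
    segment. Frequencies count segments containing a type, not raw tokens.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        s_types, t_types = set(src), set(tgt)
        src_freq.update(s_types)
        tgt_freq.update(t_types)
        joint.update(product(s_types, t_types))  # every co-occurring (s, t) pair
    return {(s, t): 2 * n / (src_freq[s] + tgt_freq[t])
            for (s, t), n in joint.items()}


def best_equivalent(scores, source_term):
    """Return the highest-scoring target candidate for a source term."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_term}
    return max(candidates, key=candidates.get)


# Invented toy segments: Italian lemmas, Arabic in simplified transliteration.
pairs = [
    (["contratto", "nullo"], ["aqd", "batil"]),
    (["legge", "contratto"], ["qanun", "aqd"]),
    (["legge"], ["qanun"]),
]
scores = dice_scores(pairs)
print(best_equivalent(scores, "contratto"))  # aqd
print(best_equivalent(scores, "legge"))      # qanun
```

Segment-level presence counts keep a single long sentence from dominating the scores. Real term extraction would also need multiword candidates and a frequency threshold, since Dice is unreliable for rare items.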

47

Giesta, Letícia Caporlíngua. "Tradução pedagógica e letramento acadêmico com o uso de corpus paralelo." Repositório Institucional da UFSC, 2014. https://repositorio.ufsc.br/xmlui/handle/123456789/129655.

Full text

Abstract:

Doctoral thesis, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Estudos da Tradução (Graduate Program in Translation Studies), Florianópolis, 2014.
Based on a parallel corpus, this study aims to analyze the translation of frequent collocational patterns in Physics, with a view to promoting pedagogical translation and supporting the academic literacy of students in this field. The corpus consists of 434 Portuguese abstracts of doctoral theses in Physics together with their respective English abstracts, totaling 868 texts. Four-word collocational patterns were analyzed in two computational systems, following the three categories used by Hyland (2008a): research-oriented, text-oriented and participant-oriented. The actions used in translating these collocational patterns were also analyzed, based on the translation strategies suggested by Baker (1992). It is argued that pedagogical translation through a parallel corpus can promote the reflective involvement of teachers and students in pedagogical and linguistic practices that seek to soften divergent perspectives when dealing with the texts studied. Such practices develop attitudes that can foster understanding in situated practices involving reading and writing, as well as in the social relations of teaching and learning, and so support the academic literacy of undergraduate students. The results reveal that the four-word collocational patterns in the abstracts of the analyzed corpus show a tendency of the field to reflect its view of science in its academic language: 74% of the markers are text-oriented and participant-oriented markers are rare. The results also show that the translation choices mostly maintained the linguistic functions of the collocational patterns in the source and target languages. The use of different translation strategies allows reflection on the decisions made by the authors of the texts.
The data analysis and the theoretical discussion help answer the objective of this study and support the thesis argument. They raise questions about the academic language of Physics in order to identify cultural aspects of this community and support students' academic literacy.

Abstract: The objective of this study is to analyze, based on a parallel corpus, the translation of frequent clusters in Physics in order to promote pedagogical translation and assist the academic literacy of students in this area. The corpus consists of 434 doctoral dissertation abstracts in Physics written in Portuguese, with their respective English translations, for a total of 868 texts. Forty-nine 4-word clusters were analyzed in two computational systems following the three categories suggested by Hyland (2008a): research-oriented, text-oriented and participant-oriented. The actions employed in translating these clusters were also analyzed, based on the translation strategies suggested by Baker (1992). It is argued that pedagogical translation through a parallel corpus can promote the reflective involvement of professors and students in pedagogical and linguistic practices that try to reduce divergent perspectives when in contact with texts. In these practices, the development of attitudes may create opportunities to better comprehend what they read in situated practices involving reading and writing. It may also help in the social relations of teaching and learning that support the academic literacy of undergraduate students. The results reveal that the 4-word clusters used in the abstracts of the corpus show a tendency of the area to reflect its view of science in its academic language: 74% of the markers are text-oriented, and participant-oriented markers are rare. They also reveal that the translation choices have for the most part maintained the linguistic functions of the clusters in the source and target languages. The use of different translation strategies allows reflection on the decisions made by the authors of the texts.
The data analysis and theoretical discussion provide elements to achieve the objective of this study, raising questions about academic language in Physics in order to identify cultural aspects of this community and assist students' academic literacy.
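
The abstract reports that 4-word clusters were analyzed with two computational systems, which it does not name. The counting step behind such an analysis can be sketched in Python as follows. The tokenization rule, the function name `clusters`, and the sample text are assumptions for illustration. The sketch does not classify clusters into Hyland's research-, text- and participant-oriented categories.

```python
import re
from collections import Counter


def clusters(text, n=4):
    """Count contiguous n-word sequences (4-word clusters by default).

    Tokenization is a simplifying assumption: lowercase runs of Unicode
    letters, with digits, punctuation and apostrophes acting as separators.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


sample = ("The results show that the method works. "
          "The results show that the error is small.")
for cluster, freq in clusters(sample).most_common(2):
    print(" ".join(cluster), freq)
```

Running the same function over the Portuguese and English halves of an aligned corpus gives the two frequency lists that a cluster-by-cluster translation comparison would start from.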

48

Svášek, Martin. "Définitions, élaboration et exploitation d'un corpus parallèle bidirectionnel français-tchèque tchèque français." Paris, INALCO, 2007. http://www.theses.fr/2007INAL0020.

Full text

Abstract:

First, we introduce the concept of a parallel corpus. Fratchèque is a parallel corpus of written resources whose French and Czech texts come from literature written after 1945. It contains no XML tags, since the ParaConc software used to process the corpus does not need them. The construction of the corpus is described in detail, following all the steps and all the settings of the software used. It begins with the optical character recognition software FineReader. After the quality of the digitized texts has been checked in MS Word 2002, it proceeds to build a parallel corpus managed by ParaConc. The linguistic part of the thesis is based on the parallel corpus thus built. It addresses a phenomenon known in Czech as částice, which has no one-to-one equivalent in French. The terms most often associated with the question in French are mots du discours and particules énonciatives. According to existing descriptions, there is a close relationship between these words and discourse. This observation is demonstrated for two částice, vždyť and přece, and their variants, on the large Czech corpora (Analysis A) and on Fratchèque (Analysis B). The study continues with a systematic analysis of the various types of use of vždyť and přece, with the aim of proposing a lexicographic description for a bilingual Czech-French dictionary. Some exercises based on the results of the study show how to use the bilingual corpus in language teaching. Finally, we discuss some questions concerning the possibility of automatically evaluating translation quality with respect to the presence of částice.
At the beginning, the concept of a parallel corpus is defined. The French and Czech texts forming the parallel Fratchèque corpus come from literature; only texts written after 1945 have been selected. Fratchèque is not explicitly marked up with XML tags, because tagging is not necessary for the proper functioning of the corpus manager ParaConc. The building of the corpus is described thoroughly, following all the steps and settings of the software used. The process starts with the optical character recognition program FineReader and, after the accuracy of the digitized texts has been checked in MS Word 2002, goes on to build a corpus managed by ParaConc. The linguistic investigations of the thesis rely primarily on the parallel corpus thus built. The main purpose is to tackle a phenomenon that is known in Czech as částice but has no direct equivalent in French. The most frequent terms used in the French approach are mots du discours and particules énonciatives. The existing descriptions suggest a close relationship between these words and discourse. This is demonstrated on two Czech částice, přece and vždyť, and their variants, using large Czech corpora (Analysis A) and Fratchèque (Analysis B). The study continues with a systematic analysis of all kinds of usage of vždyť and přece in order to present a lexicographic description for a bilingual Czech-French dictionary. Some exercises based on the results of the linguistic analysis show how to use the bilingual corpus in teaching foreign languages. Finally, some issues concerning automatic evaluation of translation quality are discussed, taking into account the work with částice.
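
A parallel concordancer such as ParaConc, which was used to manage Fratchèque, retrieves occurrences of a search term together with the aligned segments in the other language. The basic idea can be sketched in Python as follows. The function `parallel_concordance` is hypothetical and does not reproduce ParaConc's behavior. The three Czech-French sentence pairs are invented examples and are not taken from Fratchèque.

```python
import re


def parallel_concordance(pairs, word, width=20):
    """Yield (left context, match, right context, aligned target segment).

    pairs: aligned (source_segment, target_segment) strings. Matching is
    whole-word and case-insensitive; this only illustrates the idea of a
    parallel concordance.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for src, tgt in pairs:
        for m in pattern.finditer(src):
            left = src[max(0, m.start() - width):m.start()]
            right = src[m.end():m.end() + width]
            yield left, m.group(), right, tgt


# Invented Czech-French pairs (not taken from Fratchèque).
pairs = [
    ("Vždyť jsem ti to říkal.", "Mais je te l'avais dit !"),
    ("To přece víš.", "Tu le sais bien."),
    ("Nechoď tam, vždyť prší.", "N'y va pas, puisqu'il pleut."),
]
for left, kw, right, tgt in parallel_concordance(pairs, "vždyť"):
    print(f"{left:>20}[{kw}]{right:<20} | {tgt}")
```

Because matching is whole-word and exact, the variants of vždyť and přece mentioned in the abstract would each need to be searched separately or matched with a broader regular expression.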

49

Ramnäs, Mårten. "Étude contrastive du verbe suédois 'få' dans un corpus parallèle suédois-français [A Contrastive Study of the Swedish Verb 'få' in a Swedish-French Parallel Corpus]." Göteborg: Acta Universitatis Gothoburgensis, 2008. http://catalogue.bnf.fr/ark:/12148/cb41372155m.

50

Oliveira, Joacyr Tupinambás de. "A Linguística de Corpus na formação do tradutor: compilação e proposta de análise de um corpus paralelo de aprendizes de tradução [Corpus Linguistics in Translator Training: Compilation and Proposed Analysis of a Parallel Corpus of Translation Learners]." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/8/8147/tde-26052015-104749/.

Abstract:

Studies on the teaching of translation in Brazil still leave much room for discussion. With this in mind, one of the goals of this research is to offer a brief reflection on the classroom and to propose a method for teaching translation based on the analysis of material produced by translation learners. The aim is to use Corpus Linguistics to observe how students construct the target text, in the same way that material produced by language learners is analyzed. For that purpose, we compiled a learner translation corpus consisting of eight English source texts and about 800 translations into Portuguese, approximately 100 for each text. Aligning so many translations with their source texts in a way that supports analysis was not a simple task. This difficulty was overcome by developing a specific alignment methodology that uses spreadsheets as its tool, and this methodology became the central focus of the research. Using formulas to manipulate textual data in spreadsheets produced an aligned corpus in which every source text and its translations carry headers and every line is tagged. This procedure made it possible to organize a corpus that can be analyzed both in the spreadsheet editor and in programs such as AntConc and WordSmith Tools. In addition, we present the spreadsheet as a didactic tool for translation practice classes.
