Dissertations / Theses on the topic 'Parallel corpus'
Adesam, Yvonne. "The Multilingual Forest : Investigating High-quality Parallel Corpus Development." Doctoral thesis, Stockholms universitet, Institutionen för lingvistik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-79076.
This doctoral thesis explores the creation of parallel treebanks. These are linguistic data consisting of texts and their translations, annotated with syntactic information and with links between the words, phrases and sentences that correspond to each other across the translations. We describe the partly manual annotation of the parallel treebank SMULTRON, with 1,000 English, German and Swedish sentences. This description is the starting point for answering the first of the thesis's two questions: what issues must be considered in order to create a high-quality parallel treebank? The units that are annotated and the choice of annotation scheme are important for quality, and a certain amount of automatic processing is necessary to increase the size. Automatic quality checks and automatic evaluation are important, but some manual review is necessary to achieve high quality. We further explore using information available in the annotation to improve the automatically created annotation for another language. This leads to the second question: can we improve automatic annotation by transferring information available in the other languages? The experiments show that automatic alignment transferred from two language pairs, L1–L2 and L1–L3, to the third language pair, L2–L3, gains in precision, above all for the intersection between the transferred alignment and the automatic alignment. We also create a test collection for experiments on annotation transfer to resolve structural ambiguities in prepositional phrases. Transfer by majority vote improves the annotation compared with the basic automatic annotation, but using linguistic cues to correct the annotation before majority transfer is even better, albeit more labour-intensive. Some incorrect structures, however, cannot be corrected by transfer, because the different languages use different formulations and therefore have different structures.
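The alignment-transfer experiment summarized in this abstract (projecting word links from the L1–L2 and L1–L3 pairs onto L2–L3, then intersecting the result with a direct automatic alignment) can be illustrated with a minimal Python sketch. The function name, the representation of links as pairs of token indices, and the toy data are assumptions made for illustration; the abstract does not give the thesis's actual procedure.

```python
def transfer_alignment(l1_l2, l1_l3):
    """Compose L1-L2 and L1-L3 links into candidate L2-L3 links.

    Hypothetical helper: each link is a (source_index, target_index) pair,
    and two tokens are linked if they share an aligned L1 token.
    """
    by_l1 = {}
    for i, j in l1_l2:
        by_l1.setdefault(i, set()).add(j)
    return {(j, k) for i, k in l1_l3 for j in by_l1.get(i, ())}


# Toy alignments for a three-token sentence (invented data).
l1_l2 = {(0, 0), (1, 2), (2, 1)}
l1_l3 = {(0, 0), (1, 1), (2, 2)}
automatic_l2_l3 = {(0, 0), (2, 1), (1, 1)}  # output of a direct L2-L3 aligner

transferred = transfer_alignment(l1_l2, l1_l3)
# Keeping only links found by both methods trades recall for precision.
high_precision = transferred & automatic_l2_l3
print(sorted(high_precision))
```

In this toy case the intersection discards the one link on which the two methods disagree, which is the kind of precision gain the abstract reports for the intersected alignment.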
Cho, Joon-Hyung. "Analyse textométrique des corpus parallèles francais-coréens." Thesis, Paris 3, 2010. http://www.theses.fr/2010PA030012.
The translation equivalents extracted from a parallel corpus are a valuable resource for studying translational contexts between two distinct languages. Translated texts are now a central object of study in translation studies and contrastive linguistics. Textometry performs a set of statistical calculations on the textual units of a tokenized parallel corpus, providing quantitative evidence for verifying the translational relations between linguistic units. By exploring bilingual word pairs in French-Korean parallel corpora, we tested the usefulness of this methodology for French-Korean translated texts. The results were positive in some respects and negative in others. Nevertheless, the methods also made it possible to observe the various translational relations between textual units in French and Korean. Most automated methods designed for parallel corpora of heterogeneous language pairs have not produced satisfactory results. For this reason, textometry, which observes the lexical elements of a corpus from a statistical point of view, is a practical method for dealing with parallel corpora of such language pairs.
Gao, Z. M. "Automatic extraction of translation equivalents from a parallel Chinese - English corpus." Thesis, University of Manchester, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488455.
Jilani, Aisha. "Parallel corpus multi stream question answering with applications to the Qu'ran." Thesis, University of Huddersfield, 2013. http://eprints.hud.ac.uk/id/eprint/23852/.
Ribas, Bruguer Marta. "Alineació de textos jurídics paral·lels (català-castellà): alguns problemes." Doctoral thesis, Universitat Pompeu Fabra, 2006. http://hdl.handle.net/10803/7502.
Recent developments in programs for aligning bilingual corpora open new horizons for the study of specialized texts. Their use makes it possible to contrast parallel specialized texts in different languages and to reveal the discursive differences between them, which benefits the treatment of comparative knowledge about the two discourses. Formalizing this knowledge is nevertheless a complex task, as the noise in the programs' output shows.
Starting from a corpus of parallel Catalan and Spanish case-law texts and using the ALINEA program, we present a detailed descriptive study of the discursive differences between Catalan and Spanish case-law texts, in order to formalize comparative knowledge of Catalan and Spanish legal (case-law) discourse. We establish a typology of the linguistic phenomena characteristic of this discourse that can produce unsatisfactory alignments, study their causes, and propose a lexicographic treatment and complementary strategies (linguistic rules) to improve alignment results for this type of text.
Silva, Carlos Eduardo da. "Developing online parallel corpus-based processing tools for translation research and pedagogy." Repositório Institucional da UFSC, 2013. https://repositorio.ufsc.br/xmlui/handle/123456789/130880.
This study describes the key steps in developing online parallel corpus-based tools for processing COPA-TRAD (Corpus Paralelo de Tradução, www.copa-trad.ufsc.br), a parallel corpus compiled for translation research and pedagogy. The study draws on Fernandes's (2009) proposal for corpus compilation, which divides the compiling process into three main stages: corpus design, corpus building and corpus processing. The process also drew on good software engineering development practices, especially those advocated by Pressman (2011). The tools developed can, for example, assist in investigating certain types of texts and translational practices related to linguistic patterns such as collocations and semantic prosody. As a result, COPA-TRAD is a suitable tool for the empirical investigation of translational phenomena with a view to translation research and pedagogy.
Piao, Scott. "Sentence and word alignment between Chinese and English." Thesis, Lancaster University, 2000. http://eprints.lancs.ac.uk/52143/.
Al-Qaisi, Fu'ad. "Apport de la linguistique de corpus à la lexicographie bilingue (français-arabe) : macrostructure et microstructure d'un dictionnaire de collocations." Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO20115.
The aim of this study is to examine the contribution of corpus linguistics to bilingual French-Arabic lexicography. We focus in particular on collocations: our research begins with the compilation of a bilingual corpus and leads up to the integration of collocations in the lexicon. Fundamentals such as corpus linguistics, corpora and collocation are examined. Our research then takes an empirical turn based on the use of our corpus. To overcome the unavailability of corpus processing tools for Arabic, we developed an approach that we call the footbridge strategy. The idea is to start from a French-Arabic (translated) parallel corpus, consisting of the French version of Le Monde Diplomatique and its translation. Using a parallel corpus is intended to facilitate the identification of contrastive phenomena. The results obtained in the Arabic component of the translated corpus are subsequently checked in an Arabic monolingual corpus consisting of three newspapers: Alrai, Alayyam, Algouhouria. Throughout the exploitation of the corpus, results are compared first between corpora and dictionaries, secondly between corpus types (parallel and comparable), and thirdly between newspapers (Alrai, Alayyam, Algouhouria). A number of collocations are then subjected to semantic and structural review. This review not only clarifies the place of collocations between language and speech but also suggests a possible approach to integrating them in the dictionary. Legitimate questions gradually arise regarding the resemblance of collocations in French and Arabic. The results highlight phenomena such as collocational chains (clusters), collocational synonyms, etc. The study culminates in the design of a computer dictionary of collocations, i.e. an active bilingual dictionary aimed at Arabic language specialists and translators.
Abdulhay, Authoul. "Constitution d'une ressource sémantique arabe à partir d'un corpus multilingue aligné." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00836764.
Teixeira, Lílian Figueiró. "A semântica dos compostos nominais – um estudo de corpus paralelo inglês/português." Universidade do Vale do Rio do Sinos, 2009. http://www.repositorio.jesuita.org.br/handle/UNISINOS/2574.
Noun compounds are productive constructions in many languages; that is, new combinations are easily created in contexts of language use. However, they are idiosyncratic, which makes this linguistic phenomenon a challenge for linguistics and for research in Natural Language Processing. This work studies how the constituents of English noun compounds formed by two nouns (NN compounds) relate semantically, and identifies the characteristics of their Portuguese translation equivalents found in ten editions of National Geographic magazine. The goal is to identify the most frequent relations in the corpus in order to propose a typology that expresses the semantic compositionality of these constructions. The work has three parts, covering the theoretical bases adopted, the methodological resources drawn from Corpus Linguistics, and the analysis and discussion of the data. Concepts concerning the semantics of noun compounds, such as productivity, semantic transparency, headedness, lexicalization and nominalization, are discussed. Two theories were used [...]
NOSEDA, VALENTINA. "CORPORA PARALLELI E LINGUISTICA CONTRASTIVA: AMPLIAMENTO E APPLICAZIONI DEL CORPUS ITALIANO - RUSSO NEL NACIONAL'NYJ KORPUS RUSSKOGO JAZYKA." Doctoral thesis, Università Cattolica del Sacro Cuore, 2017. http://hdl.handle.net/10280/24613.
Corpus Linguistics, which exploits electronic annotated corpora in the study of languages, is a widespread and consolidated approach. In particular, parallel corpora, where texts in one language are aligned with their translation in a second language, are an extremely useful tool in contrastive analysis. The lack of good parallel corpora for the languages of our interest, Russian and Italian, led us to work on improving the Italian-Russian parallel corpus available as a pilot corpus in the Russian National Corpus. This work therefore had a twofold aim, practical and theoretical. On the one hand, after studying the essential issues in designing a high-quality corpus, we established the criteria for expanding the corpus and increased the number of texts, allowing the Italian-Russian parallel corpus, which counted 700,000 words, to reach more than 4 million words. As a result, it is now possible to conduct scientifically valid research based on this corpus. On the other hand, three corpus-based analyses were proposed in order to highlight the potential of the corpus: the study of prefixed Russian memory verbs and their translation into Italian; the comparison between the Italian analytic causative "fare + infinitive" and Russian causative verbs; and the comparative analysis of fifteen Italian versions of The Overcoat by N. Gogol'. These analyses first of all made it possible to advance some methodological remarks with a view to further enlargement and improvement of the Italian-Russian parallel corpus. Secondly, the corpus-based approach proved useful in deepening the study of these topics from a theoretical point of view.
Knobloch, Nina. "The encoding of bad and evil : A cross-linguistic study using a parallel Bible corpus." Thesis, Stockholms universitet, Institutionen för lingvistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-196656.
This study investigates the cross-linguistic encoding of expressions for bad and evil. Probabilistic semantic maps were created using multidimensional scaling on parallel data from a Bible corpus consisting of 30 translations of the New Testament. Particular attention was paid to the occurrence of morphological and syntactic negation within the domain. The results show that most languages either have one broader expression that is used across the whole domain, or have at least two expressions, of which one is broader, i.e. used for bad states, actions or character traits, and the other is more restricted, i.e. used only for the most evil actions and characters, which require a moral agent. Languages with several expressions vary greatly in how broad or restricted those expressions are. A representation of the semantic domain as a scale is therefore proposed, rather than dividing the domain into discrete semantic categories. In the languages where negation occurred within the domain, it was found only in the broader expressions.
Bouamor, Dhouha. "Constitution de ressources linguistiques multilingues à partir de corpus de textes parallèles et comparables." Phd thesis, Université Paris Sud - Paris XI, 2014. http://tel.archives-ouvertes.fr/tel-00994222.
Zennaki, Othman. "Construction automatique d'outils et de ressources linguistiques à partir de corpus parallèles." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM006/document.
This thesis focuses on the automatic construction of linguistic tools and resources for analyzing texts in low-resource languages. We propose an approach using Recurrent Neural Networks (RNN) that requires only a parallel or multi-parallel corpus between a well-resourced language and one or more low-resource languages. This corpus is used to construct a multilingual representation of the words of the source and target languages. We used this multilingual representation to train our neural models, and we investigated both unidirectional and bidirectional RNN models. We also proposed a method for including external information (for instance, low-level information from Part-Of-Speech tags) in the RNN to train higher-level taggers (for instance, SuperSenses taggers and syntactic dependency parsers). We demonstrated the validity and genericity of our approach on several languages and conducted experiments on various NLP tasks: Part-Of-Speech tagging, SuperSenses tagging and dependency parsing. The results obtained are very satisfactory. Our approach has the following characteristics and advantages: (a) it does not use word alignment information; (b) it does not assume any knowledge about target languages (the one requirement is that the source and target languages are not too syntactically divergent), which makes it applicable to a wide range of low-resource languages; (c) it provides genuinely multilingual taggers (one tagger for N languages).
Neifar, Wafa. "Méthodes d'acquisition terminologique en arabe : Application au domaine médical." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS085/document.
The goal of this thesis is to reduce the lack of available resources and NLP tools for Arabic in specialised domains by proposing methods for extracting terms from texts in Modern Standard Arabic. In this context, we first constructed an English-Arabic parallel corpus in a specific domain: a set of medical texts produced by the US National Library of Medicine (NLM). We then proposed terminology acquisition methods for Arabic, to extract terms or acquire relations between these terms, based on: i) the adaptation of an existing terminology extractor for French or English, ii) the transliteration of English terms into Arabic characters and iii) cross-lingual transfer. Applied at the terminological level, transfer consists of running a term extraction or relation acquisition process on the texts of a source language (here, French or English) and then transferring the extracted information to target-language texts (in this case, Modern Standard Arabic), thereby identifying the same type of terminological information. We evaluated the monolingual and bilingual term lists obtained in our experiments using a transparent, direct and semi-automatic method: the extracted term candidates are compared against a reference terminology before being validated manually. This evaluation follows a protocol that we proposed.
Wang, Lixum. "The use of parallel texts in language learning : computer software and teaching materials for English and Chinese." Thesis, University of Birmingham, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368990.
Teixeira, Luiz Gustavo [UNESP]. "Colocações criativas presentes no corpus literário paralelo Memórias póstumas de Brás Cubas sob a perspectiva de um novo olhar." Universidade Estadual Paulista (UNESP), 2016. http://hdl.handle.net/11449/143885.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
This study analyzes the translations of creative collocations in a literary parallel corpus consisting of the original Portuguese text Memórias Póstumas de Brás Cubas (TO), by Machado de Assis (1891), and its three English translations: Epitaph of a Small Winner (TT¹), by Grossman (1953); Posthumous Reminiscences of Braz Cubas (TT²), by Ellis (1955); and The Posthumous Memoirs of Brás Cubas (TT³), by Rabassa (1997). The theoretical and methodological approach draws on Corpus Linguistics and its interface with Corpus-based Translation Studies and Literature, on the concept of creative collocations, and on the Machado scholarship of Bosi (1999, 2006) and Schwarz (1990), showing how the gaze of the deceased author is portrayed through the eyes of the characters in the selected passages. To extract the words with the highest keyness, we used WordSmith Tools (SCOTT, 2012), which allowed a broader and more dynamic analysis of the data. As reference corpora we used the Brown Corpus for English and the Lácio-Ref corpus for Portuguese. The keyword extraction showed significant keyness for the nodes "olhos" in the original text (TO) and "eyes" in the translated texts (TT¹, TT², TT³), from which we extracted and analyzed the related creative collocations. Both the keyword extraction and the analysis of the translations of creative collocations in the selected passages show that, although all the translators repeat their translations of some creative collocations, Grossman (TT¹) does so more frequently and therefore does not explore the creativity of Machado's style. The analysis also suggests that in some passages the translators do not capture the sense of the creative collocations originally used, revealing how difficult it is to translate Machado's style.
Granlund, Ann-Louise. "Comparing Emotional Intensity Between Languages: A parallel corpus Investigation on the Swedish word Njuta and its English equivalents." Thesis, Malmö högskola, Institutionen för globala politiska studier (GPS), 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-22490.
This paper investigates the emotional semantic differences between the Swedish word "njuta" and its English equivalents. As a Swede, when attempting to describe the word "njuta", my first natural description is to have feelings of lust, or to experience something with passion. The most common translation of the word into English is "enjoy", and for me the first natural description of this word is to like something, or to find pleasure in it. The words that I have chosen to investigate have a wider meaning than simply experiencing feelings of pleasure to different degrees. They are also used in connection with having, possessing, valuing, or consuming something. Through an investigation of the English-Swedish Parallel Corpus I will try to show the variety of semantic definitions and uses of the word. The aim of this paper is to demonstrate, in accordance with my hypothesis, that the English equivalents of the Swedish word njuta carry less emotional value, and that the Swedish word is more intense and semantically stronger than the English enjoy.
Finnveden, Gustav. "Finding case through personal names in parallel texts." Thesis, Stockholms universitet, Institutionen för lingvistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-174831.
The aim of this study is to evaluate whether the 'form richness' of personal-name lexemes is a workable indirect way of investigating languages' case systems. Parallel texts were used to find personal names and group them by lexeme in over a thousand languages. For the subset of languages for which data on their case systems was available, this data was compared with the groupings. The results indicate that the maximum number of word-form types in which a name lemma was observed is a useful tool for finding languages that use case, but only for a subset of the tested languages. It was, however, worse at finding languages that do not use case. An entropy estimate was also used, based on the number of word-form types a personal-name lemma was found with and the number of occurrences of those word-form types. It was an acceptable indicator of the number of case categories, though somewhat lacking in accuracy. Personal-name marking in languages without case was examined. The types of marking found were pragmatic, case-like, and grammatical non-case. Two languages with case, but with few personal-name forms, were examined. They do not use case marking on personal names but do on their nouns, which violated a hypothetical generalization on which this study was based: that no language has case marking only on personal names or only on nouns.
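The two indicators mentioned in this abstract, the number of distinct word-form types observed for a name lemma and an entropy estimate over the frequencies of those forms, can be sketched in Python as follows. The abstract does not say which entropy estimator the thesis uses; the plain plug-in Shannon entropy below, the function names, and the toy name forms are all assumptions made for illustration.

```python
import math
from collections import Counter


def form_type_count(forms):
    """Number of distinct word-form types observed for one name lemma."""
    return len(set(forms))


def form_entropy(forms):
    """Plug-in Shannon entropy (in bits) of the word-form distribution.

    Assumed estimator: relative frequencies are used directly as
    probabilities, with no smoothing or bias correction.
    """
    counts = Counter(forms)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Invented occurrences of one name lemma in a case-marking language.
observed = ["Petrus", "Petrus", "Petri", "Petro"]
print(form_type_count(observed), form_entropy(observed))
```

A lemma that always appears in a single form has entropy zero, while forms spread evenly over several types give higher values, which is why such a measure can serve as a rough proxy for the number of case categories.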
Mörn, Anna. "The Modal Auxiliaries Can and Could - A contrastive investigation of the modal auxiliaries can and could in descriptions in materials aimed for English tuition and the English-Swedish Parallel Corpus." Thesis, Halmstad University, School of Humanities (HUM), 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-2515.
The two modal auxiliaries can and could are investigated in this essay. The focus is on the correspondence between descriptions in grammar books and real-life data.
First, four English learner grammar books aimed at Swedish high schools were analyzed. The uses and translations of can and could found in the grammar books were then compared to real-life examples from an English-Swedish parallel corpus.
It was found that three of the grammar books categorize the uses of can and could according to ability, possibility and permission in quite general terms, and these uses correlated with the majority of the corpus examples. The fourth book did not mention the possibility use and stated very specific uses of the modal auxiliaries. This grammar book did not correspond to the corpus data to the same extent as the other three grammars.
It could be concluded that the assumptions made about use correlated to a greater extent with the corpus than the assumptions made about translations.
MATSUBARA, Shigeki, and Yoshihide KATO. "Correcting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar." Institute of Electronics, Information and Communication Engineers, 2010. http://hdl.handle.net/2237/15002.
Lecuit, Émeline. "Les tribulations d'un nom propre en traduction : étude contrastive du nom propre et de sa traduction à partir d'un corpus aligné de dix langues européennes." Thesis, Tours, 2012. http://www.theses.fr/2012TOUR2017/document.
Full textProper names are omnipresent and have long held the interest of both philosophers and linguists.Our work, divided into four parts, presents, from a contrastive perspective, the behaviour of proper names in translation.The first two parts are theoretical. Firstly, we give a general presentation of what is a proper name from the point of view of both English and French linguistics. Secondly, we introduce the different translation processes proper nouns can undergo.The last two parts are experimental. We begin by explaining the different phases in the process of constitution of our aligned and annotated multilingual parallel corpus, composed of eleven versions of Jules Verne’s novel, Le Tour du monde en quatre-vingts jours, in ten European languages. We then present the results obtained from the observation of proper names behaviour in translation.These results often contradict the widespread idea regarding proper names untranslatability
Mydliar, Ján. "Překladač z češtiny do slovenštiny." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236398.
Do, Thi Ngoc Diep. "Extraction de corpus parallèle pour la traduction automatique depuis et vers une langue peu dotée." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00680046.
Julin, Hanna. "“What you NEED to know”, “Was man wissen muss” and “Vad man behöver veta” : A contrastive corpus study of NEED to and its German and Swedish correspondences in non-fiction." Thesis, Linnéuniversitetet, Institutionen för språk (SPR), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-91202.
Fernández, Quiroz Ariel Marcelo. "Análise da perda de comicidade na tradução de piadas do seriado "El Chavo del 8" em um corpus paralelo da sua dublagem do espanhol do México para o português do Brasil /." Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/154495.
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
O principal problema da maioria das dublagens de produtos audiovisuais humorísticos são os laugh tracks (sons artificiais de um público rindo), já que toda vez que há trilha sonora de risadas, estas devem coincidir com uma piada para não causar estranheza no público-alvo. Neste trabalho analisaremos, por meio de um corpus paralelo, os problemas de tradução presentes na dublagem de um desses produtos: o seriado “El Chavo del 8” (“Chaves” no Brasil) do espanhol do México para o português do Brasil, com base nas teorias de dublagem fundamentadas por Hurtado Albir (1996); de humor, fundamentadas por Raskin (1987), Bergson (1983), Posada (1995) entre outros; e de técnicas de tradução propostas por Hurtado Albir (2001). Apresentamos uma análise realizada em três etapas: na primeira, criamos um quadro com as minutagens das piadas para cada um dos 18 episódios analisados e uma seção “houve/não houve piada”; na segunda, 12 participantes responderam se houve piada ou não em cada trecho selecionado; finalmente, na terceira etapa, criamos quadros para cada piada nas quais os participantes determinaram que não houve piada e explicamos o motivo dessa perda. Com base na definição dos problemas e nas técnicas de tradução, pretende-se apresentar as possíveis soluções que os tradutores audiovisuais teriam para traduzir as piadas em caso de perda de comicidade.
The main problem with dubbing translation in most humorous audiovisual products is the laugh track, since every laugh track must coincide with a joke so as not to seem strange to the target audience. In this research, we analyze, through a parallel corpus, the translation problems in the dubbing of the series "El Chavo del 8" ("Chaves" in Brazil) from Mexican Spanish into Brazilian Portuguese, based on theories of audiovisual translation by Hurtado Albir (1996), of humor by Raskin (1987), Bergson (1983) and Posada (1995), and on translation strategies by Hurtado Albir (2001). We present an analysis performed in three stages: in the first, we created tables with the timestamps of the jokes in 21 episodes and a “yes / no” joke section; in the second, 14 participants answered whether or not there was a joke in each selected section; finally, in the third, we created tables for each joke that participants judged not to be a joke. Based on the definition of the problems and on the translation strategies, we intend to offer possible solutions for audiovisual translators when dealing with jokes.
CNPq:190394/2015-3
Miao, Jun. "Approches textométriques de la notion de style du traducteur : Analyses d'un corpus parallèle Français-Chinois : Jean-Christophe de Romain Rolland et ses trois traductions chinoises." Phd thesis, Université de la Sorbonne nouvelle - Paris III, 2012. http://tel.archives-ouvertes.fr/tel-00846619.
Yahiaoui, Abdelghani. "Conception et développement d'un outil d'aide à la traduction anglais/arabe basé sur des corpus parallèles." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE2042.
We create an innovative English/Arabic translation aid tool to meet the growing need for online translation tools centered on the Arabic language. This tool combines dictionaries adapted to the specificities of the Arabic language and a bilingual concordancer derived from parallel corpora. Given the agglutinative and unvoweled nature of Arabic, its words require specific treatment. For this reason, and to construct our dictionary resources, we rely on Buckwalter's morphological analyzer which, on the one hand, allows a morphological analysis taking into account the complex composition of the Arabic word (proclitic, prefix, stem, suffix, enclitic), and on the other hand, provides translational resources that can be reused in a translation system. Furthermore, this morphological analyzer is compatible with the approach defined around the DIINAR database (DIctionnaire Informatisé de l’Arabe - Computerized Dictionary for Arabic), which was constructed, among others, by members of our research team. In response to the contextual issue in translation, a bilingual concordancer was developed from parallel corpora. The latter represent a novel linguistic resource with multiple uses, in this case aid for translation. We therefore closely analyse these corpora and their alignment methods, and we propose a mixed approach that significantly improves the quality of sub-sentential alignment of English-Arabic corpora. Several technologies have been used to implement this translation aid tool, which has been made available online (tarjamaan.com) and which allows the user to search for the translation of millions of words and expressions while viewing their original contexts. An evaluation of this tool has been made with a view to optimizing it and extending it to support other language pairs.
Shen, Lionel. "Méthodes de veille textométrique multilingue appliquées à des corpus de l’environnement et de l’énergie : « Restitution, prévision et anticipation d’événements par poly-résonances croisées »." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCA085/document.
This thesis proposes a series of textometric multilingual information monitoring methods applied to thematic corpora (textometry is also called textual statistics or text data analysis). Two types of corpora are mobilized to create this work: a comparable corpus and a parallel corpus in which the textual data are extracted from the press and discourse of NGOs. The information source was retrieved from three countries in three different languages: English, French and Chinese. The two corpora were constructed on two topical issues concerning the environment and energy, with a focus on three concepts: energy, nuclear power and the EPR (European Pressurized Reactor or Evolutionary Power Reactor). After a brief review of the state of the art on business intelligence, information monitoring and textometry, we first set out the two chosen subjects – the environment and energy – and then the morphosyntactic features of the three languages in national and international contexts. The overall characteristics, similarities and peculiarities of these corpora are highlighted successively. The recounts and qualitative and quantitative analyses of the results were carried out using textometric tools, including factor analysis of correspondences, co-occurrences and polyco-occurrential networks, specificities of the hypergeometric model and repeated segments or map sections. Thereafter, bilingual bitextual information monitoring was applied to the same three concepts with the aim of elucidating how the comparable corpus and the parallel corpus can mutually help each other in a process of multilingual information monitoring, by restitution, forecasting and anticipation. We conclude our research by offering an analytical method called Objects-Features-Opening (OFO).
Znotina, Inga. "Parodomieji įvardžiai lietuvių – latvių lygiagrečiajame tekstyne." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120626_145312-68508.
Lithuanian and Latvian demonstrative pronouns are the object of this paper. The aim is to identify their relations in Lithuanian – Latvian translations and the problems they may cause for translators. The hypothesis is as follows: Lithuanian demonstrative pronouns in Lithuanian – Latvian translations are mostly but not always replaced by Latvian demonstrative pronouns. This research is based on the Lithuanian – Latvian parallel corpus, which is now being prepared at two partner universities: Vytautas Magnus University and the University of Latvia. This corpus collects translations of Lithuanian texts into Latvian. Concordances of Lithuanian demonstrative pronouns are extracted from this corpus using the concordancer ParaConc, and it is studied how these pronouns are translated into Latvian. The paper consists of five chapters. The first is an introduction in which the aim and tasks are briefly described. The second chapter presents demonstrative pronouns, the definitions of this group and the words that belong to it in Lithuanian and Latvian. The third chapter describes the methodology and the corpus used in this research. In the fourth chapter the corpus data are analysed. The translation of Lithuanian demonstrative pronouns as Latvian demonstrative pronouns, their translation as other Latvian pronouns or other word classes, and the omission of Lithuanian demonstrative pronouns are discussed. Conclusions and some recommendations...
Musil, Jakub. "Automatická tvorba slovníků z překladových textů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237245.
Kouřil, Jan. "Paralelní korpusový manažer." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-236928.
Phan, Thi Thanh Thao. "Machine translation of proper names from english and french into vietnamese : an error analysis and some proposed solutions." Thesis, Besançon, 2014. http://www.theses.fr/2014BESA1002/document.
Machine translation (MT) has increasingly become an indispensable tool for decoding the meaning of a text from a source language into a target language in our current information and knowledge era. In particular, MT of proper names (PN) plays a crucial role in providing the specific and precise identification of persons, places, organizations, and artefacts across languages. Despite a large number of studies and significant achievements in named entity recognition in the NLP community around the world, there has been almost no research on PNMT for the Vietnamese language. Due to the different features of PN writing, transliteration or transcription and translation from a variety of languages including English, French, Russian, Chinese, etc. into Vietnamese, PNMT from those languages into Vietnamese is still a challenging and problematic issue. This study focuses on the problems of English-Vietnamese and French-Vietnamese PNMT arising from current MT engines. First, it proposes a corpus-based PN classification, then a detailed PNMT error analysis, and concludes with some pre-processing solutions to improve MT quality. Through the analysis and classification of PNMT errors from the two English-Vietnamese and French-Vietnamese parallel corpora of texts with PNs, we propose solutions concerning two major issues: (1) corpus annotation for preparing the pre-processing databases, and (2) design of the pre-processing program to be used on annotated corpora to reduce PNMT errors and enhance the quality of MT systems, including Google, Vietgle, Bing and EVTran. The efficacy of different annotation methods for the English and French corpora of PNs and the PNMT errors before and after using the pre-processing program on the two annotated corpora are compared and discussed in this study. They show that the pre-processing solution significantly reduces PNMT errors and contributes to the improvement of MT systems for the Vietnamese language.
Cuofano, Letizia. "As equivalências no português e no italiano de verbos suecos com prefixos de origem germânica num corpus paralelo de textos escritos." Thesis, Stockholms universitet, Avdelningen för portugisiska, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-64270.
The Germanic prefixes found in some Swedish verbs are compared, in a contrastive analysis, with their equivalents in Portuguese and Italian in a parallel written corpus consisting of a Swedish-language novel, a Portuguese-language novel and an Italian-language novel, together with their respective translations. The functions performed by the Germanic prefixes of the analysed Swedish verbs are examined and then compared with their equivalents, with the result that even in the Romance languages it is possible to find, fairly consistently, grammatical processes similar to those performed by the Germanic prefixes.
I prefissi germanici di alcuni verbi svedesi saranno comparati in un'analisi contrastiva con le relative equivalenze in portoghese e in italiano in un corpus parallelo scritto composto da un romanzo di lingua svedese, uno di lingua portoghese e uno di lingua italiana e dalle rispettive traduzioni. Le funzioni svolte dai prefissi germanici dei verbi svedesi analizzati saranno esaminate e poi confrontate con le relative equivalenze, con il risultato che anche nelle due lingue romanze si riscontrano in maniera abbastanza costante processi grammaticali simili a quelli svolti dai prefissi germanici.
De germanska prefix som återfinns i vissa svenska verb kommer att jämföras med sina motsvarigheter på portugisiska och italienska. Detta görs med hjälp av en skriven korpus bestående av en roman ursprungligen skriven på svenska, en skriven på portugisiska och en skriven på italienska samt översättningar av dessa romaner till de två andra språken. Funktionen hos de svenska verben med germanska prefix kommer att analyseras och sedan jämföras med verbens motsvarigheter. Resultatet av analysen visar att det är möjligt att finna systematiskt återkommande grammatiska processer i de romanska språken, som liknar de som förekommer i samband med de germanska prefixen på svenska.
Dilbaitė, Indrė. "Konceptualiųjų metaforų vertimas lygiagrečiajame anglų-lietuvių kalbų ES dokumentų tekstyne." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2010. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2010~D_20100617_112544-87348.
This research is based on conceptual metaphors that were manually extracted from the English-Lithuanian corpus of European Union documents and whose translation was then analyzed. Figures of speech are uncommon in official texts, but metaphor is a component of all languages. Conceptual metaphor becomes naturalized even in the most unlikely areas and goes unnoticed unless specifically investigated. The concept of metaphor and its types are defined in this work. Conceptual metaphor is presented, as well as its possible classifications: by conventionality, by the cognitive function performed (structural, ontological and orientational metaphors), and by generality. After presenting the identification criteria and methods, conceptual metaphors were extracted from frequency lists of two-word and three-word combinations. Each conceptual metaphor was analyzed and classified as structural, ontological or orientational according to the function it performs. The translation of each metaphor was located in the English-Lithuanian parallel corpus of EU documents in order to determine whether the combination retained its conceptuality, acquired it only in translation, or lost it in translation. It was discovered that the majority of the most frequently used English and Lithuanian conceptual metaphors remained in translation, 61% and 69% respectively. The most typical, most frequently used conceptual metaphors are translated in the parallel...
Dalunde, Tilda. "Minnen från en parallell framtid." Thesis, Konstfack, Ädellab/Metallformgivning, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:konstfack:diva-4695.
We live in a fragile everyday. We make it even more fragile by the way we live it. There is no point in saying it with words any more; I've already tried that so many times that people have stopped listening. Maybe objects are a better way to start a conversation. In this project, which consists of this thesis and the physical body of work "Memories from a parallel future", I've been investigating what happens to us when the everyday falls apart and chaos erupts. With a starting point in the climate crisis of the year 536, which led to the death of almost half of the Norse population, I've been speculating about what would have happened today. Or maybe that it is actually happening today. Depletion of resources always results in violence. We know this, but still we keep nibbling at the earth, a little chunk at a time. What do we plan to do when there is nothing left?
Images of works by the artists Iain Baxter&, Naoko Ito and Luiana Rondolini have been removed for copyright reasons. The titles of the works remain, however.
Vialla, Bastien. "Contributions à l'algèbre linéaire exacte sur corps finis et au chiffrement homomorphe." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS112.
This thesis is composed of two independent parts. The first is related to homomorphic encryption and the second deals with sparse linear algebra over finite fields. Homomorphic encryption extends traditional encryption in the sense that it becomes feasible to perform operations on ciphertexts without knowledge of the secret decryption key. As such, it enables someone to delegate heavy computations on sensitive data to an untrusted third party in a secure way. More precisely, with such a system, a user can encrypt sensitive data such that the third party can evaluate a function on the encrypted data without learning any information about the underlying plain data. On getting back the encrypted result, the user can use the secret key to decrypt it and obtain, in clear, the result of evaluating the function on the sensitive plain data. For a cloud user, the applications are numerous, and reconcile a rich user experience with strong privacy protection. The first fully homomorphic encryption (FHE) scheme, able to handle an arbitrary number of additions and multiplications on ciphertexts, was proposed by Gentry in 2009. In homomorphic encryption schemes, the executed function is typically represented as an arithmetic circuit. In practice, any circuit can be described as a set of successive operation gates, each one being either a sum or a product performed over some ring. In Gentry’s construction, based on lattices, each ciphertext is associated with some noise, which grows at each operation (addition or multiplication) done throughout the evaluation of the function. When this noise reaches a certain limit, decryption is no longer possible. To overcome this limitation, closely related to the number of operations that the HE.Eval procedure can handle, Gentry proposed a noise-refreshment technique called “bootstrapping”. The main idea behind this bootstrapping procedure is to homomorphically run the decryption procedure of the scheme on the ciphertext, using an encrypted version of the secret key. In this context, our contribution is twofold. We first prove that the lmax-minimizing bootstrapping problem is APX-complete and NP-complete for lmax ≥ 3. We then propose a new method to determine the minimal number of bootstrappings needed for a given FHE scheme and a given circuit. We use linear programming to find the best outcome for our problem. The main advantage of our method over the previous one is that it is highly flexible and can be adapted to numerous types of homomorphic encryption schemes and circuits. Computing a kernel element of a matrix is a fundamental kernel in many computer algebra and cryptography algorithms. In particular, many applications produce matrices in which many elements are equal to 0. Such matrices are called sparse matrices. Sparse linear algebra relies fundamentally on iterative approaches such as Wiedemann or Lanczos. The main idea is to replace the direct manipulation of a sparse matrix with its Krylov subspace. In such an approach, the cost is therefore dominated by the computation of the Krylov subspace, which is done by successive products of a matrix by a vector or a dense matrix. Modern processor characteristics (SIMD, multicores, cache hierarchy, ...) greatly influence algorithm design. In this context, our work deals with the best approach to designing efficient implementations of the sparse matrix-vector product for modern processors. We propose a new sparse matrix format that handles the many ±1 matrix elements to improve performance. We propose a parallel implementation based on the work-stealing paradigm that provides good scaling on multicore architectures. We study the impact of SIMD instructions on sparse matrix operations. Finally, we provide a modular arithmetic implementation based on a residue number system to handle the sparse matrix-vector product over multiprecision finite fields.
Samuelsson, Thomas. "The Russian Verbal Prefix v- and Circumfix v- -sja in Space : A Contrastive Study between Russian and Swedish." Thesis, Stockholms universitet, Slaviska språk, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-155815.
The present study investigates the Russian verbal prefix v(o)- and circumfix v(o)- -sja in concrete physical space. The aim of this contrastive study is to explore and describe their meanings. Bilingual data, extracted from a contemporary Russian-Swedish dictionary, is analysed using Krongauz’s method. A list of meanings of the Russian verbal affixes is built by comparing similarities and differences between lexical meanings and morphosyntactic structures for the verbs in both languages. The result shows that the meanings of the affixes can be divided into the following categories: Spatial movements into an enclosed space, Spatial movements onto a delimited surface, Spatial movements towards a vicinity, Adhesion and Locations in physical space.
Phan, Thi Thanh Thao. "Machine translation of proper names from english and french into vietnamese : an error analysis and some proposed solutions." Electronic Thesis or Diss., Besançon, 2014. http://indexation.univ-fcomte.fr/nuxeo/site/esupversions/8ded02fb-eae4-4c01-8ded-ede048ac2a4d.
Mellquist, Simone. "Ryska gerundier i översättning till och från svenska : Implicita och explicita betydelser." Thesis, Stockholms universitet, Slaviska språk, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-153972.
Full textStankevičius, Kęstutis. "Lygiagrečių tekstynų kūrimo interaktyvios informacinės sistemos." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120723_105613-14222.
Full textThe purpose of this thesis is to review most currently used user interfaces that help people interact with computers and other equipment, and begin exploring new user interface paradigm, which allows humans to interact naturally with the computer. Furthermore, analyze the most widely used methods today for implementing web services, to find a solution how interactive information systems could communicate with each other without any restrictions to gain an overall result choosing the best way to store and display relevant data to the program simpler and more flexible way. Create an interactive parallel corpus development environment prototype for minimizing available errors, if they occur, from the generated parallel translation as easy as possible using as less human labor as possible. Using the prototype, perform a study that will show trends in the use of different interface input devices. The work consists of 8 parts: introduction, overview of user interfaces, user interface separation, web services analysis, XML databases, user interface development, conclusions and references. Thesis consists of: 48 pages of text without appendixes, 25 pictures and 2 tables. Two enclosures of the work are enclosed separately.
Chrétien, Benjamin. "Optimisation semi-infinie sur GPU pour le contrôle corps-complet de robots." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT315/document.
A humanoid robot is a complex system with numerous degrees of freedom, whose behavior is subject to the nonlinear equations of motion. As a result, planning its motion is a difficult task from a computational perspective. In this thesis, we aim at developing a method that can leverage the computing power of GPUs in the context of optimization-based whole-body motion planning. We first exhibit the properties of the optimization problem, and show that several avenues can be exploited in the context of parallel computing. Then, we present our approach to the dynamics computation, suitable for highly parallel processing architectures. Next, we propose a many-core GPU implementation of the motion planning problem. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. Because each constraint and its gradient can be evaluated independently for each time interval, we end up with a highly parallelizable problem that can take advantage of GPUs. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we investigate the extension of our work to model predictive control.
Fernández, Sánchez Francesc. "El Folleto de cursos de idiomas para extranjeros: análisis contrastivo (alemán-español) por tipos de emisor y subtextos." Doctoral thesis, Universitat Pompeu Fabra, 2005. http://hdl.handle.net/10803/7581.
The intralinguistic and interlinguistic analysis of the macrostructure and the recurrent textual segments, as well as of the functions (persuasive, referential and directive) characterizing both the LCLF as a persuasive leaflet and its three subtexts does not confirm the hypothesis. It does reflect, however, that the directive and persuasive functions prevail respectively in the public and private sender leaflets, as well as in those belonging to the Spanish and German subcorpora.
Esta tesis se plantea el objetivo traductivamente relevante de dar cuenta de las convenciones del FCIE, vinculadas principalmente a las funciones persuasiva y directiva, analizando un corpus bilingüe de textos paralelos según el método de la textología contrastiva. Dichas convenciones se ven consideradas por tipos de emisor (público y privado) y subtextos (unidades constitutivas del texto funcional, semántica y formalmente definidas) a partir de la hipótesis de que diferirán más dependiendo del tipo de emisor que de la lengua.
El análisis intralingüístico e interlingüístico de la macroestructura y los segmentos textuales recurrentes, así como de las funciones (persuasiva, referencial y directiva) que caracterizan tanto el FCIE, en cuanto que folleto persuasivo, como sus tres subtextos no permite confirmar esa hipótesis. No obstante, sí evidencia cómo las funciones directiva y persuasiva priman respectivamente en los ejemplares de emisor público y privado, así como en los de los subcorpus español y alemán.
Orenha, Adriane [UNESP]. "Unidades fraseológicas especializadas: colocações e colocações estendidas em contratos sociais e estatutos sociais traduzidos no modo juramentado e não-juramentado." Universidade Estadual Paulista (UNESP), 2009. http://hdl.handle.net/11449/103524.
Esta pesquisa visa realizar um estudo a respeito dos termos, colocações e colocações especializadas estendidas presentes em contratos sociais e estatutos sociais que representam os corpora de pesquisa. Nesta pesquisa, também observaremos as semelhanças e diferenças nos corpora de traduções jurídicas e juramentadas, no que concerne ao uso desses termos e padrões lexicais, assim como apontaremos aqueles que são mais frequentemente empregados em documentos do tipo contrato social e estatuto social. A investigação baseia-se na abordagem interdisciplinar dos Estudos da Tradução Baseados em Corpus, da Linguística de Corpus, da Fraseologia, de modo mais específico das colocações, das colocações especializadas e das unidades fraseológicas especializadas. A Terminologia, por meio de seus pressupostos teóricos, também traz sua contribuição para a pesquisa, assim como os trabalhos sobre a tradução juramentada. Uma das motivações que delineia este estudo reside no fato de a tradução juramentada ser considerada de grande relevância nas relações comerciais, sociais e jurídicas entre as nações. Para realizar este estudo, compilamos um corpus de estudo (CE1) constituído por contratos sociais e estatutos sociais traduzidos no modo juramentado, nas direções tradutórias inglês português e português inglês, extraídos de Livros de Registro de Traduções, pertencentes a tradutores juramentados credenciados pela Junta Comercial de dois Estados brasileiros; e um corpus de estudo (CE2) formado por documentos de mesma natureza traduzidos sem o processo de juramentação, nas mesmas direções tradutórias. Além destes corpora, construímos dois corpora comparáveis, formados pelos referidos documentos originalmente escritos em português e em inglês. Os resultados desta pesquisa mostraram várias semelhanças, no tocante aos termos empregados em documentos traduzidos...
This investigation aims at carrying out a study on terms, collocations and extended specialized collocations present in articles of incorporation/articles of organization/articles of association and bylaws that represent our research corpora. We will also observe similarities and differences in sworn and legal translation corpora as concerns the use of such terms and lexical patterns, and point out the ones that are most frequently used in the documents in focus. This research derives its theoretical and methodological sources from Corpus-Based Translation Studies, Corpus Linguistics and Phraseology, more specifically from collocations, specialized collocations and specialized phraseological units (SPUs). Terminology, from its theoretical standpoint, also contributes to this study, as do essays on sworn translation. One of the aspects that motivates this study is the fact that sworn translation is considered to be of great relevance to commercial, social and legal relations among nations. To conduct this research, we compiled a study corpus (CE1) composed of articles of incorporation/articles of organization/articles of association and bylaws submitted to the process of sworn translation in the English Portuguese and Portuguese English directions, excerpted from the Books of Sworn Translation Records, made available by five Brazilian sworn translators, duly sworn by the Board of Trade of two Brazilian States; and a study corpus (CE2) made up of documents of the same nature not submitted to the process of sworn translation, in the same translation directions. Besides these corpora, we also built two comparable corpora formed by the referred documents originally written in Portuguese and in English. The results obtained in this research showed some similarities regarding the terms used in documents submitted to the process of sworn translation...
Fawi, Fathi Hassan Ahmed <1982>. "Le variazioni terminologiche in un corpus giuridico parallelo italiano-arabo: studio linguistico-computazionale." Doctoral thesis, Università Ca' Foscari Venezia, 2016. http://hdl.handle.net/10579/10274.
Giesta, Letícia Caporlíngua. "Tradução pedagógica e letramento acadêmico com o uso de corpus paralelo." Repositório Institucional da UFSC, 2014. https://repositorio.ufsc.br/xmlui/handle/123456789/129655.
This study uses a parallel corpus to analyze the translation of frequent clusters in the field of Physics, with the aim of promoting pedagogical translation and supporting the academic literacy of students in this field. The corpus consists of 434 doctoral dissertation abstracts in Physics in Portuguese with their respective English translations, for a total of 868 texts. Forty-nine 4-word clusters were analyzed in two computational systems following the three categories suggested by Hyland (2008a): research-oriented, text-oriented and participant-oriented. The actions employed in translating these clusters were analyzed on the basis of the translation strategies suggested by Baker (1992). It is argued that pedagogical translation through a parallel corpus can promote the reflective involvement of professors and students in pedagogical and linguistic practices that seek to reduce divergent perspectives in contact with the texts, developing attitudes that may favor understanding in situated practices involving reading and writing, as well as in the social relations of teaching and learning that support the academic literacy of undergraduate students. The results reveal that the 4-word clusters used in the abstracts of the analyzed corpus show a tendency of the field to reflect its view of science in academic language: 74% of the markers are text-oriented, and participant-oriented markers are rare. They also reveal that the translation choices mostly maintained the linguistic functions of the clusters in the source and target languages. The use of different translation strategies allows reflection on the decisions made by the authors of the texts.
The data analysis and theoretical discussion provide elements to achieve the objective of this study and to support the thesis argument, raising questions about academic language in the field of Physics in order to identify cultural aspects of this community and assist students' academic literacy.
Svášek, Martin. "Définitions, élaboration et exploitation d'un corpus parallèle bidirectionnel français-tchèque tchèque français." Paris, INALCO, 2007. http://www.theses.fr/2007INAL0020.
The thesis begins by defining the concept of a parallel corpus. The French and Czech texts that make up the parallel corpus Fratchèque are literary; only texts from after 1945 were selected. Fratchèque is not marked up explicitly with XML tags, because tagging is not necessary for the proper functioning of the corpus manager ParaConc. The building of the corpus is described thoroughly, following every step and setting of the software used. The process starts with the optical character recognition program FineReader; once the accuracy of the digitized texts has been checked in MS Word 2002, the corpus is built and managed in ParaConc. The linguistic investigations of the thesis rely primarily on the parallel corpus thus created. The main purpose is to tackle a phenomenon that is known in Czech as částice but has no direct equivalent in French. The terms most frequently used in the French approach are mots du discours and particules énonciatives. The existing descriptions suggest a close relationship between these words and the discourse. This is demonstrated for two Czech částice, přece and vždyť, and their variants, using large Czech corpora (Analysis A) and Fratchèque (Analysis B). The study then systematically analyses all kinds of usage of vždyť and přece in order to provide a lexicographical description for a bilingual Czech-French dictionary. Exercises based on the results of the linguistic analysis show how the bilingual corpus can be used in foreign language teaching. Finally, some issues concerning the automatic evaluation of translation quality are discussed in light of the work on částice.
Ramnäs, Mårten. "Étude contrastive du verbe suédois "få" dans un corpus parallèle suédois-français." Göteborg : Acta Universitatis Gothoburgensis, 2008. http://catalogue.bnf.fr/ark:/12148/cb41372155m.
Oliveira, Joacyr Tupinambás de. "A Linguística de Corpus na formação do tradutor: compilação e proposta de análise de um corpus paralelo de aprendizes de tradução." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/8/8147/tde-26052015-104749/.
Studies on the teaching of translation in Brazil still leave room for discussion. With that in mind, one goal of this research is to foster a brief reflection on the classroom and to propose a teaching method based on the analysis of material produced by translation learners. We show that Corpus Linguistics can be used to analyze student translations in the same way it is used to analyze material produced by language learners. For that purpose, we compiled a corpus of translations produced by learners, consisting of eight source texts in English and about 800 translations into Portuguese, approximately 100 for each text. Aligning so many translations with their original texts for analysis was not a simple task. The difficulties were overcome by developing a methodology for alignment, which became the central focus of this research. Using formulas to handle textual data in spreadsheets yielded an aligned corpus containing the source texts and their corresponding translations, with headers and all lines tagged. This procedure produced a corpus that can be analyzed both in the spreadsheet editor and in programs such as AntConc and WordSmith Tools. In addition, we introduced the spreadsheets as a didactic tool for use in translation practice classes.