
Journal articles on the topic 'Corpus parallelo'


1

Biagini, Francesca, and Marco Mazzoleni. "I costrutti preconcessivi in italiano e in russo: uno studio sul “corpus” parallelo del NKRJa." Italica Belgradensia 2018, no. 1 (2018): 27–47. http://dx.doi.org/10.18485/italbg.2018.1.2.

2

Duran-Muñoz, Isabel, and Katia Peruzzo. "I testi turistici sulle aree naturali protette in italiano e spagnolo: Un compito semplice per il traduttore? // Tourist Texts about Protected Natural Areas in Italian and Spanish: A Simple Task for the Translator?" Ecozon@: European Journal of Literature, Culture and Environment 5, no. 1 (April 1, 2014): 65–83. http://dx.doi.org/10.37536/ecozona.2014.5.1.587.

Abstract:
In the process of translating Italian-Spanish environmental texts, translators frequently come across terms that at first glance might seem to be perfect translation equivalents, such as parque natural in Spanish and parco naturale in Italian, but that can sometimes result in mistranslation and misinterpretation. These terms are embedded in their cultures and thus differ from both a social and a political perspective. Consequently, translators must be aware of these underlying differences so as to avoid possible pitfalls in interpreting the content and to translate the texts correctly. This article presents a study carried out on a Spanish-Italian parallel corpus of tourist texts dealing with the promotion of protected natural areas in Spain, accompanied by an analysis of the legal sources at national, international and European levels. The purpose of the study is to provide a description of the differences and similarities at the conceptual and terminological levels in the specific field of environmental protection, based on the analysis of the characteristics and classifications of protected natural areas. The analysed material reveals differences and similarities between the specific terms of the two cultures examined (Italian and Spanish), while the discussion of the Italian translations in the parallel corpus makes it possible to highlight the major translation problems and potential translation errors. Finally, comparable monolingual corpora and legal sources are seen as key tools that help translators to avoid errors and to select the most appropriate translation strategies, such as domestication or foreignization (Venuti 1995) and amplification or simplification, in order to make the translated texts more accessible and to allow both the recipients of the source language and those of the target language to share, if not exactly the same conceptual reality, a very similar one.
3

Satoła-Staśkowiak, Joanna. "On the Benefits of Foreign Language Learning Based on Parallel Language Corpus." Cognitive Studies | Études cognitives, no. 15 (December 31, 2015): 57–65. http://dx.doi.org/10.11649/cs.2015.005.

Abstract:
A recent surge of interest in language corpora (collections of texts in electronic format), together with my work on 'The Parallel Polish-Bulgarian-Russian Corpus' within the European CLARIN project, prompted this text on using a parallel language corpus to learn a foreign language. The article discusses the benefits of using such a corpus in foreign language learning, describes selected corpus tools that support the learning process, and points out some risks arising from misuse of the corpus.
4

Macken, Lieve, Orphée De Clercq, and Hans Paulussen. "Dutch Parallel Corpus: A Balanced Copyright-Cleared Parallel Corpus." Meta 56, no. 2 (October 14, 2011): 374–90. http://dx.doi.org/10.7202/1006182ar.

Abstract:
This paper presents the Dutch Parallel Corpus, a high-quality parallel corpus for Dutch, French and English consisting of more than ten million words. The corpus contains five different text types and is balanced with respect to text type and translation direction. All texts included in the corpus have been cleared from copyright. We discuss the importance of parallel corpora in various research domains and contrast the Dutch Parallel Corpus with existing parallel corpora. The Dutch Parallel Corpus distinguishes itself from other parallel corpora by having a balanced composition and by its availability to the wide research community, thanks to its copyright clearance. All texts in the corpus are sentence-aligned and further enriched with basic linguistic annotations (lemmas and word class information). Approximately 25,000 words of the Dutch-English part have been manually aligned at the sub-sentential level. Rich metadata facilitates the navigability of the corpus and enables users to select the texts that satisfy their needs. The entire corpus is released as full texts in XML format and is also available via a web interface, which supports basic and complex search queries and presents the results as parallel concordances. The corpus will be distributed by the Flemish-Dutch Human Language Technology Agency (TST-Centrale).
5

Levchuk, Pavlo, Danuta Roszko, and Roman Roszko. "Multilingual corps institute of Slavic Studies, Polish Academy of Sciences – CLARIN PL. Polish-Lithuanian Parallel Corpus “2” and Polish-Ukrainian Parallel Corpus." Language: classic - modern - postmodern, no. 6 (December 30, 2020): 306–170. http://dx.doi.org/10.18523/lcmp2522-9281.2020.6.306-170.

6

Izquierdo, Marlén, Knut Hofland, and Øystein Reigem. "The ACTRES parallel corpus: an English–Spanish translation corpus." Corpora 3, no. 1 (May 2008): 31–41. http://dx.doi.org/10.3366/e1749503208000051.

Abstract:
This paper describes the compilation of the ACTRES Parallel Corpus, an English–Spanish translation corpus built at the Department of Modern Languages at the University of León (Spain) by the ACTRES research group. The computerisation of the corpus was carried out in collaboration with Knut Hofland and Øystein Reigem, from the Department of Culture, Language and Information Technology, Aksis, at the UNIFOB/University of Bergen (Norway). The corpus is conceived as a powerful tool for cross-linguistic research in the fields of Contrastive Analysis and Descriptive Translation Studies. It was the need to bridge the gap between these disciplines and to extend applications that encouraged the building of a parallel corpus as a suitable tool to achieve these goals. This paper focusses on the practical aspects of building the corpus. A brief account of the research which prompted this endeavour precedes the description of this process. This paper is an account of the building of the ACTRES Parallel Corpus, so no empirical results from research done on the basis of the corpus are reported here. Concerning new insights drawn from the actual use of P-ACTRES in English–Spanish translation and contrastive projects, there is an extended bibliography at: http://actres.unileon.es/
7

Matvieieva, Svitlana. "Критерії відбору та первинна обробка емпіричного матеріалу паралельного корпусу юридичних текстів." Forum Filologiczne Ateneum, no. 1(7)2019 (December 31, 2019): 167–81. http://dx.doi.org/10.36575/2353-2912/1(7)2019.167.

Abstract:
The article deals with the formation of criteria for the primary selection of legal texts for the English-Ukrainian parallel corpus of legal texts. The author has developed a classification of legal texts on the basis of the style and text genres, taking into account the types of legal acts, and makes an attempt to combine legal and linguistic characteristics applicable to the classification of legal documents. The article proposes the structure of the metadata card for corpus texts (original and translation), which are tested on text samples. The need for metatext data and extra-linguistic information for working with corpus texts is substantiated in the article.
8

Karimov, Rustam Abdurasulovich. "Text Selection Issue For Parallel Corpus." American Journal of Social Science and Education Innovations 2, no. 09 (September 26, 2020): 311–16. http://dx.doi.org/10.37547/tajssei/volume02issue09-48.

Abstract:
The basis of any corpus is its units. Typically, texts of different genres are selected as corpus units to ensure the representativeness of the corpus. Therefore, when creating any language corpus, the principles for selecting the texts that make it up should be defined first. Parallel corpus units consist of texts that have been translated one or more times from the original. The compiler's purpose determines which topics and genres of text are chosen for the parallel corpus.
9

Resnik, Philip, and Noah A. Smith. "The Web as a Parallel Corpus." Computational Linguistics 29, no. 3 (September 2003): 349–80. http://dx.doi.org/10.1162/089120103322711578.

Abstract:
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.
10

Al-Raisi, Fatima, Weijian Lin, and Abdelwahab Bourai. "A Monolingual Parallel Corpus of Arabic." Procedia Computer Science 142 (2018): 334–38. http://dx.doi.org/10.1016/j.procs.2018.10.487.

11

Duškin, Maksim, and Joanna Satoła-Staśkowiak. "The Bulgarian-Polish-Russian parallel corpus." Cognitive Studies | Études cognitives, no. 11 (November 24, 2015): 241–54. http://dx.doi.org/10.11649/cs.2011.015.

Abstract:
The Semantics Laboratory Team of the Institute of Slavic Studies of the Polish Academy of Sciences is planning to begin work on the creation of a Bulgarian-Polish-Russian parallel corpus. The three selected languages are representatives of the main groups of Slavic languages: Bulgarian represents the southern group, Polish the western group, and Russian the eastern group. Our project will be the first parallel corpus of these three languages. The planned corpus will be based on material dating from one period (the 20th century) and will have a synchronous nature. The project will not constitute the sum of the separate corpora of the selected languages. One of the problems with creating multilingual parallel corpora is the differing proportions of translated texts between the selected languages; for example, Polish literature is often translated into Bulgarian, but not vice versa. Bulgarian, Russian and Polish differ typologically: Bulgarian is an analytic language, while Polish and Russian are synthetic. The parallel corpus should have compatible annotation, while taking into account the characteristic features of the selected languages. We hope that the Bulgarian-Polish-Russian parallel corpus will serve as a source of linguistic material for contrastive language studies and may prove to be a big help for linguists, translators, terminologists and students of linguistics. The results of our work will be available on the Internet.
12

Xiong, Kai, Rui Yuan, Wenxue He, Yanmei Jing, Yansheng Wang, Qiqi He, and Huafu Li. "Crawling Chinese-Myanmar Parallel Corpus: Automatic Collection, Screening and Cleaning Corpus." IOP Conference Series: Materials Science and Engineering 646 (October 17, 2019): 012046. http://dx.doi.org/10.1088/1757-899x/646/1/012046.

13

Ullah, Irfan, Liaqat Iqbal, and Ayaz Ahmad. "Pakistani Identity and Kamila Shamsies Novels: An Analysis in Stylistics (Thematic Parallelism)." Global Regional Review IV, no. II (June 30, 2019): 301–9. http://dx.doi.org/10.31703/grr.2019(iv-ii).32.

Abstract:
This paper explores thematic parallelism in five of Kamila Shamsie's novels, i.e. Salt and Saffron, Cartography, Broken Verses, Burnt Shadows, and Home Fire. The paper identifies conflicts, depressions, identity fluctuations and a relentless machination of transformations by the powerful and resisting quarters of the region. The repeated military rule in Pakistan, the negative fallout of engagement in Afghanistan's resistance against the Soviets, the alienation of Muhajirs, and the national and international catastrophe of 9/11 emerge as the strings that reflect the dilemma of the nomadism of modern times. The tyranny of destructive forces is amply reflected in the parallel desolation of places, characters and cultures. Karachi in its violence is parallel to Tokyo and New York. These parallels sublimate each other in conveying the poignancy of uprootedness and loss of identity. The lexical and syntactic parallels identifiable through corpus tools helped in identifying such parallels.
14

Lesatari, Aufa Eka Putri, Arie Ardiyanti, and Ibnu Asror. "Phrase Based Statistical Machine Translation Javanese-Indonesian." JURNAL MEDIA INFORMATIKA BUDIDARMA 5, no. 2 (April 25, 2021): 378. http://dx.doi.org/10.30865/mib.v5i2.2812.

Abstract:
This research aims to produce a statistical machine translation system for Javanese-Indonesian translation and to determine how its main data sources, namely the parallel corpus and the monolingual corpus, influence translation quality. Testing was carried out by gradually increasing the quantity of parallel and monolingual corpus data across seven configurations of the Javanese-Indonesian system. All configurations were tested on 500 lines of Javanese sentences, and the output was evaluated automatically using Bilingual Evaluation Understudy (BLEU). The results across the seven configurations showed an increase in the evaluation score as the quantity of parallel and monolingual corpus data grew. With more parallel corpus data, the score increased by 3.6% between configurations 1 and 2, by 8.23% between configurations 2 and 3, and by 14.92% between configurations 3 and 7. With more monolingual corpus data, the BLEU score increased by 0.18% between configurations 4 and 5, by 0.06% between configurations 5 and 6, and by 0.24% between configurations 6 and 7. The results show that increasing either corpus can raise the evaluation score of Javanese-Indonesian statistical machine translation, but that the quantity of parallel corpus data has a greater influence than the quantity of monolingual corpus data.
15

Erjavec, Tomaž. "The IJS-ELAN Slovene-English Parallel Corpus." International Journal of Corpus Linguistics 7, no. 1 (October 18, 2002): 1–20. http://dx.doi.org/10.1075/ijcl.7.1.01erj.

Abstract:
The paper presents an annotated parallel Slovene-English corpus developed in the scope of the EU ELAN project. The IJS-ELAN corpus was compiled to be a widely distributable dataset for language engineering and for translation and terminology studies. The corpus contains 1 million words from fifteen recent terminology-rich texts. The corpus is sentence aligned and word-tagged with context disambiguated morphosyntactic descriptions and lemmas. These descriptions model simple feature structures, the structure of which is shared between Slovene and English. The corpus is encoded according to the Guidelines for Text Encoding and Interchange and is freely available on the Web for downloading. Additionally, access to IJS-ELAN is available via a powerful Web concordancer.
16

KASHIOKA, HIDEKI. "Synonymous Sentences Grouping with Multilingual Parallel Corpus." Journal of Natural Language Processing 11, no. 5 (2004): 3–18. http://dx.doi.org/10.5715/jnlp.11.5_3.

17

Leong, Chongman, Xuebo Liu, Derek F. Wong, and Lidia S. Chao. "Exploiting Translation Model for Parallel Corpus Mining." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 2829–39. http://dx.doi.org/10.1109/taslp.2021.3105798.

18

Alotaibi, Hind M. "AEPC: Designing an Arabic/English parallel corpus." Research in Corpus Linguistics 4 (2016): 1–7. http://dx.doi.org/10.32714/ricl.04.01.

Abstract:
Parallel corpora (collections of aligned translated texts in two or more languages) play a significant role in translation and contrastive studies. Despite the importance of such resources for the education and training of translators, Arabic suffers from a lack of them. Although a limited number of free Arabic/English parallel corpora exist, a major drawback is that they are domain-restricted, which limits their benefits for Arabic translation education. This paper describes an ongoing project to design and construct a balanced, representative, and free-to-use Arabic/English parallel corpus (AEPC). In addition, the project involves the design and implementation of an Arabic/English concordance tool. The proposed parallel corpus and its tool can be integrated into translators' training institutions as an educational resource for translation studies and teaching. It can also be used in training and testing Arabic/English machine translation systems. The first phase of this project involved compiling high-quality translated text samples; all translations were done by human translators. The corpus covers a wide range of text types and includes rich metadata. The target size for the corpus is at least 10 million words, with the intention to increase that figure in the future. After the texts were compiled, manual (i.e. human-aided) alignment was performed, offering better accuracy than automated alignment. The second phase of this project involved designing a web interface with a bilingual concordancer, where users can explore the content of the AEPC in both English and Arabic.
19

Mohammadi, Mohammad Hadi, Qiao Pan, Dehua Chen, and Marjan Kamyab. "PC-Corpus: A Persian-Chinese Parallel Corpora." Journal of Physics: Conference Series 1176 (March 2019): 022002. http://dx.doi.org/10.1088/1742-6596/1176/2/022002.

20

Schwartz, Lane. "Better Splitting Algorithms for Parallel Corpus Processing." Prague Bulletin of Mathematical Linguistics 98, no. 1 (October 1, 2012): 109–19. http://dx.doi.org/10.2478/v10108-012-0013-x.

Abstract:
Each iteration of minimum error rate training involves re-translating a development set. Distributing this work across computational nodes can speed up translation time, but in practice some parts may take much longer to complete than others, leading to computational slack time. To address this problem, we develop three novel algorithms for distributing translation tasks in a parallel computing environment, drawing on research in parallel machine scheduling. We present results showing a substantial speedup in overall decoding time.
21

Deep, Kamal, Ajit Kumar, and Vishal Goyal. "Development of Punjabi-English (PunEng) Parallel Corpus for Machine Translation System." International Journal of Engineering & Technology 7, no. 2 (May 10, 2018): 690. http://dx.doi.org/10.14419/ijet.v7i2.10762.

Abstract:
This paper describes the creation process and statistics of the Punjabi-English (PunEng) parallel corpus. A parallel corpus is the main requirement for developing statistical as well as neural machine translation. Until now, no PunEng parallel corpus has been available. The paper shows the difficulties and the intensive labor involved in developing a parallel corpus. It discusses the methods used for collecting data and the results, and describes the errors encountered during data collection and how to handle them.
22

Lefever, Els, and Véronique Hoste. "Parallel corpora make sense." International Journal of Corpus Linguistics 19, no. 3 (September 1, 2014): 333–67. http://dx.doi.org/10.1075/ijcl.19.3.02lef.

Abstract:
We present a multilingual approach to Word Sense Disambiguation (WSD), which automatically assigns the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense-inventory, we use a language-independent framework by deriving the senses of a given word from word alignments on a multilingual parallel corpus, which we made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five supported languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible language-independent approach, which implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.
23

De Pauw, Guy, Peter Waiganjo Wagacha, and Gilles-Maurice de Schryver. "Exploring the sawa corpus: collection and deployment of a parallel corpus English—Swahili." Language Resources and Evaluation 45, no. 3 (July 19, 2011): 331–44. http://dx.doi.org/10.1007/s10579-011-9159-7.

24

Yang, Wei, Hanfei Shen, and Yves Lepage. "Inflating a Small Parallel Corpus into a Large Quasi-parallel Corpus Using Monolingual Data for Chinese-Japanese Machine Translation." Journal of Information Processing 25 (2017): 88–99. http://dx.doi.org/10.2197/ipsjjip.25.88.

25

Gao, Yin. "Research on English Electronic Standard Database Based on Android Platform." Advanced Materials Research 971-973 (June 2014): 2752–55. http://dx.doi.org/10.4028/www.scientific.net/amr.971-973.2752.

Abstract:
An E-C parallel corpus is of great importance to translation teaching and practice because of the abundant text it contains. Although there are a number of E-C parallel corpora at home and abroad, few can be applied to translation teaching. In view of this, it is necessary to construct E-C parallel corpora that can be used in translation teaching. This paper aims to explore the construction of a mini E-C parallel corpus and its application in translation teaching.
26

Le Bruyn, Bert, Martín Fuchs, Martijn van der Klis, Jianan Liu, Chou Mo, Jos Tellings, and Henriëtte De Swart. "Parallel Corpus Research and Target Language Representativeness: The Contrastive, Typological, and Translation Mining Traditions." Languages 7, no. 3 (July 7, 2022): 176. http://dx.doi.org/10.3390/languages7030176.

Abstract:
This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel corpus traditions rely on to deal with the issue of target language representativeness of translations. On the basis of a comparison of the corpus architectures and research designs of the three traditions, we argue that they have each developed their own representativeness strategies: (i) monolingual control corpora (Contrastive tradition), (ii) limits on the scope of research questions (Typological tradition), and (iii) parallel control corpora (Translation Mining tradition). We introduce normalized pointwise mutual information (NPMI) as a bi-directional measure of cross-linguistic association, allowing for an easy comparison of the outcomes of different traditions and the impact of the monolingual and parallel control corpus representativeness strategies. We further argue that corpus size has a major impact on the reliability of the monolingual control corpus strategy and that a sequential parallel control corpus strategy is preferable for smaller corpora.
27

Cheon, Juryong, and Youngjoong Ko. "Parallel sentence extraction to improve cross-language information retrieval from Wikipedia." Journal of Information Science 47, no. 2 (February 10, 2021): 281–93. http://dx.doi.org/10.1177/0165551521992754.

Abstract:
Translation language resources, such as bilingual word lists and parallel corpora, are important factors affecting the effectiveness of cross-language information retrieval (CLIR) systems. In particular, when large domain-appropriate parallel corpora are not available, developing an effective CLIR system is particularly difficult. Furthermore, creating a large parallel corpus is costly and requires considerable effort. Therefore, we here demonstrate the construction of parallel corpora from Wikipedia as well as improved query translation, wherein the queries are used for a CLIR system. To do so, we first constructed a bilingual dictionary, termed WikiDic. Then, we evaluated individual language resources and combinations of them in terms of their ability to extract parallel sentences; the combinations of our proposed WikiDic with the translation probability from the Web’s bilingual example sentence pairs and WikiDic was found to be best suited to parallel sentence extraction. Finally, to evaluate the parallel corpus generated from this best combination of language resources, we compared its performance in query translation for CLIR to that of a manually created English–Korean parallel corpus. As a result, the corpus generated by our proposed method achieved a better performance than did the manually created corpus, thus demonstrating the effectiveness of the proposed method for automatic parallel corpus extraction. Not only can the method demonstrated herein be used to inform the construction of other parallel corpora from language resources that are readily available, but also, the parallel sentence extraction method will naturally improve as Wikipedia continues to be used and its content develops.
28

Santos, Diana, and Signe Oksefjell. "Using a Parallel Corpus to Validate Independent Claims." Languages in Contrast 2, no. 1 (December 31, 1999): 115–30. http://dx.doi.org/10.1075/lic.2.1.07san.

Abstract:
This paper examines the results from two corpus-based contrastive studies. Both studies offer cross-linguistic claims about the language pair English-Portuguese. We attempt to replicate the studies and check the findings against a different corpus, viz. the English-Portuguese part of the English-Norwegian Parallel Corpus, to see whether the regularities observed in the original corpora can be confirmed. After a brief presentation of each study, we describe how we gathered equivalent data, present our findings in the new corpus, and discuss some possible reasons for discrepancies in relation to the earlier studies. The topics investigated are boundary-crossing movement descriptions (after Slobin 1997) and perception verbs (after Santos 1998).
29

Zhang, Yingyi. "Russian Speech Conversion Algorithm Based on a Parallel Corpus and Machine Translation." Wireless Communications and Mobile Computing 2022 (March 23, 2022): 1–9. http://dx.doi.org/10.1155/2022/8023115.

Abstract:
The phonetic conversion technology is crucial in the resource construction of Russian phonetic information processing. This paper explains how to build a corpus and the key algorithms that are used, as well as how to design auxiliary translation software and implement the key algorithms. This paper focuses on the “parallel corpus” method of problem solving and the indispensable role of a parallel corpus in Russian learning. This paper examines the foundations, motivations, and methods for using parallel corpora in translation instruction. The main way of using a parallel corpus in the classroom environment is to present data, so that learners can be exposed to a large amount of easily screened bilingual data, and translation skills and specific language item translation can be taught in a concentrated and focused manner. Among them, the creation of a large-scale Russian-Chinese parallel corpus will play an important role not only in improving the translation quality of Russian-Chinese machine translation systems but also in Chinese and Russian teaching as well as other branches of linguistics and translation studies, all of which should be given sufficient attention. This paper proposes the use of automatic speech analysis technology to assist Russian pronunciation learning and designs a Russian word pronunciation learning assistant system with demonstration, scoring, and feedback functions, in response to the shortcomings of pronunciation teaching in Russian teaching in China. It can provide corpus support for gathering a large number of parallel corpora and, in the future, enabling online translation. This system is used for corpus automatic construction, and future corpus automatic construction systems could be built on top of it. The proper application of parallel corpus data will aid in the development of a high-quality autonomous learning and translation teaching environment.
30

Sole-Mauri, Francina, Pilar Sánchez-Gijón, and Antoni Oliver. "Cadlaws – An English–French Parallel Corpus of Legally Equivalent Documents." Mutatis Mutandis. Revista Latinoamericana de Traducción 14, no. 2 (July 13, 2021): 494–508. http://dx.doi.org/10.17533/udea.mut.v14n2a10.

Abstract:
This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and includes unique features since it is composed of documents that are legally equivalent in both languages but not the result of a translation. The corpus is built upon enactments co-drafted by two jurists to ensure legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. In this article the corpus definition as a parallel corpus instead of a comparable one is also discussed. Cadlaws has been pre-processed for machine translation and baseline Bilingual Evaluation Understudy (BLEU), a score for comparing a candidate translation of text to a gold-standard translation of a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts which convey the same meaning in this language pair and is freely available for non-commercial use.
31

Liu, Chao Peng. "Research on Web Application Technology for Building a Chinese-French Parallel Corpus of the Four Great Chinese Classical Novels." Applied Mechanics and Materials 473 (December 2013): 206–10. http://dx.doi.org/10.4028/www.scientific.net/amm.473.206.

Abstract:
As masterpieces in Chinese classical literature, the Four Great Chinese Classical Novels with their multilingual translations have exerted a profound influence in literature and translation studies both at home and abroad. Building a Chinese-French bilingual parallel corpus of the Four Great Chinese Classical Novels is believed to facilitate large-scale investigations into the original Chinese texts and their French translations in terms of stylistics, diction, culture and translation techniques. In this paper, we introduced the French translations of the four novels and illustrated the process of building the parallel corpus in detail. When the parallel corpus is completed, statistical analysis can thereby be carried out by employing different corpus tools. In order to enhance the availability and convenience of the parallel corpus, a web-based query platform is designed to provide world-wide search through the Internet for interested researchers and language learners.
32

Jindal, Shishpal, Vishal Goyal, and Jaskarn Singh. "Building English-Punjabi Parallel corpus for Machine Translation." International Journal of Computer Applications 180, no. 8 (December 16, 2017): 26–29. http://dx.doi.org/10.5120/ijca2017916036.

33

MATSUNAGA, Tsutomu, Daisuke SATO, and Masami HARA. "Parallel Corpus Clean-up Based on Recursive Learning." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 29, no. 1 (2017): 527–32. http://dx.doi.org/10.3156/jsoft.29.1_527.

34

Mosavi Miangah, Tayebeh. "Constructing a Large-Scale English-Persian Parallel Corpus." Meta 54, no. 1 (April 29, 2009): 181–88. http://dx.doi.org/10.7202/029804ar.

Abstract:
In recent years the exploitation of large text corpora in solving various kinds of linguistic problems, including those of translation, is commonplace. Yet a large-scale English-Persian corpus is still unavailable, because of certain difficulties and the amount of work required to overcome them. The project reported here is an attempt to constitute an English-Persian parallel corpus composed of digital texts and Web documents containing little or no noise. The Internet is useful because translations of existing texts are often published on the Web. The task is to find parallel pages in English and Persian, to judge their translation quality, and to download and align them. The corpus so created is of course open; that is, more material can be added as the need arises. One of the main activities associated with building such a corpus is to develop software for parallel concordancing, in which a user can enter a search string in one language and see all the citations for that string in it and corresponding sentences in the target language. Our intention is to construct general translation memory software using the present English-Persian parallel corpus.
35

Kuandykova, Ayana, Amandyk Kartbayev, and Tannur Kaldybekov. "English-Kazakh Parallel Corpus For Statistical Machine Translation." International Journal on Natural Language Computing 3, no. 3 (June 30, 2014): 65–72. http://dx.doi.org/10.5121/ijnlc.2014.3306.

36

Rakhmanova, Azizakhan Abdugafurovna. "THE ROLE OF PARALLEL TEXT IN CORPUS LINGUISTICS." Theoretical & Applied Science 91, no. 11 (November 30, 2020): 66–70. http://dx.doi.org/10.15863/tas.2020.11.91.15.

37

Arkhipova, Elena Ivanovna. "Using a Parallel Corpus to Translate Ethnocultural Collocations." Filologičeskie nauki. Voprosy teorii i praktiki, no. 2 (February 2022): 554–58. http://dx.doi.org/10.30853/phil20220046.

38

Sidorova, Elena Yurievna. "Diminutive «Потихоньку» in the Texts of Parallel Corpus." Filologičeskie nauki. Voprosy teorii i praktiki, no. 11 (November 2021): 3404–9. http://dx.doi.org/10.30853/phil210567.

39

박명수. "Investigation into English-Korean Parallel Corpus with ParaConc." Journal of Translation Studies 18, no. 5 (December 2017): 29–57. http://dx.doi.org/10.15749/jts.2017.18.5.002.

40

Torres-Ramos, Sulema, and Raymundo E. Garay-Quezada. "A Survey on Statistical-based Parallel Corpus Alignment." Research in Computing Science 90, no. 1 (December 31, 2015): 57–76. http://dx.doi.org/10.13053/rcs-90-1-5.

41

Čermák, František, and Alexandr Rosen. "The case of InterCorp, a multilingual parallel corpus." International Journal of Corpus Linguistics 17, no. 3 (December 31, 2012): 411–27. http://dx.doi.org/10.1075/ijcl.17.3.05cer.

Abstract:
This paper introduces InterCorp, a parallel corpus including texts in Czech and 27 other languages, available for online searches via a web interface. After discussing some issues and merits of a multilingual resource we argue that it has an important role especially for languages with fewer native speakers, supporting both comparative research and studies of the language from the perspective of other languages. We proceed with an overview of the corpus — the strategy and criteria for including new texts, the representation of available languages and text types, linguistic annotation, and a sketch of pre-processing issues. Finally, we present the search interface and suggest some research opportunities.
42

Oksefjell, Signe. "A Description of the English-Norwegian Parallel Corpus." International Journal of Corpus Linguistics 4, no. 2 (December 31, 1999): 197–219. http://dx.doi.org/10.1075/ijcl.4.2.01oks.

Abstract:
This paper gives an introduction to the most important steps in the process of compiling the English-Norwegian Parallel Corpus (ENPC), which contains 50 original English text extracts with their translations into Norwegian and 50 original Norwegian text extracts with their translations into English, in all about 2.6 million words. Even if the most time-consuming part of the process is to prepare the text extracts for the corpus, much of the focus has also been on the development of software, notably a browser handling parallel texts and an alignment program linking the original and translated versions of the same text. The preparation of the texts themselves includes scanning, proofreading, mark-up, and alignment. Although the ENPC is completed, the ENPC project is still developing, and the most recent extensions will be mentioned in this paper, such as adding more languages, compiling multiple translations (in the same language) of the same text, part-of-speech-tagging, and marking direct speech and thought in the ENPC.
43

Tadić, M. "Procedures in Building the Croatian-English Parallel Corpus." International Journal of Corpus Linguistics 6, no. 1 (December 1, 2001): 107–23. http://dx.doi.org/10.1075/ijcl.6.3.10tad.

44

Tadic, Marko. "Procedures in Building the Croatian-English Parallel Corpus." Text Corpora and Multilingual Lexicography 6, no. 3 (December 17, 2001): 107–23. http://dx.doi.org/10.1075/ijcl.6.si.10tad.

Abstract:
This contribution gives a survey of procedures and formats used in building the Croatian-English parallel corpus which is being collected at the Institute of Linguistics at the Philosophical Faculty, University of Zagreb. The primary text source is the newspaper Croatia Weekly which has been published from the beginning of 1998 by HIKZ (Croatian Institute for Information and Culture). After a quick survey of existing English-Croatian parallel corpora, the article copes with procedures involved in text conversion and text encoding, particularly the alignment. There are several recent suggestions for alignment encoding, and they are listed and elaborated at the end of the article.
45

Tellings, Jos, Martín Fuchs, Martijn Van der Klis, Bert Le Bruyn, and Henriëtte De Swart. "Perfect variations in dialogue: a parallel corpus approach." Semantics and Linguistic Theory 1 (December 29, 2022): 22. http://dx.doi.org/10.3765/salt.v1i0.5342.

Abstract:
The variation in distribution and meaning of the English Present Perfect compared to its counterparts in other European languages raises a puzzle for the cross-linguistic semantics and pragmatics of tense and aspect. We apply Translation Mining, a form-based approach, to analyze the meaning of the HAVE-PERFECT across languages in a parallel corpus based on "Harry Potter and the Philosopher's Stone" and its translations in Swedish, Spanish, Dutch, German and French. We use the alternation in the Harry Potter novel between narrative discourse (storytelling) and dialogue (the characters talking to each other) to establish the PERFECT as an indexical tense-aspect category that appears exclusively in dialogue. We then link the proposed information management roles of the Present Perfect (Portner 2003, Nishiyama & Koenig 2010) to moves in the language game. We find different distributions of PERFECT use across the sentence types corresponding to these moves (declarative vs. interrogative). This lends support to a cross-linguistically common rhetorical structure in sequences of PERFECT sentences (de Swart 2007).
46

Trushkina, Julia. "The North-West University Bible corpus: A multilingual parallel corpus for South African languages." Language Matters 37, no. 2 (January 2006): 227–45. http://dx.doi.org/10.1080/10228190608566262.

47

Shlesinger, Miriam. "Corpus-based Interpreting Studies as an Offshoot of Corpus-based Translation Studies." Meta 43, no. 4 (October 2, 2002): 486–93. http://dx.doi.org/10.7202/004136ar.

Abstract:
This article looks at the problems and benefits that may arise from applying a corpus-based methodology to the study of interpreting. It explores two ways in which interpreting could be fruitfully investigated with the aid of corpora. The first is direct and involves the design of new types of parallel and comparable corpora. The second consists of employing existing monolingual corpora to extract material that can be used in experimental research into interpreting.
48

Mihailov, Mihail, and Hannu Tommola. "Compiling Parallel Text Corpora: Towards Automation of Routine Procedures." Text Corpora and Multilingual Lexicography 6, no. 3 (December 17, 2001): 67–77. http://dx.doi.org/10.1075/ijcl.6.si.07mih.

Abstract:
The aim of the research project running at the Department of Translation Studies of the University of Tampere is to collect a Russian-Finnish parallel corpus of fiction. The corpus will be equipped with efficient search and analysis tools. The texts of the corpus will be stored as ordinary text files. Each text will be registered in a Microsoft Access database and supplied with a description. Automated parallel concordancing is being developed for the corpus. The program will find the keywords in text A (Russian), then look for possible translation equivalents of the keywords in language B (Finnish), and then search for the portion of text B (Finnish) where most of the keywords in question can be found.
49

Siruk, Olena, and Ivan Derzhanski. "Linguistic Corpora as International Cultural Heritage: The Corpus of Bulgarian and Ukrainian Parallel Texts." Digital Presentation and Preservation of Cultural and Scientific Heritage 3 (September 30, 2013): 91–98. http://dx.doi.org/10.55630/dipp.2013.3.9.

Abstract:
The paper reports on our ongoing work on the creation of a corpus of Bulgarian and Ukrainian parallel texts. We discuss some differences in the approaches and the interpretation of some concepts, as well as various problems associated with the construction of our corpus, in particular the occasional ‘nonparallelism’ of original and translated texts. We give examples of the application of the parallel corpus for the study of lexical semantics and note the outstanding role of the corpus in the lexicographic description of Ukrainian and Bulgarian translation equivalents. We draw attention to the importance of creating parallel corpora as objects of national as well as global cultural heritage.
50

Orrequia-Barea, Aroa, and Cristian Marín-Honor. "Building a parallel corpus of literary texts featuring onomatopoeias: ONPACOR." Research in Corpus Linguistics 8, no. 2 (2020): 46–62. http://dx.doi.org/10.32714/ricl.08.02.03.

Abstract:
Onomatopoeias constitute a much neglected subject in linguistics. The rather scarce literature on onomatopoeias is derived from a lack of reliable empirical data on the topic. In order to bridge this gap, we have compiled a parallel corpus of literary texts featuring onomatopoeias: the Onomatopoeia Parallel Corpus (ONPACOR). The corpus consists of onomatopoeias in English, Spanish and French extracted from comics and representative corpora of each language. ONPACOR has been built on the basis of existing translations to the languages of reference. This article describes the methodology used to compile the corpus, as well as the applications that it can have.