Academic literature on the topic 'Corpus comparable'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Corpus comparable.'

Journal articles on the topic "Corpus comparable"

1

Laviosa, Sara. "How Comparable Can 'Comparable Corpora' Be?" Target. International Journal of Translation Studies 9, no. 2 (1997): 287–317. http://dx.doi.org/10.1075/target.9.2.05lav.

Abstract:
The development of a coherent methodology for corpus-based work in translation studies is essential for the evolution of this new field of research into a fully-fledged paradigm within the discipline. The design of a monolingual, multi-source-language comparable corpus of English as a resource for the systematic study of the nature of translated text can be regarded as an important step towards the development of such a methodology. This paper deals with a crucial and problematic aspect of the design of a monolingual comparable corpus, namely the achievement of an adequate level of comparability between its translational and non-translational components.
2

López Arroyo, Belén. "Can comparable corpora be compared?" Ibérica, no. 39 (January 2, 2020): 43–68. http://dx.doi.org/10.17398/2340-2784.39.43.

Abstract:
At present, there is no unanimous agreement on the criteria for compiling a comparable corpus or on how to evaluate a corpus's comparability. A comparable corpus is a collection of texts in different languages or varieties that are similar in certain respects. But in which ones? According to McEnery and Wilson (2007: 20), sampling proportion, genre, domain, and time should be the main criteria when compiling a comparable corpus, and they should be the same across the different languages. However, previous studies (López-Arroyo & Roberts, 2017) show that these criteria may not be valid in all fields. In the present study, we analyze comparability from the standpoint of the corpus's purpose. To this end, we compiled a comparable corpus of 150 wine tasting notes in English and 150 in Spanish, written by two authorities in the field and published in the same decades; according to McEnery and Xiao (2007), our subcorpora meet all the requirements to be comparable. However, our methodology, which focuses on other factors such as format, content, and style, will show that proportion, genre, domain, time, and size alone are not always sufficient when comparing corpora.
3

Čermáková, Ann, Jarmo Jantunen, Tommi Jauhiainen, et al. "The International Comparable Corpus: Challenges in building multilingual spoken and written comparable corpora." Research in Corpus Linguistics 9, no. 1 (2021): 89–103. http://dx.doi.org/10.32714/ricl.09.01.06.

Abstract:
This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc), which will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC is committed to the idea of re-using existing multilingual resources as much as possible, and the design is modelled, with various adjustments, on the International Corpus of English (ICE). As such, ICC will contain approximately the same balance of 40 percent written language and 60 percent spoken language, distributed across 27 different text types and contexts. A number of issues encountered by the project teams are discussed, ranging from copyright and data sustainability to technical advances in data distribution.
4

Awal, Norsimah Mat, Intan Safinaz Zainuddin, and Imran Ho-Abdullah. "Use of Comparable Corpus in Teaching Translation." Procedia - Social and Behavioral Sciences 18 (2011): 638–42. http://dx.doi.org/10.1016/j.sbspro.2011.05.094.

5

López Arroyo, Belén, and Roda P. Roberts. "Genre and Register in Comparable Corpora: An English/Spanish Contrastive Analysis." Meta 62, no. 1 (2017): 114–36. http://dx.doi.org/10.7202/1040469ar.

Abstract:
A multilingual comparable corpus is a corpus containing texts that are collected using the same sampling frame and similar balance and representativeness. According to McEnery and Xiao (2007: 20), proportion, genre, domain, and time constitute the main criteria when compiling a comparable corpus, and these criteria must match in the different languages for the corpus to be considered comparable. The problem is that these criteria do not always guarantee that the different language subcorpora in a comparable corpus match. This study, which analyzes two comparable corpora compiled by the authors, shows that, even when the text selection criteria are refined, genre theory cannot always guarantee enough linguistic similarities between language for specific purposes (LSP) texts in different languages. Genre seems to suffice to establish a good comparable corpus for scientific abstracts. However, the comparable corpus of wine tasting notes is not truly comparable, since the English and Spanish texts differ in register.
6

Weng, Yu, Shumin Dong, and Chaomurilige Chaomurilige. "A Privacy-Preserving Multilingual Comparable Corpus Construction Method in Internet of Things." Mathematics 12, no. 4 (2024): 598. http://dx.doi.org/10.3390/math12040598.

Abstract:
With the expansion of Internet of Things (IoT) and artificial intelligence (AI) technologies, multilingual scenarios are becoming more common, and applications based on multilingual resources are on the rise. Beyond the need to construct multilingual resources, privacy issues such as data leakage are increasingly prominent. Comparable corpora are important for multilingual information processing in IoT, yet privacy-preserving multilingual comparable corpora are rare, so there is an urgent need to construct such a resource. This paper proposes a method for constructing a privacy-preserving multilingual comparable corpus, taking Chinese–Uighur–Tibetan IoT-based news as an example. The method maps texts in the different languages to a unified language vector space to avoid exposing sensitive information, then calculates the similarity between texts in different languages and uses it as a comparability index to construct comparable relations. Through a decision-making mechanism of minimizing the impossibility, it identifies comparable pairs of multilingual texts based on chapter size, yielding a privacy-preserving Chinese–Uighur–Tibetan comparable corpus (CUTCC). Evaluation experiments demonstrate the effectiveness of the proposed provable method, which outperforms in accuracy rate by 77%, recall rate by 34% and F value by 47.17%. The CUTCC provides valuable privacy-preserving data resources and language services for multilingual situations in IoT.
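The idea of using similarity in a shared vector space as a comparability index can be sketched roughly as follows. This is only an illustration: the abstract does not specify the embedding model, the similarity function, or the details of the decision mechanism, so cosine similarity, the greedy best-match pairing, the `threshold` value, and the function names are all assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_comparable_documents(source_vecs, target_vecs, threshold=0.8):
    """Pair each source document with its most similar target document.

    source_vecs / target_vecs map document ids to vectors that are assumed
    to already live in one shared multilingual space. A pair is kept only
    if its similarity (the comparability index) reaches `threshold`, which
    is a hypothetical cut-off, not a value from the paper.
    """
    pairs = []
    for sid, svec in source_vecs.items():
        best_id, best_score = None, -1.0
        for tid, tvec in target_vecs.items():
            score = cosine_similarity(svec, tvec)
            if score > best_score:
                best_id, best_score = tid, score
        if best_id is not None and best_score >= threshold:
            pairs.append((sid, best_id, best_score))
    return pairs
```

Because only vectors are compared, the raw text never has to be exchanged, which is one plausible reading of how the mapping step helps with privacy.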
7

Sun, Yuan, and Qian Zhao. "Tibetan-Chinese Named Entity Extraction Based on Comparable Corpus." Applied Mechanics and Materials 571-572 (June 2014): 1202–5. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.1202.

Abstract:
Tibetan-Chinese named entity extraction is the foundation of cross-language information processing and provides a basis for research on machine translation and cross-language information retrieval. In this paper, we use the multi-language links of Wikipedia to obtain a Tibetan-Chinese comparable corpus, and combine sentence length, word matching, and entity boundary words to obtain parallel sentences. We then extract Tibetan-Chinese named entities from the comparable corpus in three ways: (1) extracting natural labeling information; (2) acquiring the links between Tibetan entries and Chinese entries; and (3) using a sequence intersection method, which includes sentence representation, Chinese named entity recognition, and intersection of the corresponding Tibetan sentences. The results show that the extraction method based on the comparable corpus is effective.
8

Li, Bo, Eric Gaussier, and Dan Yang. "Measuring bilingual corpus comparability." Natural Language Engineering 24, no. 4 (2018): 523–49. http://dx.doi.org/10.1017/s1351324917000481.

Abstract:
Comparable corpora serve as an important substitute for parallel resources in cases of under-resourced language pairs. Previous work mostly aims to find a better strategy to exploit existing comparable corpora, while ignoring the variety in corpus quality. The quality of comparable corpora greatly affects their usability in practice, a fact that has been confirmed by several studies. However, researchers have not been able to establish a widely accepted and fully validated framework to measure corpus quality. In this paper, we therefore investigate a comprehensive methodology to deal with the quality of comparable corpora. Specifically, we propose several comparability measures and a quantitative strategy to test those measures. Our experiments show that the proposed comparability measure can capture gold-standard comparability levels very well and is robust to the bilingual dictionary used. Moreover, we show in the task of bilingual lexicon extraction that the proposed measure correlates well with the performance of the real-world application.
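To make the notion of a dictionary-based comparability measure concrete, here is a minimal sketch. The abstract does not give the formula, so the definition below (the share of dictionary-covered vocabulary words that have at least one translation in the other corpus's vocabulary) is an assumption for illustration, as are the function and parameter names.

```python
def comparability(source_vocab, target_vocab, src_to_tgt, tgt_to_src):
    """Illustrative bilingual comparability score in the range [0, 1].

    source_vocab / target_vocab: sets of word types seen in each corpus.
    src_to_tgt / tgt_to_src: bilingual dictionaries mapping a word to a
    set of candidate translations. Words missing from the dictionary are
    ignored, so the score reflects only what the dictionary can judge.
    """
    src_covered = [w for w in source_vocab if w in src_to_tgt]
    tgt_covered = [w for w in target_vocab if w in tgt_to_src]
    total = len(src_covered) + len(tgt_covered)
    if total == 0:
        return 0.0
    # A word counts as a hit if any of its translations occurs on the other side.
    src_hits = sum(1 for w in src_covered if src_to_tgt[w] & target_vocab)
    tgt_hits = sum(1 for w in tgt_covered if tgt_to_src[w] & source_vocab)
    return (src_hits + tgt_hits) / total
```

A score near 1 suggests that the two corpora share most of their dictionary-covered vocabulary, while a score near 0 suggests they discuss largely unrelated content.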
9

Forchini, Pierfranca, and Amanda Murphy. "N-grams in comparable specialized corpora." Patterns, meaningful units and specialized discourses 13, no. 3 (2008): 351–67. http://dx.doi.org/10.1075/ijcl.13.3.06for.

Abstract:
This paper investigates the idiom principle realized as four-word phrases (4-grams) headed by prepositions in specialized corpora in English and Italian. Concentrating on "at the end of", it reports that the collocates of "at the end of" regard time, and that apparently synonymic 4-grams are not used in the same contexts. It then explores realizations of "at the end of" in a specialized comparable corpus of Italian. Two findings emerge: firstly, that the most obvious equivalent, "alla fine d*", occurs more frequently than in the English corpus; secondly, this n-gram is frequently used, but has weaker collocational relations, and several synonymic 3-grams share its collocates. This invites contrastive research on lexical variation and repetition and on the strength of collocations of multi-word units in English and Italian. Lastly, the paper recounts an experiment with students who gained awareness of language by concentrating on phraseology in comparable corpora.
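Counting 4-grams headed by prepositions, the unit of analysis described above, might be sketched as follows. The preposition list, the whitespace tokenization assumed by the caller, and the function name are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter

# Small illustrative list; a real study would use a fuller inventory or a POS tagger.
PREPOSITIONS = {"at", "in", "on", "by", "for", "with", "of", "to", "from"}


def preposition_headed_ngrams(tokens, n=4, prepositions=PREPOSITIONS):
    """Count n-grams whose first token is a preposition.

    tokens: a list of lowercased word tokens.
    Returns a Counter mapping each space-joined n-gram to its frequency.
    """
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i] in prepositions:
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

Running the same function over an English and an Italian corpus (with an Italian preposition list) would give the raw frequencies needed for a contrastive comparison of this kind.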
10

Visky, Mihaela. "L’UTILISATION DU CORPUS COMPARABLE DANS L’ENSEIGNEMENT DE LA TRADUCTION." Professional Communication and Translation Studies 6 (2013): 165–76. http://dx.doi.org/10.59168/mojx8412.

Abstract:
Comparable corpora serve as a basis for computer-assisted translation, contrastive analysis, lexicology, etc., and they are also used in translation teaching. The exercises given to students aimed to improve comprehension of source texts and reformulation in the target language, especially with regard to the use of terms and expressions specific to each language. We believe that using a comparable corpus in the translation classroom is an introduction to the professional environment and an important stage in translator training.