Academic literature on the topic 'Lexicographic corpora'

Author: Grafiati

Published: 4 June 2021

Last updated: 1 February 2022

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Lexicographic corpora.'

Journal articles on the topic "Lexicographic corpora"

Ceberio, Klara, and Antton Gurrutxaga. "State-of-the-art on monolingual lexicography for Basque (Basque)." Slovenščina 2.0: empirical, applied and interdisciplinary research 7, no. 1 (April 18, 2019): 53–64. http://dx.doi.org/10.4312/slo2.0.2019.1.53-64.

Abstract:

In this article, we give an overview of the evolution of Basque lexicography to the present, pointing out its main achievements and shortcomings, as well as its challenges for the future. Basque lexicography has a relatively short history, but a considerable number of resources have been produced in the last 50 years, since the standardisation process began. After years of lexicographic work by different groups and publishers, a remarkable achievement is the Dictionary of the Academy (Euskaltzaindiaren Hiztegia), a prescriptive, up-to-date dictionary recently published and based on historical and contemporary corpora. Although the number of monolingual products has increased noticeably in recent years, Basque dictionary making has been especially productive for bilingual purposes, probably due to the sociolinguistic status of the language. On the other hand, specialized lexicography and terminology have been very active from the beginning of the standardisation process. Since the beginning of the 21st century, the use of corpora has gained increasing momentum. Many Basque dictionaries are freely available on the Internet.

Dutsova, Ralitsa. "Web-based Digital Lexicographic Bilingual Resources." Cognitive Studies | Études cognitives, no. 15 (December 31, 2015): 369–77. http://dx.doi.org/10.11649/cs.2015.025.

Abstract:

The paper briefly presents a web-based system for the creation and management of bilingual resources with Bulgarian as one of the paired languages. It is a useful and easy-to-use tool for the collection and management of a large amount of diverse linguistic knowledge. The system uses two sets of natural language data: a bilingual dictionary and aligned text corpora.

Kochová, Pavla. "Frequency in Corpora as a Signal of Lexicalization (On the Absolute Usage of Comparative and Superlative Adjectives)." Journal of Linguistics/Jazykovedný casopis 70, no. 2 (December 1, 2019): 148–57. http://dx.doi.org/10.2478/jazcas-2019-0046.

Abstract:

The study deals with the category of comparison of Czech adjectives from the semantic point of view; it concentrates especially on the so-called absolute (or elative) usage of comparatives and the absolute usage of superlatives and their lexicographic treatment (or absence of the lexicographic treatment) in Czech monolingual dictionaries. The question is whether their frequency in corpora can prove lexicalization of this usage.

Dobrovoljc, Kaja. "Identifying dictionary-relevant formulaic sequences in written and spoken corpora." International Journal of Lexicography 33, no. 4 (April 13, 2020): 417–42. http://dx.doi.org/10.1093/ijl/ecaa008.

Abstract:

In view of the pervasiveness of formulaic language in human communication and the growing awareness of its relevance to modern lexicography, this study presents a corpus-driven identification, analysis and comparison of dictionary-relevant formulaic sequences in reference corpora of written and spoken Slovenian. The sequences were identified using a semi-automatic approach, whereby the most frequently recurring word combinations in each corpus were ranked according to their statistical salience and manually inspected for formulaic expressions with lexicographic relevance. Despite its semantic heterogeneity, the resulting list illustrates the distinct characteristics of formulaic multi-word expressions, such as high frequency of usage, prevalent inclusion of grammatical words and common non-propositional meaning, especially in speech, where research revealed numerous understudied formulaic expressions related to interaction management and mitigation. The final evaluation of measures used in the identification process demonstrates their relative suitability for corpus-driven identification of dictionary-relevant formulaic expressions, with their precision varying in relation to corpus size and length of sequences under investigation.
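
The automatic half of such a semi-automatic approach can be illustrated with a minimal sketch: count recurring word combinations and rank them by a salience score before manual inspection. The abstract does not name the measures used, so the choice of pointwise mutual information here is an assumption, as are the function name rank_ngrams, the min_freq threshold and the toy English input.

```python
import math
from collections import Counter

def rank_ngrams(tokens, n=2, min_freq=2):
    """Return (ngram, frequency, pmi) tuples, most salient first.

    Hypothetical helper: PMI is one possible salience measure,
    not necessarily the one used in the study.
    """
    unigrams = Counter(tokens)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = len(tokens)
    ranked = []
    for gram, freq in ngrams.items():
        if freq < min_freq:
            continue  # keep only recurring combinations
        # PMI generalised to n words: log2(P(w1..wn) / (P(w1) * ... * P(wn)))
        expected = math.prod(unigrams[w] / total for w in gram)
        pmi = math.log2((freq / total) / expected)
        ranked.append((gram, freq, pmi))
    # Sort by salience first, then by raw frequency
    ranked.sort(key=lambda item: (-item[2], -item[1]))
    return ranked

tokens = "you know it is you know what I mean you know".split()
candidates = rank_ngrams(tokens)
```

On this toy input only the bigram "you know" recurs, so it is the single candidate passed on for manual inspection. On a real corpus the list would be long, and the manual filtering step described in the abstract remains essential.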

Petrak, Marta. "Development of a Productive Derivational Pattern on the Basis of Loan Translation?" Linguistica 60, no. 1 (December 4, 2020): 31–60. http://dx.doi.org/10.4312/linguistica.60.1.31-60.

Abstract:

This paper deals with the question of the formation of Croatian adjectives with the prefix među-. While such adjectives were very rare in the late 19th and early 20th centuries, an analysis of relevant lexicographic works and digital corpora demonstrated that their number started to grow in the later 20th century, culminating in recent decades. Today, the [među-N-Suff]Adj derivational pattern is productive, accounting for 134 adjectives with a frequency of ten occurrences or more retrieved from the largest extant Croatian web corpus, hrWaC. On the basis of an analysis of available older lexicographic works and digital corpora, it can be concluded that među- prefixed adjectives first entered Croatian as loan translations (calques) of Latin(ate) and German terms. According to more recent lexicographic works and digital corpora, later on, and especially in recent decades, which coincided with a growing English influence on Croatian, među- prefixed adjectives were probably produced as equivalents of English inter- prefixed adjectives. The number of među- prefixed adjectives, as well as the variety of semantic domains in which they are used, testify to the fact that the [među-N-Suff]Adj pattern is well-established and productive in contemporary Croatian. The analysis of Croatian među- prefixed adjectives in this paper could help shed more light on the question of morphological borrowing phenomena in general.

Gizatova, Guzel. "A Corpus-Based Approach to Lexicography: A New English-Russian Phraseological Dictionary." International Journal of English Linguistics 8, no. 3 (February 28, 2018): 357. http://dx.doi.org/10.5539/ijel.v8n3p357.

Abstract:

This paper addresses the principles of constructing the first English-Russian phraseological dictionary based on corpus data. The purpose of the present research is to introduce a methodology for organizing the selected items in a corpus-searchable phraseme list of a dictionary, to discuss linguistic issues presenting difficulties for bilingual lexicography and to analyze semantic asymmetry between English and Russian phrasemes. To achieve this goal, the following methodology has been introduced: analyzing and retrieving idioms from monolingual and bilingual idiomatic dictionaries, determining the degree of frequency of the selected idioms, considering variants of idioms and arranging them in a systematic way, and developing an idiom list. A phraseme is used in this article as a general term for a multi-word phrase with at least one fixed component. The article demonstrates the advantages of compiling a phraseological bilingual dictionary based on an analysis of corpus data and using authentic examples in the lexicographic description of phrasemes. Using corpora provides a new perspective on the contextual behavior of phrasemes and restrictions of their usage. The paper discusses the impact of using parallel English and Russian corpora for analysis of non-trivial features of English phrasemes, in comparison with their Russian equivalents, in the process of constructing an English-Russian phraseological dictionary. After an introduction, the article presents the methodology and data applied in the research and then discusses the results of the study; the author provides evidence of the advantages of using corpora in bilingual lexicography.

Zemicheva, Svetlana S. "From “Abarmo” to “Yashchichishko”: Creating the Lexicographic Component of the Tomsk Dialect Corpus." Voprosy leksikografii, no. 18 (2020): 98–116. http://dx.doi.org/10.17223/22274200/18/5.

Abstract:

One of the most important trends in modern dialectological science is creating new electronic resources. The article gives an overview of Russian resources of this kind. Among them dialectal corpora hold a special place. The author of the article focuses on the Tomsk Dialect Corpus, which today includes more than 1,700,000 tokens. This resource is unparalleled in Russian scientific practice. It is designed as a universal information retrieval system which includes three modules: 1) textual, 2) grammatical, 3) lexicographic. The aim of the lexicographic component is to provide definitions of dialect lexemes. To do this, it is proposed to use the Dictionary of Russian Old-Timers’ Dialects of the Middle Part of the River Ob Basin (1964–1967) edited by V.V. Palagina and two supplements to it (1975, 1983–1986). The phases of the implementation of the lexicographic module into the Tomsk Dialect Corpus are described. The first phase was the automatic recognition of the above-mentioned paper dictionary. The second stage is editing the dictionary. The principles of editing the source material are determined by the fact that the lexicographic component is considered as part of a universal electronic system. Two basic editing principles are: the possibility to process a word automatically and the autonomous functioning of each dictionary entry. In accordance with them, the vocabulary and the structure of the dictionary entry were formed. At the stage of forming the vocabulary, some dictionary entries (for example, two-word ones) were discarded. The structure of the dictionary entry contains the main areas: headword, definition and contexts. One of the main editing tasks is to combine dictionary entries from different volumes of the dictionary into one. These words are marked either as homonyms, or as the meanings of one word. Examples of dictionary entries before and after editing are presented in the article. 
By now, about a half of the original vocabulary has been processed (letters from A to M, 12,450 entries). The final version of the electronic dictionary as part of the Tomsk Dialect Corpus is planned to be presented on the website of the Laboratory of General and Siberian Lexicography (http://losl.tsu.ru/) by June 2021. The prospects of the project include, firstly, the expansion of the vocabulary, and secondly, the implementation of search by dictionary labels (diminutives, augmentative, etc.) into the corpus. The presented solutions can be used in the development of other dialect corpora.

Geyken, Alexander. "Matching Corpus Translations with Dictionary Senses." International Journal of Corpus Linguistics 2, no. 1 (January 1, 1997): 1–22. http://dx.doi.org/10.1075/ijcl.2.1.03gey.

Abstract:

This paper addresses the question to what extent translations in bilingual parallel corpora match with dictionary senses. Automatic matching of corpus translation with dictionary senses depends on the quality of the lexicographic knowledge used, the quality of corpus processing, the impact of statistics to filter relevant entries from the corpora, and finally the quality of the translations in the multilingual corpora. We focus on the influence that the latter variable has on the performance of the automatic matching. Similarly to previous approaches, we relied on Machine Readable Dictionaries (MRDs), a part-of-speech tagger, and bilingual aligned corpora. Additionally, we used a shallow sentence parser for syntactic matching. Two case studies with two different corpora from different domains were conducted. Our test set was the intersection of 500 French communication verbs within the corpora. The results confirm that the performance of the automatic matching varies considerably with the translation quality of the parallel texts.

Dalpanagioti, Thomai. "Frame-semantic issues in building a bilingual lexicographic resource." Constructions and Frames 5, no. 1 (August 5, 2013): 1–34. http://dx.doi.org/10.1075/cf.5.1.01dal.

Abstract:

This paper discusses the issues that emerged from applying frame semantics to the development of a small-scale bilingual database for Greek and English motion verbs. Proposing an alternative to current lexicography in Greece, the database exploits available corpora and query systems, and carries out a (manual) frame-semantic analysis of the extracted data. The most important theoretical implication of the database is that by combining frame semantics with conceptual metaphor theory and corpus-based information on usage patterns, we can make precise (monolingual) descriptions and effective (cross-linguistic) comparisons. From a practical perspective, the database complements existing English FrameNet and contributes to the creation of a new resource, i.e. a FrameNet for Greek.

Garabík, Radovan. "Word Embedding Based on Large-Scale Web Corpora as a Powerful Lexicographic Tool." Rasprave Instituta za hrvatski jezik i jezikoslovlje 46, no. 2 (October 30, 2020): 603–18. http://dx.doi.org/10.31724/rihjj.46.2.8.

Abstract:

The Aranea Project offers a set of comparable corpora for about two dozen (mostly European) languages, providing a convenient dataset for NLP applications that require training on large amounts of data. The article presents word embedding models trained on the Aranea corpora and an online interface to query the models and visualize the results. The implementation is aimed at lexicographic use but can also be useful in other fields of linguistic study, since the vector space is a plausible model of the semantic space of word meanings. Three different models are available: one for a combination of part of speech and lemma, one for raw word forms, and one based on the fastText algorithm, which uses subword vectors and is not limited to whole or known words in finding their semantic relations. The article describes the interface and the major modes of its functionality; it does not try to perform a detailed linguistic analysis of the presented examples.

Dissertations / Theses on the topic "Lexicographic corpora"

Soami, Leandre Serge. "Towards the development and application of representative lexicographic corpora for the Gabonese languages." Thesis, Stellenbosch : University of Stellenbosch, 2010. http://hdl.handle.net/10019.1/4217.

Abstract:

Thesis (DLitt (Afrikaans and Dutch))--University of Stellenbosch, 2010.
ENGLISH ABSTRACT: The compilation of dictionaries is a laborious activity and it takes time, money and staff to achieve the objectives of any dictionary project. Many dictionaries have been compiled using the lexicographers’ personal intuition and guessing rather than being corpus based. That resulted in some dictionaries often being criticised by users because of the lack of representation of some important lexical items. This can probably be explained by the fact that most of these dictionaries were compiled in an era when theoretical lexicography was lacking or not well established. The last decades have witnessed the emergence of metalexicography as a theory directed also at dictionary planning in order to enhance the quality of lexicographic practice and the way in which the management and the compilation of dictionaries are dealt with. The planning of dictionaries takes into account not only the gathering of language material to be used but also the way in which this material will be treated and presented on both the macrostructural and the microstructural level as well as in the front matter texts and the back matter texts. In order to enhance the quality of the presentation in dictionaries, this dissertation pleads in favour of the formulation of a data collection policy that takes into consideration all the different sources of material, written and spoken, used in the different phases of the compilation of a dictionary. The three phases that form the main focus of this study are the material acquisition phase, the material preparation phase and the material processing phase. The involvement of the speech community in the compilation of a lexicographic corpus ensures the collection of representative and balanced data, and the different needs of that community are central to the dictionary project. The different language materials can be organised into different corpus types. 
The efficiency of a corpus resides in its capacity to provide different data types that can be included in the comment on semantics and the comment on form of each article in the central list of each dictionary. Some dictionaries lack a good representation of data in both these comments in the different articles. However, languages such as the Gabonese languages are in a privileged situation because they can still avoid the mistakes of other dictionary compilers by investing in corpus-based dictionaries at this early stage. Therefore, the establishment of lexicographic units with multifunctional tasks can play an important role. In a multilingual environment such as Gabon the issue of language status needs to be dealt with carefully because it is realistic to choose a certain number of languages to function as official languages. Different alphabets are presented in this study and realistic choices are made. The way in which the language material is organised will impact on the quality of the macrostructure and microstructure; this is essential because dictionaries are consulted most of the time for the spelling of a given lexical item, for a translation equivalent or for the explanation of the meaning of a lemma sign. The computerisation of a corpus is a focal point and needs to be done in a satisfactory manner that presents a clean and helpful corpus in order to provide the lexicographer with useful statistics, frequency word lists and the different concordance lines that are very important for the wording of definitions and the extraction of example sentences. This is why a corpus is seen as an indispensable tool in the improvement of the macro- and the microstructure of any type of dictionary.
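
The frequency word lists and concordance lines mentioned in this abstract can be produced from a computerised corpus with very little code. The following is a minimal sketch only: the function names tokenize, frequency_list and concordance, the regular-expression tokenizer and the toy English text are all assumptions for illustration and are not taken from the thesis.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive tokenizer)."""
    return re.findall(r"\w+", text.lower())

def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines as (left, keyword, right) tuples."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

text = "The dictionary lists words. A corpus shows how words are used. Words change."
tokens = tokenize(text)
freq = frequency_list(tokens)
kwic = concordance(tokens, "words", width=2)
```

Here freq starts with the most frequent word form, and kwic holds one context line per occurrence of "words". A lexicographer would scan such lines when wording definitions or choosing example sentences.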

Tiedemann, Jörg. "Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing." Doctoral thesis, Uppsala University, Department of Linguistics, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-3791.

Abstract:

The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.

Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup.

Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques.
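
To illustrate what a "knowledge-poor" approach based on association measures can look like, here is a minimal sketch that links words across sentence-aligned pairs using the Dice coefficient. It is not the UWA or Clue Aligner implementation: the choice of Dice, the function names dice_scores and best_links, and the toy English-German data are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def dice_scores(sentence_pairs):
    """Score every co-occurring (source, target) word pair with Dice.

    Dice(s, t) = 2 * cooc(s, t) / (freq(s) + freq(t)), where frequencies
    count the aligned sentence pairs in which a word occurs.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_types, tgt_types = set(src.split()), set(tgt.split())
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }

def best_links(sentence_pairs):
    """For each source word, keep the target word with the highest score."""
    best = {}
    for (s, t), score in dice_scores(sentence_pairs).items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best

# Toy sentence-aligned corpus (hypothetical data)
pairs = [
    ("the house", "das haus"),
    ("the car", "das auto"),
    ("a house", "ein haus"),
    ("a car", "ein auto"),
]
links = best_links(pairs)
```

On this toy corpus each source word is linked to its expected counterpart with a score of 1.0. Real parallel text is far noisier, which is why the thesis combines association measures with alignment heuristics and, in the Clue Aligner, with linguistic resources.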

A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb).

Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.

Piccato, Mariangela. "Création et exploitation d'un corpus trilingue du tourisme (italien/français/anglais) en vue de la réalisation d'une base de données lexicale informatisée." Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO20051.

Abstract:

In recent years, the tourism sector has undergone a series of fundamental changes. One of these, certainly the most important, is that tourism is now regarded as a productive activity capable of driving the economy of an entire country. Our research lies at the intersection of thematic terminology, corpus linguistics and natural language processing.
In the first chapter, we introduce the theoretical fields on which our research is based. First, we deal with corpus linguistics and examine the different categories of existing corpora. We emphasise two notions that are fundamental to the design of corpora in general and to the creation of our corpus in particular: representativeness and context. Within tourism discourse, representativeness relates to the special character of our micro-language, while context reveals the plurality of sub-domains that make up this technolect, halfway between general language and specialised language.
In the second chapter, we present the trilingual thematic corpus (CTT) that we created before writing the thesis itself. First of all, we provide the theoretical and practical guidance needed to build a trilingual corpus in a specialised language: collecting the texts, homogenising the text samples identified, and annotating them. In this chapter we also present Alinea, the tool we used to align the collected texts and to consult the trilingual translations simultaneously.
In the third and final chapter, we move on to querying the corpus. Taking one term as an example, the term ville, we run the search in the CTT. We then analyse the most common collocations containing the word ville. By way of conclusion, we present an annex devoted to our trilingual glossary, the result of our exploration of the terminological chain analysed earlier.
To conclude, the general objective of our study is to explore the terminology management chain through the creation of a trilingual glossary in the field of tourism. Our semasiological methodological orientation thus involves at least four specific objectives:
• to create a trilingual tourism corpus (CTT) capable of attesting the uses of terms in context;
• to extract terms using various techniques, such as frequency analysis of the elements of the corpus;
• to verify the data obtained and complete them with the help of external resources;
• to list and describe all the terms in the form of a trilingual tourism glossary (GTT).
Our study concerns the language of tourism from a lexicographical perspective. Using the web, we built an ad hoc corpus. This corpus is composed of about 10,000 texts in three languages (French, Italian and English), aligned using "Alinea". Starting from terminological extraction, we analysed some collocations with the aim of creating a trilingual and tri-directional glossary. We chose this subject because of the increasing importance of the tourism economy in the world. Our fields of study are thematic terminology, corpus linguistics and natural language processing. The first chapter presents the field of study of our research. First of all, we introduce corpus linguistics, presenting the different categories of corpus and focusing on two main notions: representativeness and context. We then explain the link between Language for Special Purposes and tourism discourse as a specialized discourse. In the second chapter, we present the trilingual thematic corpus we created during our research. We describe the main steps in creating a corpus: collection of texts, cleaning and annotation. In this chapter, we pay particular attention to the presentation of "Alinea". Finally, the third chapter is a study of frequent collocations with the term "town" (ville). The annexes present the glossary as well as the methodological principles we followed in drafting it.

Ribeiro, Renato Railo. "Atos de fala em dicionários híbridos italiano>português-brasileiro: sugestão para dicionarização de ilocuções via corpora." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/8/8148/tde-29012016-130432/.

Abstract:

The aim of the study was to suggest the insertion, in hybrid Italian>Brazilian Portuguese dictionaries such as Parola Chiave (2012), of information about the pragmatic-illocutionary dimension of both languages, based on the use of electronic corpora. The methodology employed was as follows: (a) theoretical foundation, based on: the concept of Lexicography of Krieger (2008) and that of the hybrid dictionary of Höfling, Silva and Tosqui (2004); that of Linguistic Pragmatics and of speech acts of Bianchi (2008) and Sbisà (2009); that of Corpus Linguistics of Berber-Sardinha (2000), (2003) and (2004) and Tagnin (2004) and (2013); (b) establishment of criteria for investigating illocutions in Italian and Brazilian Portuguese corpora, namely: adoption of the theory of speech acts proposed by Austin (1990[1975]) (because of its criterion for recognising illocutions, the verb form in the first person singular of the present indicative in the active voice, and its association of illocutions with verbs); adoption of two online corpora, Corpus Paisà and Corpus do Português (the first of Italian, the second of Brazilian Portuguese, chosen because of the size of each, which makes them representative of their respective languages); adoption of five commissive illocutionary verbs of Italian, promettere, giurare, assicurare, impegnarsi, garantire (because of the mutual synonymy relation among them, according to the dictionary Sinonimi e contrari minore (2009)); adoption of five commissive illocutionary verbs of Brazilian Portuguese, prometer, jurar, assegurar, comprometer-se, garantir (because they are semantically equivalent to the respective Italian verbs, according to Parola Chiave); adoption of the verb form cited above as the search syntax for corpus queries; (c) qualitative and quantitative analysis: the total number of occurrences of each illocution was obtained; a qualitative analysis was carried out to exclude anomalous cases, the exclusion criteria being homonymy, negation, repetition and unintelligibility; after the actual numbers were obtained, the illocutions mi impegno and comprometo-me/me comprometo were excluded from the study (because of their low frequency); (d) discussion of the possibilities for recording illocutions in dictionaries on the basis of the corpus results. As a final result, the suggestion was to insert: in the microstructure, usage labels referring to illocution classes; in the micro-mediostructure, cross-references leading the reader to a text outside the word list; in the macrostructure, a text outside the word list containing: (i) explanations of the illocution classes and a list of the respective conventionally recurrent Italian species, ordered by frequency; (ii) equivalent illocution species of Brazilian Portuguese, ordered by frequency; (iii) usage examples, taken from the corpora, of these illocutionary verbs performing their conventional illocutionary function.
This study aims to suggest a method of inserting, in hybrid Italian>Brazilian-Portuguese dictionaries such as Parola Chiave (2012), information about the pragmaticillocutionary dimension of both languages, through electronic corpora. The methodology used was as follows: (a) theoretical foundation based on: Krieger\'s concept of Lexicography (2008) and Höfling, Silva & Tosqui\'s hybrid dictionary (2004); Bianchi (2008) and Sbisà\'s (2009) Pragmalinguistics and speech acts; Berber- Sardinha (2000, 2003, 2004) and Tagnin\'s (2004, 2013) Corpus Linguistics; (b) establishing criteria for illocution research in corpora, namely: adopting the theory of speech acts proposed by Austin (1990[1995]) as research paradigm (due to its criteria for recognizing illocutions, the verb form in the first-person singular of the present indicative in active voice and its illocution-verbs association); adopting two online corpora, Corpus Paisà and Portuguese Corpus (due to the length of each one, which makes them representative of their languages); adopting five Italian illocutionary verbs of commissive class, promettere, giuare, assicurare, impegnarsi, garantire (due to the mutual synonymy relation they have one to each other, according to Sinonimi e contrari minore (2009)); adopting five illocutionary verbs of Brazilian-Portuguese, as known: prometer, jurar, assegurar, comprometer-se, garantir (for being semantically equivalent to their corresponding verb in Italian, according to Parola Chiave (2009)); adopting the verb form mentioned above as syntax search to corpora research; (c) quantitative and qualitative analysis: the total number of occurrences of each illocution was obtained; a qualitative analysis was conducted in order to exclude anomalous cases, of which exclusion criteria were: cases of homonyms, denial, occurrences repetition and unintelligibility; a quantitative analysis was conducted in order to exclude the illocutions mi impegno and comprometo-me / me comprometo (due to 
their low frequency); (d) discussion around the possibilities of lexicographical records of illocutions from the results of corpora. As a final result, the suggestion was to insert: in the dictionary microstructure, signs of usage referring to illocution classes; in its micromediumstructure: cross references in order to conduct the reader to a section out of nomenclature; in its macrostructure: a text, external to the nomenclature, containing: (i) explanations related to illocution classes and a list of conventionally recurring species of Italian arranged by frequency; (ii) equivalent illocutions species of Portuguese- Brazilian arranged by frequency; (iii) examples of usage, taken from the corpora, of illocutionary verbs performing their conventional illocutionary function.
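
The counting step described in (c) can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis: the list of search forms, the negator set and the two-token look-back window are assumptions made for illustration, and the sketch only automates the repetition and negation criteria. Homonymy and unintelligibility would still need manual review.

```python
from collections import Counter

# Hypothetical search strings: first-person singular present indicative forms.
FORMS = {"prometto", "giuro", "assicuro", "garantisco"}
# Assumption: a negator within the two tokens before the verb marks a negated use.
NEGATORS = {"non"}


def count_illocutions(lines, forms=FORMS, negators=NEGATORS):
    """Count each form, skipping repeated lines and negated occurrences."""
    counts = Counter()
    seen = set()
    for line in lines:
        normalized = " ".join(line.lower().split())
        if normalized in seen:  # exclusion criterion: repetition
            continue
        seen.add(normalized)
        tokens = [t.strip(".,;:!?\"'()") for t in normalized.split()]
        for i, token in enumerate(tokens):
            if token not in forms:
                continue
            window = tokens[max(0, i - 2):i]
            if any(t in negators for t in window):  # exclusion criterion: negation
                continue
            counts[token] += 1
    return counts


# Toy concordance lines, invented for this example.
sample = [
    "Ti prometto che verrò.",
    "Ti prometto che verrò.",    # exact repetition, counted once
    "Non prometto niente.",      # negated, excluded
    "Lo giuro!",
    "Non ti garantisco nulla.",  # negated across a clitic, excluded
]
counts = count_illocutions(sample)
```

Sorting `counts.most_common()` would then give the frequency ordering that the proposed dictionary text relies on.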

Trypanagnostopoulou, Sofia. "The Treatment of phraseology in English-Greek dictionaries." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/667104.

Abstract:

Phraseological units are an important part of every language and, because of their distinctive characteristics, in particular the non-compositionality of their meaning, they require special attention in lexicographic description. However, phraseology is often underrepresented in lexicography, especially in bilingual dictionaries. Although linguists have made many important theoretical proposals on phraseology and idioms, research from the lexicographic perspective is rather limited and concentrates mainly on monolingual lexicography. To help fill this gap, we examined the treatment of phraseology in bilingual English-Greek dictionaries. Specifically, we carried out a corpus-based comparative analysis of the main English-Greek dictionaries (paper and electronic editions) in order to detect problematic aspects of the description of phraseological units in terms of dictionary macrostructure and microstructure. We focused on issues such as phraseological coverage, translation equivalents, grammatical and syntactic information, and usage labels. To extract information about the use of phraseological units and to retrieve potential translation equivalents, we built a parallel English-Greek corpus consisting of texts collected from TED talks. While parallel corpora have been widely used in several fields of linguistics, they have not been extensively exploited as a tool in bilingual lexicography. The dictionary assessment showed that, although the general quality of the examined dictionaries is rather high, they present various problems and omissions, such as poor phraseological inclusion, insufficient grammatical, syntactic or stylistic information, and inadequate translation equivalents. Based on the information retrieved from the parallel corpus compiled for this study, we propose solutions for improving them, which could be applied both to this language pair and to bilingual dictionaries in general. Our aim is a lexicographical proposal on how bilingual dictionaries could improve their representation of phraseology. This model could be used in compiling general-purpose bilingual dictionaries as well as dictionaries of phraseology.

Abdulhay, Authoul. "Constitution d'une ressource sémantique arabe à partir d'un corpus multilingue aligné." PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00836764.

Abstract:

This thesis aims to implement and evaluate techniques for extracting semantic relations from an aligned multilingual corpus. These relations will be extracted through the transitivity of translational equivalence: two lexemes that have the same equivalents in a target language are likely to share a sense. First, our observations will bear on the semantic comparison of translation equivalents in aligned multilingual corpora. From these equivalences, we will try to extract "cliques", that is, maximal complete connected subgraphs in which all units are interrelated because of a probable semantic intersection. These cliques are of interest because they provide information on both the synonymy and the polysemy of the units, and offer a form of word-sense disambiguation. They will be built from the automatic extraction of lexical correspondences, based on the observation of occurrences and co-occurrences in the corpus. The use of lemmatisation techniques will be considered. Next, we will try to link these cliques to a semantic lexicon (of the WordNet type) in order to assess whether semantic relations defined for English or French units can be recovered for Arabic units. These relations would make it possible to build automatically a network useful for certain Arabic language processing applications, such as question-answering engines, machine translation, alignment systems and information retrieval.
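
The clique idea can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the thesis's implementation: the toy English-to-French equivalents are invented, two lexemes are linked as soon as they share one equivalent, and maximal cliques are enumerated with the basic Bron-Kerbosch recursion (the abstract does not say which algorithm was used).

```python
from itertools import combinations


def build_graph(equivalents):
    """Link two source lexemes when they share at least one target equivalent."""
    graph = {word: set() for word in equivalents}
    for a, b in combinations(equivalents, 2):
        if equivalents[a] & equivalents[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already explored vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical translation equivalents, invented for this example.
equivalents = {
    "bank": {"banque", "rive"},
    "shore": {"rive", "rivage"},
    "coast": {"rivage", "côte"},
    "lender": {"banque", "prêteur"},
}
cliques = maximal_cliques(build_graph(equivalents))
```

In this toy data "bank" lands in two different cliques, one with "shore" and one with "lender", which is the kind of polysemy signal the abstract describes; each clique on its own groups near-synonyms.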

Gorla, Chiara. "Influssi e riflessi della lingue indiane sul british english: analisi dei prestiti e della produttività lessicale in prospettiva diacronica e sincronica." Doctoral thesis, Università Cattolica del Sacro Cuore, 2008. http://hdl.handle.net/10280/274.

Abstract:

The research focuses on the lexical influences exerted by Indian languages on British English as a result of linguistic contact between Great Britain and India, from both a diachronic and a synchronic perspective. The first part uses a lexicographic tool, the online edition of the Oxford English Dictionary, to investigate the presence in English of words of East Indian origin, whether genuine lexical borrowings or derivatives and compounds that arose from contact between English and the Indian languages from the sixteenth century to the present day; it identifies 1,791 lexical forms. The second part aims to verify the actual presence, frequency of use and meaning of these borrowings, compounds and derivatives in contemporary British English, using the tools of corpus linguistics. The reference corpus used in this second phase is the Bank of English (HarperCollins). Besides outlining the historical and cultural background of relations between Great Britain and India, the thesis sets out the methodological procedures employed and reconstructs the theoretical framework on interference between linguistic codes, languages in contact and lexical borrowing, with reference to the most authoritative and recent studies.

Shoba, Feziwe Martha. "Exploring the use of parallel corpora in the compilation of specialised bilingual dictionaries of technical terms: a case study of English and isiXhosa." Thesis, 2018. http://hdl.handle.net/10500/25478.

Abstract:

Text in English
Abstracts in English, isiXhosa and Afrikaans
The Constitution of the Republic of South Africa, Act 108 of 1996, mandates the state to take practical and positive measures to elevate the status and use of indigenous languages. The implementation of this pronouncement resulted in a growing demand for specialised translations in fields such as technology, science, commerce, law and finance. The lack of terminology and of resources such as specialised bilingual dictionaries in indigenous languages, particularly isiXhosa, remains a growing concern that hinders the translation and intellectualisation of isiXhosa. A growing number of African scholars affirm the importance of specialised dictionaries in the African languages as tools for language and terminology development, so that African languages can be used in science and technology. Against this background, this study explored how parallel corpora can be interrogated using a bilingual concordancer, ParaConc, to extract bilingual terminology that can be used to create specialised bilingual dictionaries. A corpus-based approach was selected for its speed, efficiency and accuracy in extracting bilingual terms in their immediate contexts. To enhance the research outcomes, Descriptive Translation Studies (DTS) and Corpus-based Translation Studies (CTS) were used in a complementary manner. Because the study is interdisciplinary, the function theories of lexicography, which emphasise the functions and needs of users, were also applied. Bilingual terminology for dictionary making was successfully analysed and extracted using, among others, the following ParaConc features: frequencies, hot word lists, hot words, the search facility and concordances (Key Word in Context). The findings revealed that the English-isiXhosa Parallel Corpus is a repository of translation equivalents and other information categories that can make specialised dictionaries more user-friendly and multifunctional. Frequency lists proved to be an effective method of selecting headwords for inclusion in a dictionary. The results also unravelled the complex functions of bilingual concordances, in which information on collocations and multiword units, sense distinctions and usage examples could be easily identified, showing that this approach is more efficient than the traditional method. The study contributes to knowledge on corpus-based lexicography, the standardisation of finance terminology, resource development and the making of user-friendly dictionaries tailored to the different needs of users.
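
Two of the techniques named in the abstract, frequency lists for headword selection and Key Word in Context (KWIC) concordances over aligned segments, can be sketched in a few lines of Python. This is only an illustration and does not reproduce ParaConc: the function names, the stopword set and the sample segments are assumptions, and the target side is shown as placeholder strings rather than real isiXhosa text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def frequency_list(segments, stopwords=frozenset()):
    """Rank source-side tokens by frequency as headword candidates."""
    counts = Counter()
    for source, _target in segments:
        counts.update(t for t in tokenize(source) if t not in stopwords)
    return counts.most_common()


def parallel_kwic(segments, keyword, width=3):
    """Return (left context, keyword, right context, aligned target) per hit."""
    hits = []
    for source, target in segments:
        tokens = tokenize(source)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right, target))
    return hits


# Invented source segments; target sides are placeholders, not real translations.
SEGMENTS = [
    ("The bank charges interest on the loan.", "<target segment 1>"),
    ("Interest rates affect every loan.", "<target segment 2>"),
    ("The budget covers interest payments.", "<target segment 3>"),
]
STOPWORDS = frozenset({"the", "on", "every"})

freq = frequency_list(SEGMENTS, STOPWORDS)
hits = parallel_kwic(SEGMENTS, "loan", width=2)
```

A lexicographer would read the top of `freq` as candidate headwords and scan the aligned target segment in each KWIC hit for translation equivalents and usage examples.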
Linguistics and Modern Languages
D. Litt. et Phil. (Linguistics (Translation Studies))

Sikora, Marek. "Blízká synonyma v kontrastním pohledu z hlediska korpusové lingvistiky." Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-389227.

Abstract:

This diploma thesis deals with near synonymy in adjectives. Using corpus-linguistic methods, two pairs of German near synonyms were investigated: verschieden/unterschiedlich and bedeutend/bedeutsam. The 15 primary collocates of each adjective (by syntactic position) were examined in the InterCorp parallel corpus in order to find their most frequent Czech equivalents.
Keywords: lexical-semantic relations, near synonymy, lexicography, corpora, cooccurrence analysis, Self Organizing Maps, CCDB

Books on the topic "Lexicographic corpora"

Billero, Riccardo, Annick Farina, and María Carlota Nicolás Martínez, eds. I Corpora LBC. Florence: Firenze University Press, 2020. http://dx.doi.org/10.36253/978-88-5518-253-9.

Abstract:

Nowadays, lexicographical studies require an interaction with Digital Humanities. This volume presents the genesis and structure of the LBC database, a digital work-support tool developed by the Multilanguage Cultural Heritage Lexicon Research Unit under the aegis of the University of Florence, which makes it possible to search texts in six digital corpora (French, English, Italian, Russian, Spanish, German). The authors illustrate the specificities of each corpus in terms of the sources chosen and propose lexicographical and translation uses.

Teubert, Wolfgang, ed. Text Corpora and Multilingual Lexicography. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.

Marzá, Nuria Edo. The specialised lexicographical approach: A step further in dictionary-making. Bern: Peter Lang, 2009.

Ledinek, Nina. Terminologija in sodobna terminografija. Ljubljana: Založba ZRC, ZRC SAZU, 2009.

Korpora, Web und Datenbanken: Computergestützte Methoden in der modernen Phraseologie und Lexikographie = Corpora, web and databases : computer-based methods in modern phraseology and lexicography. Baltmannsweiler: Schneider Verlag Hohengehren, 2010.

A web of new words: A corpus-based study of the conventionalization process of English neologisms. Frankfurt am Main: Peter Lang Edition, 2015.

Wörter im Grenzbereich von Lexikon und Grammatik im Serbokroatischen. München, Germany: Lincom Europa, 2001.

Kordić, Snježana. Riječi na granici punoznačnosti. Zagreb, Croatia: Hrvatska sveučilišna naklada, 2002.

Book chapters on the topic "Lexicographic corpora"

Farina, Annick, and Riccardo Billero. "Corpus in “Natural” Language Versus “Translation” Language: LBC Corpora, A Tool for Bilingual Lexicographic Writing." In Studies in Classification, Data Analysis, and Knowledge Organization, 167–78. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-52680-1_14.

Sinclair, John McH. "4.1 Corpora for lexicography." In A Practical Guide to Lexicography, 167–78. Amsterdam: John Benjamins Publishing Company, 2003. http://dx.doi.org/10.1075/tlrp.6.21sin.

Teubert, Wolfgang. "Corpus linguistics and lexicography*." In Text Corpora and Multilingual Lexicography, 109–33. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.11teu.

Mihailov, Mihail, and Hannu Tommola. "Compiling parallel text corpora." In Text Corpora and Multilingual Lexicography, 59–67. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.07mih.

Cmejrek, Martin, and Jan Curín. "Automatic extraction of terminological translation lexicon from Czech-English parallel texts." In Text Corpora and Multilingual Lexicography, 1–10. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.02cme.

Rossini Favretti, Rema, F. Tamburini, and E. Martelli. "Words from Bononia Legal Corpus." In Text Corpora and Multilingual Lexicography, 11–30. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.03ros.

Feng, Zhiwei. "Hybrid approaches for automatic segmentation and annotation of a Chinese text corpus." In Text Corpora and Multilingual Lexicography, 31–37. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.04fen.

Jakopin, Primoz. "Distance between languages as measured by the minimal-entropy model." In Text Corpora and Multilingual Lexicography, 39–47. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.05jak.

Marcinkevičienė, Rūta. "The importance of the syntagmatic dimension in the multilingual lexical database." In Text Corpora and Multilingual Lexicography, 49–58. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.06mar.

Sinclair, John McH. "Data-derived multilingual lexicons." In Text Corpora and Multilingual Lexicography, 69–81. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.08sin.

Conference papers on the topic "Lexicographic corpora"

Shangyi, Wu, and Li Junwei. "A preliminary discussion of the effect computer-based corpora have on modern lexicography." In 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet). IEEE, 2012. http://dx.doi.org/10.1109/cecnet.2012.6201593.

Ke, J., I. Chiu, J. S. Wallace, and L. H. Shu. "Supporting Biomimetic Design by Embedding Metadata in Natural-Language Corpora." In ASME 2010 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2010. http://dx.doi.org/10.1115/detc2010-29057.

Abstract:

Biology is a good source of analogies for engineering design. One approach to retrieving biological analogies is to perform keyword searches on natural-language sources such as books and journals. A challenge in retrieving information from natural-language sources is the potential need to process a large number of search results. This paper describes how inserting metadata such as part-of-speech, word sense and lexicographical data for each word in a natural-language source can help users identify relevant biological stimuli for biomimetic design. Although this research is still exploratory, initial qualitative observations demonstrate successful identification and separation of biological phenomena relevant to either desired functions or desired qualities. In addition, by incorporating the aforementioned metadata, we can automatically remove search results where search keywords act on abstract nouns or where keywords are used in irrelevant senses. The benefits of embedding metadata are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our method can be used to hide 64% of the search results that are unlikely to contain useful biological phenomena, reducing the effort to systematically identify relevant biological analogies.
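
The filtering idea in this abstract can be sketched as follows. This is a rough illustration, not the authors' system: the `Token` fields, the sense labels and the rule that a verb "acts on" the first noun after it are all hypothetical simplifications, and the annotated sentences are invented.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str                # part of speech, e.g. "VERB" or "NOUN"
    sense: str = ""         # hypothetical word-sense label
    abstract: bool = False  # hypothetical flag marking abstract nouns


def keep_sentence(tokens, keyword, wanted_sense):
    """Keep a hit only if the keyword verb has the wanted sense and acts on a concrete noun."""
    for i, tok in enumerate(tokens):
        if tok.text != keyword or tok.pos != "VERB":
            continue
        if tok.sense != wanted_sense:
            continue  # keyword used in an irrelevant sense
        # Simplification: treat the first noun after the verb as its object.
        obj = next((t for t in tokens[i + 1:] if t.pos == "NOUN"), None)
        if obj is not None and not obj.abstract:
            return True
    return False


# Invented, pre-annotated search results for the keyword "distribute".
results = [
    [Token("cells", "NOUN"), Token("distribute", "VERB", "spread"),
     Token("nutrients", "NOUN")],
    [Token("leaders", "NOUN"), Token("distribute", "VERB", "spread"),
     Token("responsibility", "NOUN", abstract=True)],
    [Token("publishers", "NOUN"), Token("distribute", "VERB", "sell"),
     Token("books", "NOUN")],
]
kept = [r for r in results if keep_sentence(r, "distribute", "spread")]
hidden_share = 1 - len(kept) / len(results)
```

Only the first toy result survives: the second is hidden because the verb acts on an abstract noun, the third because the keyword is used in a different sense.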
