Academic literature on the topic 'Lexicographic corpora'
Journal articles on the topic "Lexicographic corpora"
Ceberio, Klara, and Antton Gurrutxaga. "State-of-the-art on monolingual lexicography for Basque (Basque)." Slovenščina 2.0: empirical, applied and interdisciplinary research 7, no. 1 (April 18, 2019): 53–64. http://dx.doi.org/10.4312/slo2.0.2019.1.53-64.
Dutsova, Ralitsa. "Web-based Digital Lexicographic Bilingual Resources." Cognitive Studies | Études cognitives, no. 15 (December 31, 2015): 369–77. http://dx.doi.org/10.11649/cs.2015.025.
Kochová, Pavla. "Frequency in Corpora as a Signal of Lexicalization (On the Absolute Usage of Comparative and Superlative Adjectives)." Journal of Linguistics/Jazykovedný časopis 70, no. 2 (December 1, 2019): 148–57. http://dx.doi.org/10.2478/jazcas-2019-0046.
Dobrovoljc, Kaja. "Identifying dictionary-relevant formulaic sequences in written and spoken corpora." International Journal of Lexicography 33, no. 4 (April 13, 2020): 417–42. http://dx.doi.org/10.1093/ijl/ecaa008.
Petrak, Marta. "Development of a Productive Derivational Pattern on the Basis of Loan Translation?" Linguistica 60, no. 1 (December 4, 2020): 31–60. http://dx.doi.org/10.4312/linguistica.60.1.31-60.
Gizatova, Guzel. "A Corpus-Based Approach to Lexicography: A New English-Russian Phraseological Dictionary." International Journal of English Linguistics 8, no. 3 (February 28, 2018): 357. http://dx.doi.org/10.5539/ijel.v8n3p357.
Zemicheva, Svetlana S. "From “Abarmo” to “Yashchichishko”: Creating the Lexicographic Component of the Tomsk Dialect Corpus." Voprosy leksikografii, no. 18 (2020): 98–116. http://dx.doi.org/10.17223/22274200/18/5.
Geyken, Alexander. "Matching Corpus Translations with Dictionary Senses." International Journal of Corpus Linguistics 2, no. 1 (January 1, 1997): 1–22. http://dx.doi.org/10.1075/ijcl.2.1.03gey.
Dalpanagioti, Thomai. "Frame-semantic issues in building a bilingual lexicographic resource." Constructions and Frames 5, no. 1 (August 5, 2013): 1–34. http://dx.doi.org/10.1075/cf.5.1.01dal.
Garabík, Radovan. "Word Embedding Based on Large-Scale Web Corpora as a Powerful Lexicographic Tool." Rasprave Instituta za hrvatski jezik i jezikoslovlje 46, no. 2 (October 30, 2020): 603–18. http://dx.doi.org/10.31724/rihjj.46.2.8.
Dissertations / Theses on the topic "Lexicographic corpora"
Soami, Leandre Serge. "Towards the development and application of representative lexicographic corpora for the Gabonese languages." Thesis, Stellenbosch : University of Stellenbosch, 2010. http://hdl.handle.net/10019.1/4217.
ENGLISH ABSTRACT: The compilation of dictionaries is a laborious activity and it takes time, money and staff to achieve the objectives of any dictionary project. Many dictionaries have been compiled using the lexicographers’ personal intuition and guessing rather than being corpus based. That resulted in some dictionaries often being criticised by users because of the lack of representation of some important lexical items. This can probably be explained by the fact that most of these dictionaries were compiled in an era when theoretical lexicography was lacking or not well established. The last decades have witnessed the emergence of metalexicography as a theory directed also at dictionary planning in order to enhance the quality of lexicographic practice and the way in which the management and the compilation of dictionaries are dealt with. The planning of dictionaries takes into account not only the gathering of language material to be used but also the way in which this material will be treated and presented on both the macrostructural and the microstructural level as well as in the front matter texts and the back matter texts. In order to enhance the quality of the presentation in dictionaries, this dissertation pleads in favour of the formulation of a data collection policy that takes into consideration all the different sources of material, written and spoken, used in the different phases of the compilation of a dictionary. The three phases that form the main focus of this study are the material acquisition phase, the material preparation phase and the material processing phase. The involvement of the speech community in the compilation of a lexicographic corpus ensures the collection of representative and balanced data, and the different needs of that community are central to the dictionary project. The different language materials can be organised into different corpus types. 
The efficiency of a corpus resides in its capacity to provide different data types that can be included in the comment on semantics and the comment on form of each article in the central list of each dictionary. Some dictionaries lack a good representation of data in both these comments in the different articles. However, languages such as the Gabonese languages are in a privileged situation because they can still avoid the mistakes of other dictionary compilers by investing in corpus-based dictionaries at this early stage. Therefore, the establishment of lexicographic units with multifunctional tasks can play an important role. In a multilingual environment such as Gabon the issue of language status needs to be dealt with carefully because it is realistic to choose a certain number of languages to function as official languages. Different alphabets are presented in this study and realistic choices are made. The way in which the language material is organised will impact on the quality of the macrostructure and microstructure; this is essential because dictionaries are consulted most of the time for the spelling of a given lexical item, for a translation equivalent or for the explanation of the meaning of a lemma sign. The computerisation of a corpus is a focal point and needs to be done in a satisfactory manner that presents a clean and helpful corpus in order to provide the lexicographer with useful statistics, frequency word lists and the different concordance lines that are very important for the wording of definitions and the extraction of example sentences. This is why a corpus is seen as an indispensable tool in the improvement of the macro- and the microstructure of any type of dictionary.
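The frequency word lists and concordance lines mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names (`tokenize`, `frequency_list`, `kwic`), the simple regex tokenizer, and the sample sentence are all assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines with `window` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, standing in for a real corpus.
sample = "The dictionary lists the word. A corpus shows how the word is used."
tokens = tokenize(sample)

for word, count in frequency_list(tokens)[:3]:
    print(word, count)
for line in kwic(tokens, "word", window=2):
    print(line)
```

A real lexicographic corpus would need language-specific tokenization and lemmatization before counts like these become useful for headword selection.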
Tiedemann, Jörg. "Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing." Doctoral thesis, Uppsala University, Department of Linguistics, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-3791.
The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.
Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup.
Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques.
A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb).
Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.
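As a rough illustration of what a "knowledge-poor" alignment approach based on association measures can look like, the sketch below scores word pairs from sentence-aligned text with the Dice coefficient. This is not the Uppsala Word Aligner or the Clue Aligner; the Dice measure, the function names, and the toy English-German sentence pairs are assumptions chosen for the example.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score every co-occurring (source, target) word pair with the Dice coefficient.

    Dice(s, t) = 2 * cooc(s, t) / (count(s) + count(t)), where counts are the
    number of aligned sentence pairs in which each word (or both words) occur.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_types = set(src.lower().split())
        tgt_types = set(tgt.lower().split())
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_translation(scores, source_word):
    """Return the target word with the highest Dice score, or None if unseen."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical sentence-aligned toy corpus.
pairs = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
]
scores = dice_scores(pairs)
print(best_translation(scores, "house"))
print(best_translation(scores, "book"))
```

Picking the single best-scoring candidate is the simplest possible heuristic; it ignores multi-word units and one-to-many links, which the aligners described above are designed to handle.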
Piccato, Mariangela. "Création et exploitation d'un corpus trilingue du tourisme (italien/français/anglais) en vue de la réalisation d'une base de données lexicale informatisée." Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO20051.
Our study examines the language of tourism from a lexicographical perspective. Using the web, we built an ad hoc corpus of about 10,000 texts in three languages (French, Italian and English), aligned using “Alinea”. Starting from terminology extraction, we analysed a number of collocations with the aim of creating a trilingual, tri-directional glossary. We chose this subject because of the growing importance of the tourism economy worldwide. Our fields of study are thematic terminology, corpus linguistics and natural language processing. The first chapter presents the field of our research. We first introduce corpus linguistics, presenting the different categories of corpus and focusing on two main notions: representativeness and context. We then explain the link between Language for Special Purposes and tourism discourse as a specialized discourse. In the second chapter, we present the trilingual thematic corpus we created during our research and describe the main steps in building a corpus: text collection, cleaning and annotation. In this chapter, we pay particular attention to the presentation of “Alinea”. Finally, the third chapter is a study of frequent collocations with the term “town” (ville). The annexes present the glossary as well as the methodological principles we followed in drafting it.
Ribeiro, Renato Railo. "Atos de fala em dicionários híbridos italiano>português-brasileiro: sugestão para dicionarização de ilocuções via corpora." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/8/8148/tde-29012016-130432/.
This study aims to suggest a method of inserting, in hybrid Italian>Brazilian-Portuguese dictionaries such as Parola Chiave (2012), information about the pragmatic-illocutionary dimension of both languages, through electronic corpora. The methodology used was as follows: (a) theoretical foundation based on: Krieger's concept of Lexicography (2008) and Höfling, Silva & Tosqui's hybrid dictionary (2004); Bianchi (2008) and Sbisà's (2009) Pragmalinguistics and speech acts; Berber-Sardinha (2000, 2003, 2004) and Tagnin's (2004, 2013) Corpus Linguistics; (b) establishing criteria for illocution research in corpora, namely: adopting the theory of speech acts proposed by Austin (1990[1995]) as research paradigm (due to its criteria for recognizing illocutions, the verb form in the first-person singular of the present indicative in active voice and its illocution-verb association); adopting two online corpora, Corpus Paisà and Portuguese Corpus (due to the size of each one, which makes them representative of their languages); adopting five Italian illocutionary verbs of the commissive class, promettere, giurare, assicurare, impegnarsi, garantire (due to their mutual synonymy, according to Sinonimi e contrari minore (2009)); adopting five illocutionary verbs of Brazilian Portuguese, namely prometer, jurar, assegurar, comprometer-se, garantir (for being semantically equivalent to their corresponding verbs in Italian, according to Parola Chiave (2009)); adopting the verb form mentioned above as the search syntax for corpus research; (c) quantitative and qualitative analysis: the total number of occurrences of each illocution was obtained; a qualitative analysis was conducted in order to exclude anomalous cases, with the following exclusion criteria: homonyms, negation, repeated occurrences and unintelligibility; a quantitative analysis was conducted in order to exclude the illocutions mi impegno and comprometo-me / me comprometo (due to their low frequency); (d) discussion of the possibilities for lexicographical recording of illocutions based on the corpus results. As a final result, the suggestion was to insert: in the dictionary microstructure, usage labels referring to illocution classes; in its micromediumstructure, cross-references directing the reader to a section outside the nomenclature; in its macrostructure, a text, external to the nomenclature, containing: (i) explanations related to illocution classes and a list of conventionally recurring illocution types in Italian arranged by frequency; (ii) equivalent illocution types in Brazilian Portuguese arranged by frequency; (iii) examples of usage, taken from the corpora, of illocutionary verbs performing their conventional illocutionary function.
Trypanagnostopoulou, Sofia. "The Treatment of phraseology in English-Greek dictionaries." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/667104.
Phraseological units, and idiomatic expressions in particular, constitute an important part of every language. They require special attention from lexicography given their characteristics and given that their meaning is non-compositional. Nevertheless, the representation of phraseology in dictionaries, and especially in bilingual dictionaries, is often deficient. Although various linguistic approaches have been proposed for analysing phraseology, there has been relatively little research on phraseology from the perspective of lexicography, and what exists focuses mainly on monolingual dictionaries. This thesis analyses the treatment of phraseology, and of idioms in particular, in bilingual dictionaries for the English-Greek language pair. It proposes to analyse the main bilingual dictionaries for this pair available in print and in digital format, and to use data from a parallel corpus to detect the most problematic points in relation to the macrostructure and microstructure of the dictionary. The following topics are addressed: the selection of the phraseology included, equivalents, grammatical and syntactic information, and register labels, among others. To obtain information on the actual use of phraseological units and to identify potential equivalents, an English-Greek parallel corpus is built from a set of texts corresponding to talks from the TED foundation. Although parallel corpora have been used in various linguistic studies, their use in compiling bilingual dictionaries has so far been relatively limited.
The results of the dictionary evaluation show that, despite the good overall quality of the dictionaries studied, there are several problems and gaps with regard to phraseology, such as a poor level of inclusion of phrases, insufficient representation of grammatical, syntactic and stylistic information, and the identification of unsatisfactory equivalents, among others. Based on the information extracted from the parallel corpus developed for this thesis, we have proposed solutions to improve dictionaries for this language pair in particular and, more generally, solutions that could be implemented in the making of bilingual dictionaries. Our aim is to put forward a feasible proposal in which bilingual dictionaries would include more accurate phraseological information, one that could be adopted both by general bilingual dictionaries and by dictionaries of phraseology.
Abdulhay, Authoul. "Constitution d'une ressource sémantique arabe à partir d'un corpus multilingue aligné." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00836764.
Gorla, Chiara. "Influssi e riflessi della lingue indiane sul british english: analisi dei prestiti e della produttività lessicale in prospettiva diacronica e sincronica." Doctoral thesis, Università Cattolica del Sacro Cuore, 2008. http://hdl.handle.net/10280/274.
The research focuses on lexical influences exerted by Indian languages on British English as a result of linguistic contacts between Great Britain and India. Both diachronic and synchronic perspectives are taken into consideration in evaluating the extent of such lexical influences. The first part of the research analyses the presence of words of East Indian origin in English by means of the Oxford English Dictionary, on-line edition, whether these words are authentic lexical borrowings or derivatives or compounds that arose as a consequence of such linguistic contacts. The historical period taken into consideration runs from the 16th century to the present. The second part of the research aims to verify the actual presence, frequency of usage and meaning of such words in contemporary British English by means of a corpus tool, namely the Bank of English by Harper Collins. The historical and cultural background of the relationships between Great Britain and India, as well as the theoretical background on linguistic interference as a whole, are also illustrated, with reference to the most authoritative and recent studies.
Shoba, Feziwe Martha. "Exploring the use of parallel corpora in the complilation of specialised bilingual dictionaries of technical terms: a case study of English and isiXhosa." Thesis, 2018. http://hdl.handle.net/10500/25478.
Abstracts in English, isiXhosa and Afrikaans
The Constitution of the Republic of South Africa, Act 108 of 1996, mandates the state to take practical and positive measures to elevate the status and the use of indigenous languages. The implementation of this pronouncement resulted in a growing demand for specialised translations in fields like technology, science, commerce, law and finance. The lack of terminology and resources such as specialised bilingual dictionaries in indigenous languages, particularly isiXhosa, remains a growing concern that hinders the translation and the intellectualisation of isiXhosa. A growing number of African scholars affirm the importance of specialised dictionaries in the African languages as tools for language and terminology development so that African languages can be used in the areas of science and technology. In the light of the background above, this study explored how parallel corpora can be interrogated using a bilingual concordancer, ParaConc, to extract bilingual terminology that can be used to create specialised bilingual dictionaries. A corpus-based approach was selected due to its speed, efficiency and accuracy in extracting bilingual terms in their immediate contexts. In enhancing the research outcomes, Descriptive Translation Studies (DTS) and Corpus-based Translation Studies (CTS) were used in a complementary manner. Because the study is interdisciplinary, the function theories of lexicography that emphasise the function and needs of users were also applied. The analysis and extraction of bilingual terminology for dictionary making was successful through the use of the following ParaConc features, namely frequencies, hot word lists, hot words, search facility and concordances (Key Word in Context), among others. The findings revealed that the English-isiXhosa Parallel Corpus is a repository of translation equivalents and other information categories that can make specialised dictionaries more user-friendly and multifunctional. 
The frequency lists proved to be an effective method of selecting headwords for inclusion in a dictionary. The results also revealed the complex functions of bilingual concordances, in which information on collocations and multiword units, sense distinctions and usage examples could be easily identified, showing that this approach is more efficient than the traditional method. The study contributes to knowledge on corpus-based lexicography, the standardisation of finance terminology, resource development and the making of user-friendly dictionaries tailored to the different needs of users.
Umgaqo-siseko weli loMzantsi Afrika ukhululele uRhulumente ukuba athabathe amanyathelo abonakalayo ekuphuhliseni nasekuphuculeni iilwimi zesiNtu. Esi sindululo sibangele ukwanda kokuguqulelwa kwamaxwebhu angezobuchwepheshe, inzululwazi, umthetho, ezemali noqoqosho angesiNgesi eguqulelwa kwiilwimi ebezifudula zingasiwe-so ezinjengesiXhosa. Ukunqongophala kwesigama kunye nezichazi-magama kube yingxaki enkulu ekuguquleleni ngakumbi izichazi-magama ezilwimi-mbini eziqulethe isigama esikhethekileyo. Iingcali ezininzi ziyangqinelana ukuba olu hlobo lwezi zichazi-magama luyimfuneko kuba ludlala iindima enkulu ekuphuhlisweni kweelwimi zesiNtu, ekuyileni isigama, nasekusetyenzisweni kwazo kumabakala obunzululwazi nobuchwepheshe. Olu phando ke luvavanya ukusetyenziswa kwekhophasi equlethe amaxwebhu esiNgesi neenguqulelo zawo zesiXhosa njengovimba wokudimbaza isigama sezemali esinokunceda ekuqulunqweni kwesichazi-magama esilwimi-mbini. Isizathu esibangele ukukhetha le ndlela yophando esebenzisa ikhompyutha kukuba iyakhawuleza, ulwazi oluthathwe kwikhophasi luchanekile, yaye isigama kwikhophasi singqamana ngqo nomxholo wamaxwebhu nto leyo eyenza kube lula ukufumana iintsingiselo nemizekelo ephilayo. Ukutyebisa olu phando indlela yekhophasi iye yaxhaswa zezinye iindlela zophando ezityunjiweyo: ufundo lwenguguqulelo oluchazayo (DTS) kunye neendlela zokuguqulela ezijoliswe kumsebenzi nakuhlobo lwabasebenzisi zinguqulelo ezo. Kanti ke ziqwalaselwe neenkqubo zophando lobhalo-zichazi-magama eziinjongo zokuqulunqa izichazi-magama ezesebenzisekayo neziluncedo kuninzi lwabasebenzisi zichazi-magama ngakumbi kwisizwe esisebenzisa iilwimi ezininzi. Ukuhlalutya nokudimbaza isigama kwikhophasi kolu phando kusetyenziswe isixhobo sekhompyutha esilungiselelwe ikhophasi enelwiimi ezimbini nangaphezulu ebizwa ngokuba yiParaConc. Iziphumo zolu phando zibonise mhlophe ukuba ikhophasi eneenguqulelo nguvimba weendidi ngendidi zamagama nolwazi olunokuphucula izichazi-magama zeli xesha. 
Kaloku abaguquleli basebenzise amaqhinga ngamaqhinga ukunika iinguqulelo bekhokelwa yimigomo nemithetho yoguqulelo enxuse abasebenzisi bamaxwebhu aguqulelweyo. Ubuchule beParaConc bokukwazi ukuhlela amagama ngokwendlela afumaneka ngayo kunye neenkcukacha zamanani budandalazise indlela eyiyo yokukhetha imichazwa enokungena kwisichazi-magama. Iziphumo zikwabonakalise iintlaninge yolwazi olufumaneka kwiKWIC, lwazi olo olungelula ukulufumana xa usebenzisa undlela-ndala wokwakha isichazi-magama. Esi sifundo esihlanganyele uGuqulelo olusekelwe kwiKhophasi noQulunqo-zichazi-magama zobuchwepheshe luya kuba negalelo elingathethekiyo kwindlela yokwakha izichazi-magama kwilwiimi zeSintu ngokubanzi nancakasana kwisiXhosa, nto leyo eya kothula umthwalo kubaqulunqi-zichazi-magama. Ukwakha nokuqulunqa izichazi-magama ezilwimi-mbini zezemali kuya kwandisa imithombo yesigama esinqongopheleyo kananjalo sivelise izichazi-magama eziluncedo kwisininzi sabantu.
Die Grondwet van die Republiek van Suid-Afrika, Wet 108 van 1996, gee aan die staat die mandaat om praktiese en positiewe maatreëls te tref om die status en gebruik van inheemse tale te verhoog. Die implementering van hierdie uitspraak het gelei tot ’n toenemende vraag na gespesialiseerde vertalings in domeine soos tegnologie, wetenskap, handel, regte en finansies. Die gebrek aan terminologie en hulpbronne soos gespesialiseerde woordeboeke in inheemse tale, veral Xhosa, wek toenemende kommer wat die vertaling en die intellektualisering van Xhosa belemmer. ’n Toenemende aantal vakkundiges in Afrika beklemtoon die belangrikheid van gespesialiseerde woordeboeke in die Afrikatale as instrumente vir taal- en terminologie-ontwikkeling sodat Afrikatale gebruik kan word in die areas van wetenskap en tegnologie. In die lig van die voorafgaande agtergrond het hierdie studie ondersoek ingestel na hoe parallelle korpora deursoek kan word deur ’n tweetalige konkordanser (ParaConc) te gebruik om tweetalige terminologie te ontgin wat gebruik kan word in die onwikkeling van tweetalige gespesialiseerde woordeboeke. ’n Korpusgebaseerde benadering is gekies vir die spoed, doeltreffendheid en akkuraatheid waarmee dit tweetalige terme uit hulle onmiddellike kontekste kan onttrek. Beskrywende Vertaalstudies (DTS) en Korpusgebaseerde Vertaalstudies (CTS) is op ’n aanvullende wyse gebruik om die navorsingsuitkomste te verbeter. Aangesien die studie interdissiplinêr is, is die funksieteorieë van leksikografie wat die funksie en behoeftes van gebruikers beklemtoon, ook toegepas. Die analise en ontginning van tweetalige terminologie om woordeboeke te ontwikkel was suksesvol deur, onder andere, gebruik te maak van die volgende ParaConc-eienskappe, naamlik, frekwensies, hotword-lyste, hot words, die soekfunksie en konkordansies (Sleutelwoord-in-Konteks). 
The findings show that an English-Xhosa Parallel Corpus is a source of translation equivalents and other information categories that can make specialised dictionaries more user-friendly and multifunctional. The frequency lists were identified as an effective method for selecting headwords to include in a dictionary. The findings also unravelled the complex functions of bilingual concordancers, in which information on collocations and multiword units, sense distinctions and usage examples could be identified easily, which indicates that this method is more efficient than the traditional method. The study contributes to the body of knowledge in corpus-based lexicography, the standardisation of financial terminology, resource development and the development of user-friendly dictionaries tailored to the different needs of users.
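The two corpus operations this abstract relies on, a frequency list for choosing headword candidates and a Key Word in Context (KWIC) concordance over aligned segments, can be illustrated with a minimal Python sketch. It is not ParaConc and does not reproduce the study's procedure. The aligned segments, the target-segment labels (`xh-segment-1` and so on), the stopword set and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import re

# Hypothetical aligned segments: (source sentence, label of the aligned
# target segment). These are toy examples, not data from the study.
ALIGNED = [
    ("The bank charges interest on the loan.", "xh-segment-1"),
    ("Interest on the loan is paid monthly.", "xh-segment-2"),
    ("The bank approved the loan.", "xh-segment-3"),
]

# Hypothetical stopword list, kept tiny on purpose.
STOPWORDS = {"the", "on", "is", "a", "of"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(pairs):
    """Count source-side content words across all aligned segments."""
    counts = Counter()
    for source, _target in pairs:
        counts.update(t for t in tokenize(source) if t not in STOPWORDS)
    return counts


def kwic(pairs, keyword, width=2):
    """Return (left context, keyword, right context, target label) rows."""
    rows = []
    for source, target in pairs:
        tokens = tokenize(source)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((left, tok, right, target))
    return rows


if __name__ == "__main__":
    # High-frequency words are headword candidates.
    for word, count in frequency_list(ALIGNED).most_common(3):
        print(word, count)
    # Each concordance line keeps a pointer to its aligned target segment.
    for row in kwic(ALIGNED, "bank"):
        print(row)
```

In this sketch each concordance row carries the label of its aligned target segment, which is the step at which a lexicographer would read off a candidate translation equivalent; a real workflow would need an actual aligned corpus and language-specific tokenisation.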
Linguistics and Modern Languages
D. Litt. et Phil. (Linguistics (Translation Studies))
Sikora, Marek. "Blízká synonyma v kontrastním pohledu z hlediska korpusové lingvistiky." Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-389227.
Books on the topic "Lexicographic corpora"
Billero, Riccardo, Annick Farina, and María Carlota Nicolás Martínez, eds. I Corpora LBC. Florence: Firenze University Press, 2020. http://dx.doi.org/10.36253/978-88-5518-253-9.
Teubert, Wolfgang, ed. Text Corpora and Multilingual Lexicography. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.
Marzá, Nuria Edo. The specialised lexicographical approach: A step further in dictionary-making. Bern: Peter Lang, 2009.
Ledinek, Nina. Terminologija in sodobna terminografija. Ljubljana: Založba ZRC, ZRC SAZU, 2009.
Korpora, Web und Datenbanken: Computergestützte Methoden in der modernen Phraseologie und Lexikographie = Corpora, web and databases : computer-based methods in modern phraseology and lexicography. Baltmannsweiler: Schneider Verlag Hohengehren, 2010.
A web of new words: A corpus-based study of the conventionalization process of English neologisms. Frankfurt am Main: Peter Lang Edition, 2015.
Wörter im Grenzbereich von Lexikon und Grammatik im Serbokroatischen. München, Germany: Lincom Europa, 2001.
Kordić, Snježana. Riječi na granici punoznačnosti. Zagreb, Croatia: Hrvatska sveučilišna naklada, 2002.
Book chapters on the topic "Lexicographic corpora"
Farina, Annick, and Riccardo Billero. "Corpus in “Natural” Language Versus “Translation” Language: LBC Corpora, A Tool for Bilingual Lexicographic Writing." In Studies in Classification, Data Analysis, and Knowledge Organization, 167–78. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-52680-1_14.
Sinclair, John McH. "4.1 Corpora for lexicography." In A Practical Guide to Lexicography, 167–78. Amsterdam: John Benjamins Publishing Company, 2003. http://dx.doi.org/10.1075/tlrp.6.21sin.
Teubert, Wolfgang. "Corpus linguistics and lexicography*." In Text Corpora and Multilingual Lexicography, 109–33. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.11teu.
Mihailov, Mihail, and Hannu Tommola. "Compiling parallel text corpora." In Text Corpora and Multilingual Lexicography, 59–67. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.07mih.
Cmejrek, Martin, and Jan Curín. "Automatic extraction of terminological translation lexicon from Czech-English parallel texts." In Text Corpora and Multilingual Lexicography, 1–10. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.02cme.
Rossini Favretti, Rema, F. Tamburini, and E. Martelli. "Words from Bononia Legal Corpus." In Text Corpora and Multilingual Lexicography, 11–30. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.03ros.
Feng, Zhiwei. "Hybrid approaches for automatic segmentation and annotation of a Chinese text corpus." In Text Corpora and Multilingual Lexicography, 31–37. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.04fen.
Jakopin, Primoz. "Distance between languages as measured by the minimal-entropy model." In Text Corpora and Multilingual Lexicography, 39–47. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.05jak.
Marcinkevičienė, Rūta. "The importance of the syntagmatic dimension in the multilingual lexical database." In Text Corpora and Multilingual Lexicography, 49–58. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.06mar.
Sinclair, John McH. "Data-derived multilingual lexicons." In Text Corpora and Multilingual Lexicography, 69–81. Amsterdam: John Benjamins Publishing Company, 2007. http://dx.doi.org/10.1075/bct.8.08sin.
Conference papers on the topic "Lexicographic corpora"
Shangyi, Wu, and Li Junwei. "A preliminary discussion of the effect computer-based corpora have on modern lexicography." In 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet). IEEE, 2012. http://dx.doi.org/10.1109/cecnet.2012.6201593.
Ke, J., I. Chiu, J. S. Wallace, and L. H. Shu. "Supporting Biomimetic Design by Embedding Metadata in Natural-Language Corpora." In ASME 2010 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2010. http://dx.doi.org/10.1115/detc2010-29057.