
Journal articles on the topic "Low resource language"

Consult the top journal articles selected for research on the topic "Low resource language".

Next to each work in the list of references you will find an "Add to bibliography" button. Press it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, and others.

You can also download the full text of the publication as a .pdf file and read its abstract online, when these details are available in the metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography correctly.

1

Pakray, Partha, Alexander Gelbukh, and Sivaji Bandyopadhyay. "Natural language processing applications for low-resource languages." Natural Language Processing 31, no. 2 (2025): 183–97. https://doi.org/10.1017/nlp.2024.33.

Abstract:
Natural language processing (NLP) has significantly advanced our ability to model and interact with human language through technology. However, these advancements have disproportionately benefited high-resource languages with abundant data for training complex models. Low-resource languages, often spoken by smaller or marginalized communities, have yet to realize the full potential of NLP applications. The primary challenges in developing NLP applications for low-resource languages stem from the lack of large, well-annotated datasets, standardized tools, and linguistic resources. This
2

Lin, Donghui, Yohei Murakami, and Toru Ishida. "Towards Language Service Creation and Customization for Low-Resource Languages." Information 11, no. 2 (2020): 67. http://dx.doi.org/10.3390/info11020067.

Abstract:
The most challenging issue with low-resource languages is the difficulty of obtaining enough language resources. In this paper, we propose a language service framework for low-resource languages that enables the automatic creation and customization of new resources from existing ones. To achieve this goal, we first introduce a service-oriented language infrastructure, the Language Grid; it realizes new language services by supporting the sharing and combining of language resources. We then show the applicability of the Language Grid to low-resource languages. Furthermore, we describe how we ca
3

Ranasinghe, Tharindu, and Marcos Zampieri. "Multilingual Offensive Language Identification for Low-resource Languages." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 1 (2022): 1–13. http://dx.doi.org/10.1145/3457610.

Abstract:
Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English partially because most annotated datasets available contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We
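
The recipe this abstract describes, fine-tuning a multilingual encoder on English labels and then predicting directly in another language, can be illustrated in a few lines. The sketch below is a generic illustration under stated assumptions, not the authors' exact setup: the model choice (XLM-R) matches the family of cross-lingual encoders such work uses, while the toy texts and the tiny training loop are placeholders.

```python
# Sketch: cross-lingual transfer for offensive-language detection.
# Assumes: pip install transformers torch. The English examples below
# are toy placeholders, not the paper's training data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # multilingual encoder (an assumption)
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

train_texts = ["you are awful", "have a nice day"]  # toy English examples
train_labels = torch.tensor([1, 0])                 # 1 = offensive

opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few steps only; real training iterates over a full dataset
    batch = tok(train_texts, return_tensors="pt", padding=True, truncation=True)
    loss = model(**batch, labels=train_labels).loss
    loss.backward(); opt.step(); opt.zero_grad()

# Zero-shot prediction on a low-resource target language: the multilingual
# encoder shares one representation space, so no target-language labels are needed.
model.eval()
with torch.no_grad():
    batch = tok(["ejemplo en otra lengua"], return_tensors="pt")
    print(model(**batch).logits.argmax(-1))
```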
4

Cassano, Federico, John Gouwar, Francesca Lucchetti, et al. "Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs." Proceedings of the ACM on Programming Languages 8, OOPSLA2 (2024): 677–708. http://dx.doi.org/10.1145/3689735.

Abstract:
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml
5

Rai, Abigail. "Part-of-Speech (POS) Tagging of Low-Resource Language (Limbu) with Deep Learning." Panamerican Mathematical Journal 35, no. 1s (2024): 149–57. http://dx.doi.org/10.52783/pmj.v35.i1s.2297.

Abstract:
POS tagging is a basic Natural Language Processing (NLP) task that tags the words in an input text according to their grammatical values. Although POS tagging is a fundamental application for well-resourced languages, for a low-resource language such as Limbu it remains largely unaddressed, owing to the few tagged datasets and linguistic resources available. This research project uses deep learning techniques, transfer learning, and the BiLSTM-CRF model to develop an accurate POS-tagging system for the Limbu language. Using annotated and unannotated language data, we build a small yet informative dataset of Limbu text. Skilled mult
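
For readers unfamiliar with the architecture named above, here is a minimal PyTorch sketch of the BiLSTM backbone of such a tagger. The CRF layer the paper stacks on top is omitted for brevity (a library such as pytorch-crf would supply it), and the vocabulary and tagset sizes are toy assumptions.

```python
# Minimal BiLSTM tagger (the paper's model adds a CRF decoder on top;
# omitted here for brevity). Vocab and tagset sizes are toy placeholders.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, tagset_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, tagset_size)  # 2x for both directions

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)  # per-token tag scores (the emissions a CRF would consume)

# Toy run: one sentence of 5 word ids, a vocabulary of 100 words, 10 tags.
model = BiLSTMTagger(vocab_size=100, tagset_size=10)
scores = model(torch.randint(0, 100, (1, 5)))
loss = nn.CrossEntropyLoss()(scores.view(-1, 10), torch.randint(0, 10, (5,)))
loss.backward()
print(scores.argmax(-1))  # predicted tag ids
```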
6

Nitu, Melania, and Mihai Dascalu. "Natural Language Processing Tools for Romanian – Going Beyond a Low-Resource Language." Interaction Design and Architecture(s), no. 60 (March 15, 2024): 7–26. http://dx.doi.org/10.55612/s-5002-060-001sp.

Abstract:
Advances in Natural Language Processing bring innovative instruments to the educational field to improve the quality of the didactic process by addressing challenges like language barriers and creating personalized learning experiences. Most research in the domain is dedicated to high-resource languages, such as English, while languages with limited coverage, like Romanian, are still underrepresented in the field. Operating on low-resource languages is essential to ensure equitable access to educational opportunities and to preserve linguistic diversity. Through continuous investments in devel
7

Zhou, Shuyan, Shruti Rijhwani, John Wieting, Jaime Carbonell, and Graham Neubig. "Improving Candidate Generation for Low-resource Cross-lingual Entity Linking." Transactions of the Association for Computational Linguistics 8 (July 2020): 109–24. http://dx.doi.org/10.1162/tacl_a_00303.

Abstract:
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages, but these do not extend well to low-resource languages with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for res
8

Vargas, Francielle, Wolfgang Schmeisser-Nieto, Zohar Rabinovich, Thiago A. S. Pardo, and Fabrício Benevenuto. "Discourse annotation guideline for low-resource languages." Natural Language Processing 31, no. 2 (2025): 700–743. https://doi.org/10.1017/nlp.2024.19.

Abstract:
Most existing discourse annotation guidelines have focused on the English language. As a result, there is a significant lack of research and resources concerning computational discourse-level language understanding and generation for other languages. To fill this relevant gap, we introduce the first discourse annotation guideline using the rhetorical structure theory (RST) for low-resource languages. Specifically, this guideline provides accurate examples of discourse coherence relations in three Romance languages: Italian, Portuguese, and Spanish. We further discuss theoretical defini
9

Li, Zihao, Yucheng Shi, Zirui Liu, et al. "Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 27 (2025): 28186–94. https://doi.org/10.1609/aaai.v39i27.35038.

Abstract:
The development of Large Language Models (LLMs) relies on extensive text corpora, which are often unevenly distributed across languages. This imbalance results in LLMs performing significantly better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate. Currently, there is a lack of quantitative methods to evaluate the performance of LLMs in these low-resource languages. To address this gap, we propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using inte
10

Yusup, Azragul, Degang Chen, Yifei Ge, Hongliang Mao, and Nujian Wang. "Resource Construction and Ensemble Learning based Sentiment Analysis for the Low-resource Language Uyghur." 網際網路技術學刊 24, no. 4 (2023): 1009–16. http://dx.doi.org/10.53106/160792642023072404018.

Abstract:
To address the problem of scarce low-resource sentiment analysis corpora, this paper proposes a sentence-level sentiment analysis resource conversion method, HTL, based on the syntactic-semantic knowledge of the low-resource language Uyghur to convert a high-resource corpus into a low-resource one. In the conversion process, a k-fold cross-filtering method is proposed to reduce the distortion of data samples; it is used to select high-quality samples for conversion. Finally, the Uyghur sentiment analysis dataset USD is constructed, and the baseline of this dataset is verified under
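
The abstract names a k-fold cross-filtering step, but the truncated text does not spell out the criterion. One plausible reading, offered purely as an assumption, is to keep a converted sample only when a model trained on the other folds reproduces its label:

```python
# One plausible reading of "k-fold cross-filtering" (an assumption: keep a
# converted sample only if a model trained on the other folds agrees with
# its label). Uses scikit-learn; the texts and labels are toy placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline

texts = ["good product", "terrible service", "love it", "awful", "fine", "bad"]
labels = np.array([1, 0, 1, 0, 1, 0])

keep = np.zeros(len(texts), dtype=bool)
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(texts):
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit([texts[i] for i in train_idx], labels[train_idx])
    pred = clf.predict([texts[i] for i in test_idx])
    keep[test_idx] = pred == labels[test_idx]  # agreement suggests a clean sample

filtered = [t for t, k in zip(texts, keep) if k]
print(filtered)
```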
11

Mati, Diellza Nagavci, Mentor Hamiti, Arsim Susuri, Besnik Selimi, and Jaumin Ajdari. "Building Dictionaries for Low Resource Languages: Challenges of Unsupervised Learning." Annals of Emerging Technologies in Computing 5, no. 3 (2021): 52–58. http://dx.doi.org/10.33166/aetic.2021.03.005.

Abstract:
The development of natural language processing resources for Albanian has grown steadily in recent years. This paper presents research conducted on unsupervised learning: the challenges associated with building a dictionary for the Albanian language and creating part-of-speech tagging models. The majority of languages have their own dictionary, but languages with low resources suffer from a lack of resources. Natural language processing facilitates the sharing of information and services for users and whole communities. The experimentation corpora for the Albanian language include 2
12

Kashyap, Gaurav. "Multilingual NLP: Techniques for Creating Models that Understand and Generate Multiple Languages with Minimal Resources." International Journal of Scientific Research in Engineering and Management 08, no. 12 (2024): 1–5. https://doi.org/10.55041/ijsrem7648.

Abstract:
Models that can process human language in a variety of applications have been developed as a result of the quick development of natural language processing (NLP). Scaling NLP technologies to support multiple languages with minimal resources is still a major challenge, even though many models work well in high-resource languages. By developing models that can comprehend and produce text in multiple languages, especially those with little linguistic information, multilingual natural language processing (NLP) seeks to overcome this difficulty. This study examines the methods used in multilingual
13

Rijhwani, Shruti, Jiateng Xie, Graham Neubig, and Jaime Carbonell. "Zero-Shot Neural Transfer for Cross-Lingual Entity Linking." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6924–31. http://dx.doi.org/10.1609/aaai.v33i01.33016924.

Abstract:
Cross-lingual entity linking maps an entity mention in a source language to its corresponding entry in a structured knowledge base that is in a different (target) language. While previous work relies heavily on bilingual lexical resources to bridge the gap between the source and the target languages, these resources are scarce or unavailable for many low-resource languages. To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language. Specifically, we propose pivot-based entity
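
Candidate generation of the kind discussed in entries 7 and 13 can be approximated, in the zero-resource limit, by surface similarity between a mention and the names of KB entries. The sketch below uses character n-gram Dice similarity as a crude stand-in for the learned character-level models these papers actually train; the KB entries and the mention are toy placeholders.

```python
# Candidate generation by character n-gram overlap: a simple stand-in for
# the learned character-level similarity such papers train. The KB below
# is a toy placeholder.
def ngrams(s, n=3):
    s = f"^{s.lower()}$"  # boundary markers help match word edges
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b) or 1)

kb = ["Barack Obama", "Michelle Obama", "Omaha, Nebraska"]  # toy KB entries
mention = "Barak Obama"  # misspelled mention from source-language text

candidates = sorted(kb, key=lambda e: dice(ngrams(mention), ngrams(e)), reverse=True)
print(candidates[:2])  # top candidate entities for the linker to disambiguate
```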
14

Qarah, Faisal, and Tawfeeq Alsanoosy. "Evaluation of Arabic Large Language Models on Moroccan Dialect." Engineering, Technology & Applied Science Research 15, no. 3 (2025): 22478–85. https://doi.org/10.48084/etasr.10331.

Abstract:
Large Language Models (LLMs) have shown outstanding performance in many Natural Language Processing (NLP) tasks for high-resource languages, especially English, primarily because most of them were trained on widely available text resources. As a result, many low-resource languages, such as Arabic and African languages and their dialects, are not well studied, raising concerns about whether LLMs can perform fairly across them. Therefore, evaluating the performance of LLMs for low-resource languages and diverse dialects is crucial. This study investigated the performance of LLMs in Moroccan Arab
15

Lee, Chanhee, Kisu Yang, Taesun Whang, Chanjun Park, Andrew Matteson, and Heuiseok Lim. "Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models." Applied Sciences 11, no. 5 (2021): 1974. http://dx.doi.org/10.3390/app11051974.

Abstract:
Language model pretraining is an effective method for improving the performance of downstream natural language processing tasks. Even though language modeling is unsupervised and thus collecting data for it is relatively less expensive, it is still a challenging process for languages with limited resources. This results in great technological disparity between high- and low-resource languages for numerous downstream natural language processing tasks. In this paper, we aim to make this technology more accessible by enabling data efficient training of pretrained language models. It is achieved b
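
The post-training step described here amounts to continued masked-language-model training on target-language text. A minimal sketch with the Hugging Face transformers API follows; the model name and the two-sentence corpus are assumptions, and the paper's specific data-efficiency techniques are not reproduced.

```python
# Sketch: continue masked-language-model pretraining of a multilingual
# encoder on a small target-language corpus (corpus and model name are
# assumptions; the paper's specific post-training recipe is not reproduced).
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
collator = DataCollatorForLanguageModeling(tok, mlm_probability=0.15)

corpus = ["toy target-language sentence one", "toy sentence two"]
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(2):  # real post-training loops over the whole corpus many times
    enc = [tok(s, truncation=True) for s in corpus]
    batch = collator(enc)  # pads and randomly masks 15% of tokens
    loss = model(**batch).loss
    loss.backward(); opt.step(); opt.zero_grad()
print(float(loss))
```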
16

Lee, Jaeseong, Dohyeon Lee, and Seung-won Hwang. "Script, Language, and Labels: Overcoming Three Discrepancies for Low-Resource Language Specialization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13004–13. http://dx.doi.org/10.1609/aaai.v37i11.26528.

Abstract:
Although multilingual pretrained models (mPLMs) enabled support of various natural language processing in diverse languages, its limited coverage of 100+ languages lets 6500+ languages remain ‘unseen’. One common approach for an unseen language is specializing the model for it as target, by performing additional masked language modeling (MLM) with the target language corpus. However, we argue that, due to the discrepancy from multilingual MLM pretraining, a naive specialization as such can be suboptimal. Specifically, we pose three discrepancies to overcome. Script and linguistic discrepancy o
17

Mozafari, Marzieh, Khouloud Mnassri, Reza Farahbakhsh, and Noel Crespi. "Offensive language detection in low resource languages: A use case of Persian language." PLOS ONE 19, no. 6 (2024): e0304166. http://dx.doi.org/10.1371/journal.pone.0304166.

Abstract:
THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. Different types of abusive content such as offensive language, hate speech, aggression, etc. have become prevalent in social media and many efforts have been dedicated to automatically detect this phenomenon in different resource-rich languages such as English. This is mainly due to the comparative lack of annotated data related to offensive language in low-resource languages, especially the ones spoken in Asian countries. To reduce the vulnerability among social media users from these regions
18

Laskar, Sahinur Rahman, Abdullah Faiz Ur Rahman Khilji, Partha Pakray, and Sivaji Bandyopadhyay. "Improved neural machine translation for low-resource English–Assamese pair." Journal of Intelligent & Fuzzy Systems 42, no. 5 (2022): 4727–38. http://dx.doi.org/10.3233/jifs-219260.

Abstract:
Language translation is essential to bring the world closer and plays a significant part in building a community among people of different linguistic backgrounds. Machine translation dramatically helps in removing the language barrier and allows easier communication among linguistically diverse communities. Due to the unavailability of resources, major languages of the world are accounted as low-resource languages. This leads to a challenging task of automating translation among various such languages to benefit indigenous speakers. This article investigates neural machine translation for the
19

Baldha, Nirav A. "Question Answering for Low Resource Languages Using Natural Language Processing." International Journal of Scientific Research and Engineering Trends 8, no. 2 (2022): 1122–26. http://dx.doi.org/10.61137/ijsret.vol.8.issue2.207.

20

Shikali, Casper S., and Refuoe Mokhosi. "Enhancing African low-resource languages: Swahili data for language modelling." Data in Brief 31 (August 2020): 105951. http://dx.doi.org/10.1016/j.dib.2020.105951.

21

Xiao, Yubei, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, and Liang Lin. "Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14112–20. http://dx.doi.org/10.1609/aaai.v35i16.17661.

Abstract:
Low-resource automatic speech recognition (ASR) is challenging, as the low-resource target language data cannot well train an ASR model. To solve this issue, meta-learning formulates ASR for each source language into many small ASR tasks and meta-learns a model initialization on all tasks from different source languages to access fast adaptation on unseen target languages. However, for different source languages, the quantity and difficulty vary greatly because of their different data scales and diverse phonological systems, which leads to task-quantity and task-difficulty imbalance issues and
22

Chen, Siqi, Yijie Pei, Zunwang Ke, and Wushour Silamu. "Low-Resource Named Entity Recognition via the Pre-Training Model." Symmetry 13, no. 5 (2021): 786. http://dx.doi.org/10.3390/sym13050786.

Abstract:
Named entity recognition (NER) is an important task in the processing of natural language, which needs to determine entity boundaries and classify them into pre-defined categories. For low-resource languages, most state-of-the-art systems require tens of thousands of annotated sentences to obtain high performance. However, there is minimal annotated data available about Uyghur and Hungarian (UH languages) NER tasks. There are also specificities in each task—differences in words and word order across languages make it a challenging problem. In this paper, we present an effective solution to pro
23

Gunnam, Vinodh. "Tackling Low-Resource Languages: Efficient Transfer Learning Techniques for Multilingual NLP." International Journal for Research Publication and Seminar 13, no. 4 (2022): 354–59. http://dx.doi.org/10.36676/jrps.v13.i4.1601.

Abstract:
This study reviews the most efficient transfer learning techniques for low-resource languages in multilingual NLP. Many languages lack reliable data and the resources needed to achieve high model accuracy. One of the solutions presented is transfer learning, a technique that enables knowledge from high-resource languages to be utilized for LRLs. This study also uses simulation reports, real-time case studies, and experiences to support how these techniques work. The major issues are also outlined, such as lack of training data, model complexity and lang
24

Thakkar, Gaurish, Nives Mikelić Preradović, and Marko Tadić. "Transferring Sentiment Cross-Lingually within and across Same-Family Languages." Applied Sciences 14, no. 13 (2024): 5652. http://dx.doi.org/10.3390/app14135652.

Abstract:
Natural language processing for languages with limited resources is hampered by a lack of data. Using English as a hub language for such languages, cross-lingual sentiment analysis has been developed. The sheer quantity of English language resources raises questions about its status as the primary resource. This research aims to examine the impact on sentiment analysis of adding data from same-family versus distant-family languages. We analyze the performance using low-resource and high-resource data from the same language family (Slavic), investigate the effect of using a distant-family langu
25

Bajpai, Ashutosh, and Tanmoy Chakraborty. "Multilingual LLMs Inherently Reward In-Language Time-Sensitive Semantic Alignment for Low-Resource Languages." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 22 (2025): 23469–77. https://doi.org/10.1609/aaai.v39i22.34515.

Abstract:
The unwavering disparity in labeled resources between resource-rich languages and those considered low-resource remains a significant impediment for Large Language Models (LLMs). Recent strides in cross-lingual in-context learning (X-ICL), mainly through semantically aligned examples retrieved from multilingual pre-trained transformers, have shown promise in mitigating this issue. However, our investigation reveals that LLMs intrinsically reward in-language semantically aligned cross-lingual instances over direct cross-lingual semantic alignments, with a pronounced disparity in handling time–s
26

Andrabi, Syed Abdul Basit, et al. "A Review of Machine Translation for South Asian Low Resource Languages." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 5 (2021): 1134–47. http://dx.doi.org/10.17762/turcomat.v12i5.1777.

Abstract:
Machine translation is an application of natural language processing. Humans use native languages to communicate with one another, whereas programming languages communicate between humans and computers. NLP is the field that involves a broad set of techniques for analysis, manipulation and automatic generation of human languages or natural languages with the help of computers. It is essential to provide access to information to people for their development in the present information age. It is necessary to put equal emphasis on removing the barrier of language between different divisions of so
27

Rakhimova, Diana, Aidana Karibayeva, and Assem Turarbek. "The Task of Post-Editing Machine Translation for the Low-Resource Language." Applied Sciences 14, no. 2 (2024): 486. http://dx.doi.org/10.3390/app14020486.

Abstract:
In recent years, machine translation has made significant advancements; however, its effectiveness can vary widely depending on the language pair. Languages with limited resources, such as Kazakh, Uzbek, Kalmyk, Tatar, and others, often encounter challenges in achieving high-quality machine translations. Kazakh is an agglutinative language with complex morphology, making it a low-resource language. This article addresses the task of post-editing machine translation for the Kazakh language. The research begins by discussing the history and evolution of machine translation and how it has develop
28

Kalluri, Kartheek. "Adapting LLMs for Low Resource Languages: Techniques and Ethical Considerations." International Journal of Scientific Research in Engineering and Management 08, no. 12 (2024): 1–6. https://doi.org/10.55041/isjem00140.

Abstract:
This study adapts large language models (LLMs) to resource-scarce languages and analyzes the ethical considerations involved. It uses a mixed-methods design consisting of a literature review, corpus collection, expert interviews, and stakeholder meetings. Some adaptation techniques examined in this study are data augmentation, multilingual pre-training, architecture changes, and parameter-efficient fine-tuning. The quantitative analysis indicated model performance improvements for under-resourced languages, particularly through cross-lingual knowledge transfer and data augme
29

Kim, Bosung, Juae Kim, Youngjoong Ko, and Jungyun Seo. "Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 7 (2021): 6393–401. http://dx.doi.org/10.1609/aaai.v35i7.16793.

Abstract:
Commonsense reasoning is one of the ultimate goals of artificial intelligence research because it simulates the human thinking process. However, most commonsense reasoning studies have focused on English because available commonsense knowledge for low-resource languages is scarce due to high construction costs. Translation is one of the typical methods for augmenting data for low-resource languages; however, translation entails ambiguity problems, where one word can be translated into multiple words due to polysemes and homonyms. Previous studies have suggested methods to measure the validity
30

Zhang, Mozhi, Yoshinari Fujinuma, and Jordan Boyd-Graber. "Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 9547–54. http://dx.doi.org/10.1609/aaai.v34i05.6500.

Abstract:
Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (caco) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on t
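
The framework's shape, a character-based embedder feeding a word-averaging classifier, can be sketched compactly. This is an illustration of the idea, not the authors' code; the layer dimensions and the toy document are assumptions.

```python
# Sketch of the framework's shape (not the authors' code): a character-level
# embedder builds word vectors from spelling, so words shared or similar
# across related languages get similar vectors; a classifier averages them.
import torch
import torch.nn as nn

class CharWordEmbedder(nn.Module):
    def __init__(self, n_chars=128, char_dim=32, word_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, word_dim // 2, bidirectional=True)

    def forward(self, word):  # word: a plain string
        ids = torch.tensor([[min(ord(c), 127)] for c in word])
        _, (h, _) = self.lstm(self.char_emb(ids))
        return torch.cat([h[0, 0], h[1, 0]])  # word vector from its spelling

class DocClassifier(nn.Module):
    def __init__(self, n_classes=2, word_dim=64):
        super().__init__()
        self.embedder = CharWordEmbedder(word_dim=word_dim)
        self.out = nn.Linear(word_dim, n_classes)

    def forward(self, words):
        vecs = torch.stack([self.embedder(w) for w in words])
        return self.out(vecs.mean(0))  # deep-averaging style document vector

model = DocClassifier()
print(model("ein kleines beispiel".split()))  # logits for a toy document
```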
31

Zennaki, O., N. Semmar, and L. Besacier. "A neural approach for inducing multilingual resources and natural language processing tools for low-resource languages." Natural Language Engineering 25, no. 1 (2018): 43–67. http://dx.doi.org/10.1017/s1351324918000293.

Abstract:
This work focuses on the rapid development of linguistic annotation tools for low-resource languages (languages that have no labeled training data). We experiment with several cross-lingual annotation projection methods using recurrent neural network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between source and target languages. More precisely, our approach has the following characteristics: (a) it does not use word alignment information, (b) it does not assume any knowledge about target languages
32

Meeus, Quentin, Marie-Francine Moens, and Hugo Van hamme. "Bidirectional Representations for Low-Resource Spoken Language Understanding." Applied Sciences 13, no. 20 (2023): 11291. http://dx.doi.org/10.3390/app132011291.

Abstract:
Speech representation models lack the ability to efficiently store semantic information and require fine tuning to deliver decent performance. In this research, we introduce a transformer encoder–decoder framework with a multiobjective training strategy, incorporating connectionist temporal classification (CTC) and masked language modeling (MLM) objectives. This approach enables the model to learn contextual bidirectional representations. We evaluate the representations in a challenging low-resource scenario, where training data is limited, necessitating expressive speech embeddings to compens
33

Berthelier, Benoit. "Division and the Digital Language Divide: A Critical Perspective on Natural Language Processing Resources for the South and North Korean Languages." Korean Studies 47, no. 1 (2023): 243–73. http://dx.doi.org/10.1353/ks.2023.a908624.

Abstract:
The digital world is marked by large asymmetries in the volume of content available between different languages. As a direct corollary, this inequality also exists, amplified, in the number of resources (labeled and unlabeled datasets, pretrained models, academic research) available for the computational analysis of these languages or what is generally called natural language processing (NLP). NLP literature divides languages between high- and low-resource languages. Thanks to early private and public investment in the field, the Korean language is generally considered to be a high-r
34

Mi, Chenggang, Shaolin Zhu, and Rui Nie. "Improving Loanword Identification in Low-Resource Language with Data Augmentation and Multiple Feature Fusion." Computational Intelligence and Neuroscience 2021 (April 8, 2021): 1–9. http://dx.doi.org/10.1155/2021/9975078.

Abstract:
Loanword identification is studied in recent years to alleviate data sparseness in several natural language processing (NLP) tasks, such as machine translation, cross-lingual information retrieval, and so on. However, recent studies on this topic usually put efforts on high-resource languages (such as Chinese, English, and Russian); for low-resource languages, such as Uyghur and Mongolian, due to the limitation of resources and lack of annotated data, loanword identification on these languages tends to have lower performance. To overcome this problem, we first propose a lexical constraint-base
35

Shi, Xiayang, Xinyi Liu, Zhenqiang Yu, Pei Cheng, and Chun Xu. "Extracting Parallel Sentences from Low-Resource Language Pairs with Minimal Supervision." Journal of Physics: Conference Series 2171, no. 1 (2022): 012044. http://dx.doi.org/10.1088/1742-6596/2171/1/012044.

Abstract:
At present, machine translation in the market depends on parallel sentence corpora, and the number of parallel sentences affects the performance of machine translation, especially for low-resource corpora. In recent years, the use of non-parallel corpora to learn cross-lingual word representations, as a low-resource and lightly supervised way to obtain bilingual sentence pairs, has provided a new idea. In this paper, we propose a new method. First, we create cross-domain mappings in a small number of single languages. Then a classifier is constructed to extract bilingual parallel sentence pairs.
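
The extraction step common to this line of work reduces to nearest-neighbour search in a shared cross-lingual embedding space. In the sketch below, random vectors stand in for real mapped sentence embeddings, and the similarity threshold is an arbitrary assumption.

```python
# Mining-step sketch: with sentences from both languages mapped into one
# vector space (random vectors below stand in for real cross-lingual
# embeddings), keep nearest-neighbour pairs above a similarity threshold.
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 16))  # 5 source-language sentence vectors
tgt = rng.normal(size=(7, 16))  # 7 target-language sentence vectors

def normalize(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

sim = normalize(src) @ normalize(tgt).T  # cosine similarity matrix
best = sim.argmax(axis=1)                # nearest target for each source sentence
for i, j in enumerate(best):
    if sim[i, j] > 0.3:                  # threshold is an assumption
        print(f"pair source[{i}] with target[{j}], score={sim[i, j]:.2f}")
```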
36

Sabouri, Sadra, Elnaz Rahmati, Soroush Gooran, and Hossein Sameti. "naab: A ready-to-use plug-and-play corpus for Farsi." Journal of Artificial Intelligence, Applications, and Innovations 1, no. 2 (2024): 1–8. https://doi.org/10.61838/jaiai.1.2.1.

Abstract:
The rise of large language models (LLMs) has transformed numerous natural language processing (NLP) tasks, yet their performance in low and mid-resource languages, such as Farsi, still lags behind resource-rich languages like English. To address this gap, we introduce Naab, the largest publicly available, cleaned, and ready-to-use Farsi textual corpus. Naab consists of 130GB of data, comprising over 250 million paragraphs and 15 billion words. Named after the Farsi word ناب (meaning "pure" or "high-grade"), this corpus is openly accessible via Hugging Face, offering researchers a valuable reso
37

Adjeisah, Michael, Guohua Liu, Douglas Omwenga Nyabuga, Richard Nuetey Nortey, and Jinling Song. "Pseudotext Injection and Advance Filtering of Low-Resource Corpus for Neural Machine Translation." Computational Intelligence and Neuroscience 2021 (April 11, 2021): 1–10. http://dx.doi.org/10.1155/2021/6682385.

Abstract:
Scaling natural language processing (NLP) to low-resourced languages to improve machine translation (MT) performance remains enigmatic. This research contributes to the domain on a low-resource English-Twi translation based on filtered synthetic-parallel corpora. It is often perplexing to learn and understand what a good-quality corpus looks like in low-resource conditions, mainly where the target corpus is the only sample text of the parallel language. To improve the MT performance in such low-resource language pairs, we propose to expand the training data by injecting synthetic-parallel corp
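
Filtering synthetic-parallel data before injecting it into training is typically done with simple heuristics. The sketch below shows generic filters (minimum length, length ratio, copy detection) in the spirit of such filtering; these are not the paper's exact criteria, and the sample pairs are invented placeholders.

```python
# Generic filters for synthetic (back-translated) sentence pairs -- simple
# heuristics in the spirit of "advance filtering", not the paper's exact
# criteria. The pairs below are invented toy placeholders.
def keep_pair(src, tgt, min_len=3, max_ratio=2.0):
    s, t = src.split(), tgt.split()
    if len(s) < min_len or len(t) < min_len:
        return False                       # drop fragments
    ratio = max(len(s), len(t)) / min(len(s), len(t))
    if ratio > max_ratio:
        return False                       # drop badly mismatched lengths
    return src.strip() != tgt.strip()      # drop untranslated copies

synthetic = [
    ("this is a clean pair", "eyi yE nkitahodi pa"),            # kept
    ("ok", "yoo"),                                              # too short
    ("a very very long english sentence here", "nsem kakra bi"),  # bad ratio
]
filtered = [p for p in synthetic if keep_pair(*p)]
print(len(filtered), "of", len(synthetic), "pairs kept")
```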
38

Visser, Ruan, Trieko Grobler, and Marcel Dunaiski. "Insights into Low-Resource Language Modelling: Improving Model Performances for South African Languages." JUCS - Journal of Universal Computer Science 30, no. 13 (2024): 1849–71. https://doi.org/10.3897/jucs.118889.

Abstract:
To address the gap in natural language processing for Southern African languages, our paper presents an in-depth analysis of language model development under resource-constrained conditions. We investigate the interplay between model size, pretraining objectives, and multilingual dataset composition in the context of low-resource languages such as Zulu and Xhosa. In our approach, we initially pretrain language models from scratch on specific low-resource languages using a variety of model configurations, and incrementally add related languages to explore the effect of additional languages on t
39

Xiao, Jingxuan, and Jiawei Wu. "Transfer Learning for Cross-Language Natural Language Processing Models." Journal of Computer Technology and Applied Mathematics 1, no. 3 (2024): 30–38. https://doi.org/10.5281/zenodo.13366733.

Abstract:
Cross-language natural language processing (NLP) presents numerous challenges due to the wide array of linguistic structures and vocabulary found within each language. Transfer learning has proven itself successful at meeting these challenges by drawing upon knowledge gained in highly resourced languages to enhance performance in lower resource ones. This paper investigates the application of transfer learning in cross-language NLP, exploring various methodologies, models and their efficacy. More specifically, we investigate mechanisms related to model adaptation, fine-tuning techniques and in
40

Supriya, Musica, U. Dinesh Acharya, and Ashalatha Nayak. "Enhancing Neural Machine Translation Quality for Kannada–Tulu Language Pairs through Transformer Architecture: A Linguistic Feature Integration." Designs 8, no. 5 (2024): 100. http://dx.doi.org/10.3390/designs8050100.

Abstract:
The rise of intelligent systems demands good machine translation models that are less data hungry and more efficient, especially for low- and extremely-low-resource languages with few or no data available. By integrating a linguistic feature to enhance the quality of translation, we have developed a generic Neural Machine Translation (NMT) model for Kannada–Tulu language pairs. The NMT model uses Transformer architecture and a state-of-the-art model for translating text from Kannada to Tulu and learns based on the parallel data. Kannada and Tulu are both low-resource Dravidian languages, with
41

Kadam, Ashlesha V. "Natural Language Understanding of Low-Resource Languages in Voice Assistants: Advancements, Challenges and Mitigation Strategies." International Journal of Language, Literature and Culture 3, no. 5 (2023): 20–23. http://dx.doi.org/10.22161/ijllc.3.5.3.

Abstract:
This paper presents an exploration of low resource languages and the specific challenges that arise in natural language understanding of these by a voice assistant. While voice assistants have made significant strides when it comes to their understanding of mainstream languages, this paper focuses on extending this understanding to low resource languages in order to maintain linguistic diversity and also delight the customer. In this paper, the specific nuances of natural language understanding when it comes to these low resource languages have been discussed. The paper also proposes techni
42

Zhu, ShaoLin, Xiao Li, YaTing Yang, Lei Wang, and ChengGang Mi. "A Novel Deep Learning Method for Obtaining Bilingual Corpus from Multilingual Website." Mathematical Problems in Engineering 2019 (January 10, 2019): 1–7. http://dx.doi.org/10.1155/2019/7495436.

Abstract:
Machine translation needs a large number of parallel sentence pairs to achieve good translation performance. However, the lack of parallel corpora heavily limits machine translation for low-resource language pairs. We propose a novel method that combines continuous word embeddings with deep learning to obtain parallel sentences. Since parallel sentences are invaluable for low-resource language pairs, we introduce cross-lingual semantic representations to induce bilingual signals. Our experiments show that we can achieve promising results under lacking external resources f
43

Tela, Abrhalei, Abraham Woubie, and Ville Hautamäki. "Transferring monolingual model to low-resource language: the case of Tigrinya." Applied Computing and Intelligence 4, no. 2 (2024): 184–94. http://dx.doi.org/10.3934/aci.2024011.

Abstract:
In recent years, transformer models have achieved great success in natural language processing (NLP) tasks. Most of the current results are achieved by using monolingual transformer models, where the model is pre-trained using a single-language unlabelled text corpus. Then, the model is fine-tuned to the specific downstream task. However, the cost of pre-training a new transformer model is high for most languages. In this work, we propose a cost-effective transfer learning method to adopt a strong source language model, trained from a large monolingual corpus to a low-resource languag
44

Wu, Yike, Shiwan Zhao, Ying Zhang, Xiaojie Yuan, and Zhong Su. "When Pairs Meet Triplets: Improving Low-Resource Captioning via Multi-Objective Optimization." ACM Transactions on Multimedia Computing, Communications, and Applications 18, no. 3 (2022): 1–20. http://dx.doi.org/10.1145/3492325.

Abstract:
Image captioning for low-resource languages has attracted much attention recently. Researchers propose to augment the low-resource caption dataset into (image, rich-resource language, and low-resource language) triplets and develop the dual attention mechanism to exploit the existence of triplets in training to improve the performance. However, datasets in triplet form are usually small due to their high collecting cost. On the other hand, there are already many large-scale datasets, which contain one pair from the triplet, such as caption datasets in the rich-resource language and translation
45

Grönroos, Stig-Arne, Kristiina Jokinen, Katri Hiovain, Mikko Kurimo, and Sami Virpioja. "Low-Resource Active Learning of North Sámi Morphological Segmentation." Septentrio Conference Series, no. 2 (June 17, 2015): 20. http://dx.doi.org/10.7557/5.3465.

Abstract:
Many Uralic languages have a rich morphological structure, but lack tools of morphological analysis needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation of the North Sámi language with a large unannotated corpus and a small amount of human-annotated word forms selected using an active learning approach. For statistical learning, we use the semi-supervised Morfess
46

Chaka, Chaka. "Currently Available GenAI-Powered Large Language Models and Low-Resource Languages: Any Offerings? Wait Until You See." International Journal of Learning, Teaching and Educational Research 23, no. 12 (2024): 148–73. https://doi.org/10.26803/ijlter.23.12.9.

Abstract:
A lot of hype has accompanied the increasing number of generative artificial intelligence-powered large language models (LLMs). Similarly, much has been written about what currently available LLMs can and cannot do, including their benefits and risks, especially in higher education. However, few use cases have investigated the performance and generative capabilities of LLMs in low-resource languages. With this in mind, one of the purposes of the current study was to explore the extent to which seven, currently available, free-to-use versions of LLMs (ChatGPT, Claude, Copilot, Gemini, GroqChat,
47

Murakami, Yohei. "Indonesia Language Sphere: an ecosystem for dictionary development for low-resource languages." Journal of Physics: Conference Series 1192 (March 2019): 012001. http://dx.doi.org/10.1088/1742-6596/1192/1/012001.

48

Pakray, Partha, Alexander Gelbukh, and Sivaji Bandyopadhyay. "Preface: Special issue on Natural Language Processing applications for low-resource languages." Natural Language Processing 31, no. 2 (2025): 181–82. https://doi.org/10.1017/nlp.2024.34.

49

Chen, Xilun, Yu Sun, Ben Athiwaratkun, Claire Cardie, and Kilian Weinberger. "Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification." Transactions of the Association for Computational Linguistics 6 (December 2018): 557–70. http://dx.doi.org/10.1162/tacl_a_00039.

Abstract:
In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment class
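
ADAN's key mechanism is adversarial training of the two branches through a gradient-reversal layer, so that the shared deep averaging features become language-invariant. The PyTorch sketch below shows that mechanism only; the layer sizes and data are toy assumptions, not the authors' configuration.

```python
# The core ADAN mechanism: a gradient-reversal layer makes the shared
# feature extractor fool the language discriminator, so its features become
# language-invariant. A sketch of the mechanism, not the authors' code.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None  # flip the gradient sign on the way back

feature = nn.Sequential(nn.Linear(300, 128), nn.ReLU())  # shared extractor
sentiment = nn.Linear(128, 2)                            # trained on labelled source only
language = nn.Linear(128, 2)                             # source-vs-target discriminator

x = torch.randn(8, 300)  # toy averaged word embeddings (the DAN input)
y_sent = torch.randint(0, 2, (8,))
y_lang = torch.randint(0, 2, (8,))

f = feature(x)
loss = (nn.functional.cross_entropy(sentiment(f), y_sent)
        + nn.functional.cross_entropy(language(GradReverse.apply(f, 1.0)), y_lang))
loss.backward()  # the discriminator learns language; the extractor unlearns it
print(float(loss))
```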