Journal articles on the topic 'Under-resourced language'

Consult the top 50 journal articles for your research on the topic 'Under-resourced language.'

1

Chen, Wenda, Mark Hasegawa-Johnson, and Nancy F. Chen. "Mismatched Crowdsourcing based Language Perception for Under-resourced Languages." Procedia Computer Science 81 (2016): 23–29. http://dx.doi.org/10.1016/j.procs.2016.04.025.

2

Ko, Tom, and Brian Mak. "Eigentrigraphemes for under-resourced languages." Speech Communication 56 (January 2014): 132–41. http://dx.doi.org/10.1016/j.specom.2013.01.010.

3

Tarmizi, Nursyahirah, Suhaila Saee, and Dayang Hanani Abang Ibrahim. "Author identification for Under-Resourced language (KadazanDusun)." Indonesian Journal of Electrical Engineering and Computer Science 17, no. 1 (2020): 248. http://dx.doi.org/10.11591/ijeecs.v17.i1.pp248-255.

Abstract:
This paper presents the task of Author Identification for KadazanDusun language by using tweets as the source of data to perform Author Identification task of short text on KadazanDusun, which is considered as one the under-resourced language in Malaysia. The aim of this paper is to demonstrate Author Identification of short text on KadazanDusun. Besides, this paper also examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing the stylometric features. Stylometric features are used to quantify the writing styles of the authors which incl
4

Tarmizi, Nursyahirah, Suhaila Saee, and Dayang Hanani Abang Ibrahim. "Author identification for under-resourced language Kadazandusun." Indonesian Journal of Electrical Engineering and Computer Science (IJEECS) 17, no. 1 (2020): 248–55. https://doi.org/10.11591/ijeecs.v17.i1.pp248-255.

Abstract:
This paper presents the task of Author Identification for KadazanDusun language by using tweets as the source of data to perform Author Identification task of short text on KadazanDusun, which is considered as one the under-resourced language in Malaysia. The aim of this paper is to demonstrate Author Identification of short text on KadazanDusun. Besides, this paper also examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing the stylometric features. Stylometric features are used to quantify the writing styles of the authors which includes charact
5

Selamat, Ali, and Nicholas Akosu. "Word-length algorithm for language identification of under-resourced languages." Journal of King Saud University - Computer and Information Sciences 28, no. 4 (2016): 457–69. http://dx.doi.org/10.1016/j.jksuci.2014.12.004.

6

Cunliffe, Daniel, Andreas Vlachidis, Daniel Williams, and Douglas Tudhope. "Natural language processing for under-resourced languages: Developing a Welsh natural language toolkit." Computer Speech & Language 72 (March 2022): 101311. http://dx.doi.org/10.1016/j.csl.2021.101311.

7

Phung, Trung-Nghia, Duc-Binh Nguyen, and Ngoc-Phuong Pham. "A Review on Speech Recognition for Under-Resourced Languages." International Journal of Knowledge and Systems Science 15, no. 1 (2023): 1–16. http://dx.doi.org/10.4018/ijkss.332869.

Abstract:
Fundamental speech recognition technologies for high-resourced languages are currently successful to build high-quality applications with the use of deep learning models. However, the problem of “borrowing” these speech recognition technologies for under-resourced languages like Vietnamese still has challenges. This study reviews fundamental studies on speech recognition in general as well as speech recognition in Vietnamese, an under-resourced language in particular. Then, it specifies the urgent issues that need current research attention to build Vietnamese speech recognition applications i
8

Daigneault, Anna Luisa, and Gregory D. S. Anderson. "Living Dictionaries: A Platform for Indigenous and Under-Resourced Languages." Dictionaries: Journal of the Dictionary Society of North America 44, no. 2 (2023): 57–74. http://dx.doi.org/10.1353/dic.2023.a915065.

Abstract:
Due to globalization, cultural assimilation, the long-term impacts of colonization, and official (or de facto) policies hostile to linguistic diversity, many languages of the world are threatened or endangered. Free online technological tools can assist in documentation efforts and revitalization programs, while also providing safe online spaces in which materials can be systematically recorded and shared. Led by community activists and linguists, Living Dictionaries are collaborative multimedia projects that are editable, expandable, and searchable. Using the latest web technologie
9

Le, Viet-Bac, and L. Besacier. "Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language." IEEE Transactions on Audio, Speech, and Language Processing 17, no. 8 (2009): 1471–82. http://dx.doi.org/10.1109/tasl.2009.2021723.

10

Obare, Stephen, and Kennedy Ogada. "A review of natural language processing techniques for under-resourced languages." Advances in Science, Technology and Engineering Systems Journal 10, no. 02 (2025): 35–41. https://doi.org/10.25046/aj100204.

11

Kurimo, Mikko, Seppo Enarvi, Ottokar Tilk, Matti Varjokallio, André Mansikkaniemi, and Tanel Alumäe. "Modeling under-resourced languages for speech recognition." Language Resources and Evaluation 51, no. 4 (2016): 961–87. http://dx.doi.org/10.1007/s10579-016-9336-9.

12

El-Haj, Mahmoud, Udo Kruschwitz, and Chris Fox. "Creating language resources for under-resourced languages: methodologies, and experiments with Arabic." Language Resources and Evaluation 49, no. 3 (2014): 549–80. http://dx.doi.org/10.1007/s10579-014-9274-3.

13

Kituku, Benson, Wanjiku Nganga, and Lawrence Muchemi. "Grammar Engineering for the Ekegusii Language in Grammatical Framework." European Journal of Engineering and Technology Research 6, no. 3 (2021): 20–29. http://dx.doi.org/10.24018/ejers.2021.6.3.2382.

Abstract:
The knowledge-driven economy uses technology, thereby increasing the demand for language tools and resources to acquire and distribute the knowledge. Such tools and resources are scarce for the under resourced, spoken Bantu languages. This paper develops a computational grammar for the Ekegusii language in the Grammatical Framework (GF) to bridge the gap. The grammar development uses a bottom-up and modular-driven approach. A machine translation experiment was set up to evaluate the grammar resulting in BLEU and PER of 55.95% and 19.49%, respectively. This work contributes by providing computa
14

Kituku, Benson, Wanjiku Nganga, and Lawrence Muchemi. "Grammar Engineering for the Ekegusii Language in Grammatical Framework." European Journal of Engineering and Technology Research 6, no. 3 (2021): 20–29. http://dx.doi.org/10.24018/ejeng.2021.6.3.2382.

Abstract:
The knowledge-driven economy uses technology, thereby increasing the demand for language tools and resources to acquire and distribute the knowledge. Such tools and resources are scarce for the under resourced, spoken Bantu languages. This paper develops a computational grammar for the Ekegusii language in the Grammatical Framework (GF) to bridge the gap. The grammar development uses a bottom-up and modular-driven approach. A machine translation experiment was set up to evaluate the grammar resulting in BLEU and PER of 55.95% and 19.49%, respectively. This work contributes by providing computa
15

Allah, Fadoua Ataa, and Siham Boulaknadel. "New Trends in Less-Resourced Language Processing: Case of Amazigh Language." International Journal on Natural Language Computing 12, no. 2 (2023): 75–89. http://dx.doi.org/10.5121/ijnlc.2023.12207.

Abstract:
The coronavirus (COVID-19) pandemic has dramatically changed lifestyles in much of the world. It forced people to profoundly review their relationships and interactions with digital technologies. Nevertheless, people prefer using these technologies in their favorite languages. Unfortunately, most languages are considered even as low or less-resourced, and they do not have the potential to keep up with the new needs. Therefore, this study explores how this kind of languages, mainly the Amazigh, will behave in the wholly digital environment, and what to expect for new trends. Contrary to last de
16

Bahar, Kadar, and Nehad T. A. Ramaha. "Exploring Somali Sentiment Analysis: A Resource-Light Approach for Small-scale Text Classification." International Conference on Applied Engineering and Natural Sciences 1, no. 1 (2023): 620–28. http://dx.doi.org/10.59287/icaens.1069.

Abstract:
Sentiment analysis, a fundamental task in natural language processing (NLP), plays a crucial role in understanding people's opinions and emotions expressed in textual data. While sentiment analysis has been extensively studied for major languages, under-resourced languages like Somali have received limited attention in this domain. This paper aims to address this research gap by proposing a resource light approach for sentiment analysis in Somali, which is tailored to the language's unique characteristics and limited linguistic resources. We present a methodology that combines lexicon-based me
17

Fadoua, Ataa Allah, and Boulaknadel Siham. "New Trends in Less-Resourced Language Processing: Case of Amazigh Language." International Journal on Natural Language Computing (IJNLC) 12, no. 2 (2023): 15. https://doi.org/10.5281/zenodo.8069560.

Abstract:
The coronavirus (COVID-19) pandemic has dramatically changed lifestyles in much of the world. It forced people to profoundly review their relationships and interactions with digital technologies. Nevertheless, people prefer using these technologies in their favorite languages. Unfortunately, most languages are considered even as low or less-resourced, and they do not have the potential to keep up with the new needs. Therefore, this study explores how this kind of languages, mainly the Amazigh, will behave in the wholly digital environment, and what to expect for new trends. Contrary to last de
18

Tilves Santiago, Darío, Carmén García Mateo, Soledad Torres Guijarro, Laura Docío Fernández, and José Luis Alba Castro. "Estudio de bases de datos para el reconocimiento automático de lenguas de signos." Hesperia: Anuario de Filología Hispánica 22 (March 13, 2020): 145–60. http://dx.doi.org/10.35869/hafh.v22i0.1658.

Abstract:
Automatic sign language recognition (ASLR) is quite a complex task, not only for the difficulty of dealing with very dynamic video information, but also because almost every sign language (SL) can be considered as an under-resourced language when it comes to language technology. Spanish sign language (LSE) is one of those under-resourced languages. Developing technology for SSL implies a number of technical challenges that must be tackled down in a structured and sequential manner. In this paper, some problems of machine-learning-based ASLR are addressed. A review of publicly available datase
19

Tilves Santiago, Darío, Carmén García Mateo, Soledad Torres Guijarro, Laura Docío Fernández, and José Luis Alba Castro. "Estudio de bases de datos para el reconocimiento automático de lenguas de signos." Hesperia: Anuario de Filología Hispánica 23 (March 13, 2020): 145–60. http://dx.doi.org/10.35869/hafh.v23i0.1658.

Abstract:
Automatic sign language recognition (ASLR) is quite a complex task, not only for the difficulty of dealing with very dynamic video information, but also because almost every sign language (SL) can be considered as an under-resourced language when it comes to language technology. Spanish sign language (LSE) is one of those under-resourced languages. Developing technology for SSL implies a number of technical challenges that must be tackled down in a structured and sequential manner. In this paper, some problems of machine-learning-based ASLR are addressed. A review of publicly available datase
20

Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. "Distilling Monolingual Models from Large Multilingual Transformers." Electronics 12, no. 4 (2023): 1022. http://dx.doi.org/10.3390/electronics12041022.

Abstract:
Although language modeling has been trending upwards steadily, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant overheads for deployment vis-à-vis their model size, inference speeds, etc. We attempt to tackle this problem by proposing a novel methodology to apply knowledge distillation techniques to filter language-specific information from a large multilingual model into a small, fast monolingual model that can often outperform the teacher model. We demonstrate the viability of this methodology on
21

Nabende, Peter. "A Review and evaluation of Machine Translation methods for Lumasaaba." Journal of Digital Science, no. 1 (May 28, 2020): 3–17. http://dx.doi.org/10.33847/2686-8296.2.1_1.

Abstract:
Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap of knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba including both rule-based and data-driven method
22

Mukhitdinova, Badia, Ramazon Abdullaev, Gulnoza Odilova, et al. "Wireless Mobile Network with Transfer Learning Algorithm for Multilingual Education and Historical Research." Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications 16, no. 1 (2025): 599–608. https://doi.org/10.58346/jowua.2025.i1.035.

Abstract:
Notwithstanding recent advancements in Automatic Speech Recognition (ASR), acknowledging children's speech continues to pose a considerable problem. This mainly results from significant acoustic fluctuation and the scarcity of available data for training in Wireless Mobile Networks (WMN). This issue is especially pronounced in languages other than English, typically under-resourced. This research examines children's ASR in various under-resourced languages by amalgamating different small kids’ voice datasets. Specifically, the study examines the subsequent research questions: Does a novel two-
23

Kamau, Gabriel. "Data-Driven Part-of-Speech Tagging for the Gikuyu Language: Development, Challenges, and Prospects." International Journal on Natural Language Computing 13, no. 5/6 (2024): 15–26. https://doi.org/10.5121/ijnlc.2024.13602.

Abstract:
This paper presents the development of a data-driven Part-of-Speech (POS) tagger for Gikuyu, a Bantu language spoken in Kenya. Gikuyu, like many indigenous African languages, is under-resourced, with limited computational tools for linguistic processing. By employing a corpus sourced primarily from the Gikuyu Bible and leveraging a Memory-Based Tagging (MBT) approach, this study demonstrates the feasibility of creating a robust POS tagging system. The tagger achieved a precision of 90.44%, a recall of 88.34%, and an F-score of 91.35%. These results underscore its potential for applications in
24

Tiedemann, Jörg, and Zeljko Agić. "Synthetic Treebanking for Cross-Lingual Dependency Parsing." Journal of Artificial Intelligence Research 55 (January 27, 2016): 209–48. http://dx.doi.org/10.1613/jair.4785.

Abstract:
How do we parse the languages for which no treebanks are available? This contribution addresses the cross-lingual viewpoint on statistical dependency parsing, in which we attempt to make use of resource-rich source language treebanks to build and adapt models for the under-resourced target languages. We outline the benefits, and indicate the drawbacks of the current major approaches. We emphasize synthetic treebanking: the automatic creation of target language treebanks by means of annotation projection and machine translation. We present competitive results in cross-lingual dependency parsing
25

Hoesen, Devin, Dessi Puji Lestari, and Dwi Hendratmo Widyantoro. "Shared-hidden-layer Deep Neural Network for Under-resourced Language the Content." TELKOMNIKA Telecommunication, Computing, Electronics and Control 16, no. 3 (2018): 1226–38. https://doi.org/10.12928/TELKOMNIKA.v16i3.7984.

Abstract:
Training speech recognizer with under-resourced language data still proves difficult. Indonesian language is considered under-resourced because the lack of a standard speech corpus, text corpus, and dictionary. In this research, the efficacy of augmenting limited Indonesian speech training data with highly-resourced-language training data, such as English, to train Indonesian speech recognizer was analyzed. The training was performed in form of shared-hidden-layer deep-neural-network (SHL-DNN) training. An SHL-DNN has language-independent hidden layers and can be pre-trained and trained usi
26

Saee, Suhaila, Ranaivo-Malancon Bali, Lay-Ki Soon, and Tek-Yong Lim. "Crawling Social Media to Create Morphological Resource of Under-Resourced Language: Melanau Language." Advanced Science Letters 23, no. 11 (2017): 11503–7. http://dx.doi.org/10.1166/asl.2017.10316.

27

Tzudir, Moakala, Shikha Baghel, Priyankoo Sarmah, and S. R. Mahadeva Prasanna. "Under-resourced dialect identification in Ao using source information." Journal of the Acoustical Society of America 152, no. 3 (2022): 1755–66. http://dx.doi.org/10.1121/10.0014176.

Abstract:
This paper reports the findings of an automatic dialect identification (DID) task conducted on Ao speech data using source features. Considering that Ao is a tone language, in this study for DID, the gammatonegram of the linear prediction residual is proposed as a feature. As Ao is an under-resourced language, data augmentation was carried out to increase the size of the speech corpus. The results showed that data augmentation improved DID by 14%. A perception test conducted on Ao speakers showed better DID by the subjects when utterance duration was 3 s. Accordingly, automatic DID was conduct
28

Sainin, Mohd Shamrie, Minah Sintian, Suraya Alias, and Asni Tahir. "The Application of Computer-Aided Under-Resourced Language Translation for Malay into Kadazandusun." Annals of Emerging Technologies in Computing 7, no. 5 (2023): 11–24. http://dx.doi.org/10.33166/aetic.2023.05.002.

Abstract:
A computer-aided language translation using a Machine translation (MT) is an application performed by computers (machines) that translates one natural language to another. There are many online language translation tools, but thus far none offers a sequence of text translations for the under-resourced Kadazandusun language. Although there are web-based and mobile applications of Kadazandusun dictionaries available, the systems do not translate more than one word. Hence, this paper aims to present the discussion of the preliminary translation of Malay to Kadazandusun. The basic word-to-word wit
29

Besacier, Laurent, Etienne Barnard, Alexey Karpov, and Tanja Schultz. "Automatic speech recognition for under-resourced languages: A survey." Speech Communication 56 (January 2014): 85–100. http://dx.doi.org/10.1016/j.specom.2013.07.008.

30

Tune, Kula Kekeba, and Vasudeva Varma. "Building CLIA for Resource-Scarce African Languages." International Journal of Information Retrieval Research 5, no. 1 (2015): 48–67. http://dx.doi.org/10.4018/ijirr.2015010104.

Abstract:
Since most of the existing major search engines and commercial Information Retrieval (IR) systems are primarily designed for well-resourced European and Asian languages, they have paid little attention to the development of Cross-Language Information Access (CLIA) technologies for resource-scarce African languages. This paper presents the authors' experience in building CLIA for indigenous African languages, with a special focus on the development and evaluation of Oromo-English-CLIR. The authors have adopted a knowledge-based query translation approach to design and implement their initial Or
31

Barroso, Nora, Karmele López de Ipiña, Carmen Hernández, Aitzol Ezeiza, and Manuel Graña. "Semantic speech recognition in the Basque context Part II: language identification for under-resourced languages." International Journal of Speech Technology 15, no. 1 (2011): 41–47. http://dx.doi.org/10.1007/s10772-011-9114-4.

32

Fraisse, Amel, Zheng Zhang, Alex Zhai, et al. "A Sustainable and Open Access Knowledge Organization Model to Preserve Cultural Heritage and Language Diversity." Information 10, no. 10 (2019): 303. http://dx.doi.org/10.3390/info10100303.

Abstract:
This paper proposes a new collaborative and inclusive model for Knowledge Organization Systems (KOS) for sustaining cultural heritage and language diversity. It is based on contributions of end-users as well as scientific and scholarly communities from across borders, languages, nations, continents, and disciplines. It consists in collecting knowledge about all worldwide translations of one original work and sharing that data through a digital and interactive global knowledge map. Collected translations are processed in order to build multilingual parallel corpora for a large number of under-r
33

Hoesen, Devin, Dessi Puji Lestari, and Dwi Hendratmo Widyantoro. "Shared-hidden-layer Deep Neural Network for Under-resourced Language the Content." TELKOMNIKA (Telecommunication Computing Electronics and Control) 16, no. 3 (2018): 1226. http://dx.doi.org/10.12928/telkomnika.v16i0.7984.

34

Hoesen, Devin, Dessi Puji Lestari, and Dwi Hendratmo Widyantoro. "Shared-hidden-layer Deep Neural Network for Under-resourced Language the Content." TELKOMNIKA (Telecommunication Computing Electronics and Control) 16, no. 3 (2018): 1226. http://dx.doi.org/10.12928/telkomnika.v16i3.7984.

35

Sene-Mongaba, Bienvenu. "The Making of Lingala Corpus: An Under-resourced Language and the Internet." Procedia - Social and Behavioral Sciences 198 (July 2015): 442–50. http://dx.doi.org/10.1016/j.sbspro.2015.07.464.

36

Imseng, David, Petr Motlicek, Hervé Bourlard, and Philip N. Garner. "Using out-of-language data to improve an under-resourced speech recognizer." Speech Communication 56 (January 2014): 142–51. http://dx.doi.org/10.1016/j.specom.2013.01.007.

37

Besacier, Laurent, Etienne Barnard, Alexey Karpov, and Tanja Schultz. "Introduction to the special issue on processing under-resourced languages." Speech Communication 56 (January 2014): 83–84. http://dx.doi.org/10.1016/j.specom.2013.09.001.

38

Jansen van Vüren, Joshua, and Thomas Niesler. "Improving N-Best Rescoring in Under-Resourced Code-Switched Speech Recognition Using Pretraining and Data Augmentation." Languages 7, no. 3 (2022): 236. http://dx.doi.org/10.3390/languages7030236.

Abstract:
In this study, we present improvements in N-best rescoring of code-switched speech achieved by n-gram augmentation as well as optimised pretraining of long short-term memory (LSTM) language models with larger corpora of out-of-domain monolingual text. Our investigation specifically considers the impact of the way in which multiple monolingual datasets are interleaved prior to being presented as input to a language model. In addition, we consider the application of large pretrained transformer-based architectures, and present the first investigation employing these models in English-Bantu code-
39

Păiș, Vasile, Verginica Barbu Mititelu, Elena Irimia, Radu Ion, and Dan Tufiș. "Under-Represented Speech Dataset from Open Data: Case Study on the Romanian Language." Applied Sciences 14, no. 19 (2024): 9043. http://dx.doi.org/10.3390/app14199043.

Abstract:
This paper introduces the USPDATRO dataset. This is a speech dataset, in the Romanian language, constructed from open data, focusing on under-represented voice types (children, young and old people, and female voices). The paper covers the methodology behind the dataset construction, specific details regarding the dataset, and evaluation of existing Romanian Automatic Speech Recognition (ASR) systems, with different architectures. Results indicate that more under-represented speech content is needed in the training of ASR systems. Our approach can be extended to other low-resourced languages,
40

Tachbelie, Martha Yifiru, Solomon Teferra Abate, and Laurent Besacier. "Using different acoustic, lexical and language modeling units for ASR of an under-resourced language – Amharic." Speech Communication 56 (January 2014): 181–94. http://dx.doi.org/10.1016/j.specom.2013.01.008.

41

Knez, Timotej, Miha Štravs, and Slavko Žitnik. "Semi-Supervised Relation Extraction Corpus Construction and Models Creation for Under-Resourced Languages: A Use Case for Slovene." Information 16, no. 2 (2025): 143. https://doi.org/10.3390/info16020143.

Abstract:
The goal of relation extraction is to recognize head and tail entities in a document and determine a relation between them. While a lot of progress was made in solving automated relation extraction in widely used languages such as English, the use of these methods for under-resourced languages and domains is limited due to the lack of training data. In this work, we present a pipeline using distant supervision for constructing a relation extraction corpus in an arbitrary language. The corpus construction combines Wikipedia documents in the target language with relations in the WikiData knowled
42

de Vries, Nic J., Marelie H. Davel, Jaco Badenhorst, et al. "A smartphone-based ASR data collection tool for under-resourced languages." Speech Communication 56 (January 2014): 119–31. http://dx.doi.org/10.1016/j.specom.2013.07.001.

43

Adebayo, Bakare Mustaphaa, Kalaiarasi Sonai Muthu Anbananthen, Saravanan Muthaiyah, and Saravanan Nathan Lurudusamy. "Comparative Analysis of Deep Learning Models for Part of Speech Tagging in the Malay Language." HighTech and Innovation Journal 5, no. 2 (2024): 272–81. http://dx.doi.org/10.28991/hij-2024-05-02-04.

Abstract:
Despite the widespread use of Malay, under-resourced languages like Malay face challenges in Natural Language Processing (NLP), particularly in Part-of-Speech (POS) tagging. The scarcity of annotated corpora poses a primary obstacle to POS tagging in Malay. This study aims to enhance the effectiveness and reliability of POS tagging models explicitly tailored for under-resourced languages within the field of NLP, focusing on Malay. Existing models, which rely on Conditional Random Fields and Hidden Markov Models, exhibit limitations, underscoring the need for more robust approaches. The researc
44

Muhamediyeva, Dildora Kabulovna, Baxrixon Ibragimovna Otaxonova, Munisaxon Rashodovna Raxmonova, and Nilufar Sirojidovna Mirzayeva. "Text Mining and Sentiment Analysis for Uzbek: Evaluating SVM and Naive Bayes for Under-Resourced Language Processing." Digital Transformation and Artificial Intelligence 2, no. 610 (2024): 46–54. https://doi.org/10.5281/zenodo.14646366.

Abstract:
This study explores the application of text mining techniques to classify and analyze Uzbek text, focusing on the performance of Support Vector Machine (SVM) and Naive Bayes algorithms. Due to the unique linguistic structure of Uzbek, an under-resourced language with an agglutinative morphology and dual-script usage (Cyrillic and Latin), text mining presents several challenges. We collected a dataset from various Uzbek text sources, including news articles and social media posts, and applied customized preprocessing steps such as script normalization, tokenization, and stop word removal. The pr
45

Žitkus, Voldemaras, Rita Butkienė, Rimantas Butleris, Rytis Maskeliūnas, Robertas Damaševičius, and Marcin Woźniak. "Minimalistic Approach to Coreference Resolution in Lithuanian Medical Records." Computational and Mathematical Methods in Medicine 2019 (March 20, 2019): 1–14. http://dx.doi.org/10.1155/2019/9079840.

Abstract:
Coreference resolution is a challenging part of natural language processing (NLP) with applications in machine translation, semantic search and other information retrieval, and decision support systems. Coreference resolution requires linguistic preprocessing and rich language resources for automatically identifying and resolving such expressions. Many rarer and under-resourced languages (such as Lithuanian) lack the required language resources and tools. We present a method for coreference resolution in Lithuanian language and its application for processing e-health records from a hospital re
46

Yantseva, Victoria, and Kostiantyn Kucher. "Stance Classification of Social Media Texts for Under-Resourced Scenarios in Social Sciences." Data 7, no. 11 (2022): 159. http://dx.doi.org/10.3390/data7110159.

Abstract:
In this work, we explore the performance of supervised stance classification methods for social media texts in under-resourced languages and using limited amounts of labeled data. In particular, we focus specifically on the possibilities and limitations of the application of classic machine learning versus deep learning in social sciences. To achieve this goal, we use a training dataset of 5.7K messages posted on Flashback Forum, a Swedish discussion platform, further supplemented with the previously published ABSAbank-Imm annotated dataset, and evaluate the performance of various model parame
47

Singh, Ramesh Bahadur. "Navigating English Language Education Challenges in Resource-limited Contexts." KMC Journal 6, no. 1 (2024): 135–52. http://dx.doi.org/10.3126/kmcj.v6i1.62336.

Abstract:
Nepal's diverse linguistic landscape challenges the under-resourced education system, particularly in teaching English. Despite the demand for English from parents and communities due to globalization, rural schools still face difficulties in providing English language education, despite student, parent, and community expectations, and government policies. The purpose of this study was to explore the challenges faced by English language teachers in an under-resourced context and their coping strategies in Nepal. A qualitative research approach with the narrative inquiry was used and two public
48

Mabokela, Koena Ronny, Mpho Primus, and Turgay Celik. "Explainable Pre-Trained Language Models for Sentiment Analysis in Low-Resourced Languages." Big Data and Cognitive Computing 8, no. 11 (2024): 160. http://dx.doi.org/10.3390/bdcc8110160.

Abstract:
Sentiment analysis is a crucial tool for measuring public opinion and understanding human communication across digital social media platforms. However, due to linguistic complexities and limited data or computational resources, it is under-represented in many African languages. While state-of-the-art Afrocentric pre-trained language models (PLMs) have been developed for various natural language processing (NLP) tasks, their applications in eXplainable Artificial Intelligence (XAI) remain largely unexplored. In this study, we propose a novel approach that combines Afrocentric PLMs with XAI tech
49

Cao, Bochun. "AI Tutor: Solution for China's Disadvantaged and Under-resourced Children." Lecture Notes in Education Psychology and Public Media 32, no. 1 (2023): 133–41. http://dx.doi.org/10.54254/2753-7048/32/20230834.

Abstract:
This academic paper delves into the pressing issue of educational disparities in China, particularly focusing on the challenges faced by disadvantaged and under-resourced children, including both migrant and left-behind children. The paper underscores the socioeconomic and geographical complexities that exacerbate these disparities, emphasizing the need for innovative solutions. It then introduces the transformative potential of AI tutors, leveraging recent advancements in large language models (LLMs), to bridge the educational gap. The study highlights the significant impact of migration on c
50

Khan, Naira. "Developing Digital Resources for Computational Bangla." Dhaka University Journal of Linguistics 10, no. 20 (2018): 31–56. https://doi.org/10.70438/dujl/1020/0002.

Abstract:
As the world moves towards a digitally-literate global society, digitising languages has become integral for information exchange in every language. Despite being one of the most widely spoken languages of the world, Bangla is one of the most digitally under-resourced languages. In this respect, Bangla computing has become an essential next phase in the evolutionary path of the language. A number of endeavours in computational modeling can be noted as setting the precursors for a robust repository of computational resources for Bangla. From corpus-development, to Bangla WordNet, to POS tagging