
Journal articles on the topic "Less-resourced languages"


The top 50 journal articles for research on the topic "Less-resourced languages".


1

Allah, Fadoua Ataa, and Siham Boulaknadel. "New Trends in Less-Resourced Language Processing: Case of Amazigh Language." International Journal on Natural Language Computing 12, no. 2 (2023): 75–89. http://dx.doi.org/10.5121/ijnlc.2023.12207.

Abstract
The coronavirus (COVID-19) pandemic has dramatically changed lifestyles in much of the world. It forced people to profoundly review their relationships and interactions with digital technologies. Nevertheless, people prefer using these technologies in their favorite languages. Unfortunately, most languages are considered even as low or less-resourced, and they do not have the potential to keep up with the new needs. Therefore, this study explores how this kind of languages, mainly the Amazigh, will behave in the wholly digital environment, and what to expect for new trends. Contrary to last de
2

Fadoua, Ataa Allah, and Boulaknadel Siham. "New Trends in Less-Resourced Language Processing: Case of Amazigh Language." International Journal on Natural Language Computing (IJNLC) 12, no. 2 (2023): 15. https://doi.org/10.5281/zenodo.8069560.

Abstract
The coronavirus (COVID-19) pandemic has dramatically changed lifestyles in much of the world. It forced people to profoundly review their relationships and interactions with digital technologies. Nevertheless, people prefer using these technologies in their favorite languages. Unfortunately, most languages are considered even as low or less-resourced, and they do not have the potential to keep up with the new needs. Therefore, this study explores how this kind of languages, mainly the Amazigh, will behave in the wholly digital environment, and what to expect for new trends. Contrary to last de
3

Ulčar, Matej, Aleš Žagar, Carlos S. Armendariz, et al. "Mono- and cross-lingual evaluation of representation language models on less-resourced languages." Computer Speech & Language 95 (January 2026): 101852. https://doi.org/10.1016/j.csl.2025.101852.

4

Dell’Orletta, Felice, Simonetta Montemagni, and Giulia Venturi. "Assessing document and sentence readability in less resourced languages and across textual genres." Recent Advances in Automatic Readability Assessment and Text Simplification 165, no. 2 (2014): 163–93. http://dx.doi.org/10.1075/itl.165.2.03del.

Abstract
In this paper, we tackle three underresearched issues of the automatic readability assessment literature, namely the evaluation of text readability in less resourced languages, with respect to sentences (as opposed to documents) as well as across textual genres. Different solutions to these issues have been tested by using and refining READ‑IT, the first advanced readability assessment tool for Italian, which combines traditional raw text features with lexical, morpho-syntactic and syntactic information. In READ‑IT readability assessment is carried out with respect to both documents and senten
5

Nitu, Melania, and Mihai Dascalu. "Authorship Attribution in Less-Resourced Languages: A Hybrid Transformer Approach for Romanian." Applied Sciences 14, no. 7 (2024): 2700. http://dx.doi.org/10.3390/app14072700.

Abstract
Authorship attribution for less-resourced languages like Romanian, characterized by the scarcity of large, annotated datasets and the limited number of available NLP tools, poses unique challenges. This study focuses on a hybrid Transformer combining handcrafted linguistic features, ranging from surface indices like word frequencies to syntax, semantics, and discourse markers, with contextualized embeddings from a Romanian BERT encoder. The methodology involves extracting contextualized representations from a pre-trained Romanian BERT model and concatenating them with linguistic features, sele
6

Neshir, Girma, Andreas Rauber, and Solomon Atnafu. "Meta-Learner for Amharic Sentiment Classification." Applied Sciences 11, no. 18 (2021): 8489. http://dx.doi.org/10.3390/app11188489.

Abstract
The emergence of the World Wide Web facilitates the growth of user-generated texts in less-resourced languages. Sentiment analysis of these texts may serve as a key performance indicator of the quality of services delivered by companies and government institutions. The presence of user-generated texts is an opportunity for assisting managers and policy-makers. These texts are used to improve performance and increase the level of customers’ satisfaction. Because of this potential, sentiment analysis has been widely researched in the past few years. A plethora of approaches and tools have been d
7

Pelicon, Andraž, Ravi Shekhar, Blaž Škrlj, Matthew Purver, and Senja Pollak. "Investigating cross-lingual training for offensive language detection." PeerJ Computer Science 7 (June 25, 2021): e559. http://dx.doi.org/10.7717/peerj-cs.559.

Abstract
Platforms that feature user-generated content (social media, online forums, newspaper comment sections etc.) have to detect and filter offensive speech within large, fast-changing datasets. While many automatic methods have been proposed and achieve good accuracies, most of these focus on the English language, and are hard to apply directly to languages in which few labeled datasets exist. Recent work has therefore investigated the use of cross-lingual transfer learning to solve this problem, training a model in a well-resourced language and transferring to a less-resourced target language; bu
8

Akhtar, Md Shad, Palaash Sawant, Sukanta Sen, Asif Ekbal, and Pushpak Bhattacharyya. "Improving Word Embedding Coverage in Less-Resourced Languages Through Multi-Linguality and Cross-Linguality." ACM Transactions on Asian and Low-Resource Language Information Processing 18, no. 2 (2019): 1–22. http://dx.doi.org/10.1145/3273931.

9

Robnik-Šikonja, Marko, Kristjan Reba, and Igor Mozetič. "Cross-lingual transfer of sentiment classifiers." Slovenščina 2.0: empirical, applied and interdisciplinary research 9, no. 1 (2021): 1–25. http://dx.doi.org/10.4312/slo2.0.2021.1.1-25.

Abstract
Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform vector spaces of different languages so that similar words are aligned. This is done by mapping one language’s vector space to the vector space of another language or by construction of a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models between languages, thereby compensating for insufficient data in less-resourced languages. We use c
10

Hutin, Mathilde, and Marc Allassonnière-Tang. "Operation LiLi: Using Crowd-Sourced Data and Automatic Alignment to Investigate the Phonetics and Phonology of Less-Resourced Languages." Languages 7, no. 3 (2022): 234. http://dx.doi.org/10.3390/languages7030234.

Abstract
Less-resourced languages are usually left out of phonetic studies based on large corpora. We contribute to the recent efforts to fill this gap by assessing how to use open-access, crowd-sourced audio data from Lingua Libre for phonetic research. Lingua Libre is a participative linguistic library developed by Wikimedia France in 2015. It contains more than 670k recordings in approximately 150 languages across nearly 740 speakers. As a proof of concept, we consider the Inventory Size Hypothesis, which predicts that, in a given system, variation in the realization of each vowel will be inversely
11

Shekhar, Ravi, Marko Pranjić, Senja Pollak, Andraž Pelicon, and Matthew Purver. "Automating News Comment Moderation with Limited Resources: Benchmarking in Croatian and Estonian." Journal for Language Technology and Computational Linguistics 34, no. 1 (2020): 49–79. https://doi.org/10.5281/zenodo.4032371.

Abstract
This article describes initial work into the automatic classification of user-generated content in news media to support human moderators. We work with real-world data — comments posted by readers under online news articles — in two less-resourced European languages, Croatian and Estonian. We describe our dataset, and experiments into automatic classification using a range of models. Performance obtained is reasonable but not as good as might be expected given similar work in offensive language classification in other languages; we then investigate possible reasons in terms of the
12

Xu, Liang, Haoyang Du, Songkai Jia, Cathy Ennis, Elaine Uí Dhonnchadha, and Monica Ward. "Rekindling Connections to Languages through Socio-Cultural Immersion Using Game-Based Learning and Virtual Reality: Cipher VR Case Study." European Conference on Games Based Learning 18, no. 1 (2024): 872–77. http://dx.doi.org/10.34190/ecgbl.18.1.2722.

Abstract
Traditional language learning methods often fall short in engaging learners, especially in the context of indigenous languages like Irish. In this study we show how the language learning game Cipher VR combines digital game-based language learning with Virtual Reality (VR) to reconnect learners with indigenous languages, using the Irish language as a case study. Initially designed for English, Cipher has undergone several iterations to adapt to the Irish context, and is now completing its metamorphosis into a VR platform aimed at meeting the needs of less-resourced and endangered languages. Th
13

Pandit, Rajat, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash, and Mohini Mohan Sardar. "Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language." Informatics 6, no. 2 (2019): 19. http://dx.doi.org/10.3390/informatics6020019.

Abstract
Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest as its understanding can provide a look into how human beings comprehend meaning and make associations between words. However, when this problem is looked at from the viewpoint of machine understanding, particularly for under resourced languages, it poses a different problem altogether. In this paper, semantic similarity is explored in Bangla, a less resourced language. For ameliorating the situation in such languages, the most rudimentary method (path-based) and the latest stat
14

Vetulani, Zygmunt, Grażyna Vetulani, and Panchanan Mohanty. "Development of real size IT systems with language competence as a challenge for a Less-Resourced Language: a methodological proposal for Indo-Aryan languages." Journal of Information and Telecommunication 5, no. 4 (2021): 514–35. http://dx.doi.org/10.1080/24751839.2021.1966236.

15

McGillivray, Barbara, Daria Kondakova, Annie Burman, et al. "A new corpus annotation framework for Latin diachronic lexical semantics." Journal of Latin Linguistics 21, no. 1 (2022): 47–105. http://dx.doi.org/10.1515/joll-2022-2007.

Abstract
We present a new corpus-based resource and methodology for the annotation of Latin lexical semantics, consisting of 2,399 annotated passages of 40 lemmas from the Latin diachronic corpus LatinISE. We also describe how the annotation was designed, analyse annotators’ styles, and present the preliminary results of a study on the lexical semantics and diachronic change of the 40 lemmas. We complement this analysis with a case study on semantic vagueness. As the availability of digital corpora of ancient languages increases, and as computational research develops new methods for large-sca
16

Maia, Belinda. "Corpora, translation, terminology… and beyond: objectives and perspectives." Tradterm 37, no. 1 (2021): 10–29. http://dx.doi.org/10.11606/issn.2317-9511.v37p10-29.

Abstract
This paper will not describe any specific research in corpus linguistics. Instead, it will first reflect on the way many of us teaching languages and translation in university departments develop and use corpora in our research and teaching methodology. One of the objectives is to highlight the work by Professor Stella Tagnin and those of us with whom she has worked over twenty years, even if it does not bring anything new to the immediate area. It will go on to analyze how, apart from the didactic uses of these resources, and related research, their potential for Natural Language Processing (
17

Fernandez de Landa, Joseba, Rodrigo Agerri, and Iñaki Alegria. "Large Scale Linguistic Processing of Tweets to Understand Social Interactions among Speakers of Less Resourced Languages: The Basque Case." Information 10, no. 6 (2019): 212. http://dx.doi.org/10.3390/info10060212.

Abstract
Social networks like Twitter are increasingly important in the creation of new ways of communication. They have also become useful tools for social and linguistic research due to the massive amounts of public textual data available. This is particularly important for less resourced languages, as it allows to apply current natural language processing techniques to large amounts of unstructured data. In this work, we study the linguistic and social aspects of young and adult people’s behaviour based on their tweets’ contents and the social relations that arise from them. With this objective in m
18

Bhagath, Parabattina, Malempati Shanmukha, and Pradip K. Das. "Hindi spoken digit analysis for native and non-native speakers." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 2 (2025): 1561–67. https://doi.org/10.11591/ijai.v14.i2.pp1561-1567.

Abstract
Automated speech recognition (ASR) is the process of using an algorithm or automated system to recognize and translate spoken words of a specific language. ASR has various applications in fields such as mobile speech recognition, the internet of things and human-machine interaction. Researchers have been working on issues related to ASR for more than 60 years. One of the many use cases of ASR is designing applications such as digit recognition that aid differently-abled individuals, children and elderly people. However, there is a lack of spoken language data in under-devel
19

Parabattina, Bhagath, Shanmukha Malempati, and K. Das Pradip. "Hindi spoken digit analysis for native and non-native speakers." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 2 (2025): 1561–67. https://doi.org/10.11591/ijai.v14.i2.pp1561-1567.

Abstract
Automated speech recognition (ASR) is the process of using an algorithm or automated system to recognize and translate spoken words of a specific language. ASR has various applications in fields such as mobile speech recognition, the internet of things and human-machine interaction. Researchers have been working on issues related to ASR for more than 60 years. One of the many use cases of ASR is designing applications such as digit recognition that aid differently-abled individuals, children and elderly people. However, there is a lack of spoken language data in under-developed and low-resou
20

Nitu, Melania, and Mihai Dascalu. "Natural Language Processing Tools for Romanian – Going Beyond a Low-Resource Language." Interaction Design and Architecture(s), no. 60 (March 15, 2024): 7–26. http://dx.doi.org/10.55612/s-5002-060-001sp.

Abstract
Advances in Natural Language Processing bring innovative instruments to the educational field to improve the quality of the didactic process by addressing challenges like language barriers and creating personalized learning experiences. Most research in the domain is dedicated to high-resource languages, such as English, while languages with limited coverage, like Romanian, are still underrepresented in the field. Operating on low-resource languages is essential to ensure equitable access to educational opportunities and to preserve linguistic diversity. Through continuous investments in devel
21

Jaworski, Rafał, Sanja Seljan, and Ivan Dunđer. "Four Million Segments and Counting: Building an English-Croatian Parallel Corpus through Crowdsourcing Using a Novel Gamification-Based Platform." Information 14, no. 4 (2023): 226. http://dx.doi.org/10.3390/info14040226.

Abstract
Parallel corpora have been widely used in the fields of natural language processing and translation as they provide crucial multilingual information. They are used to train machine translation systems, compile dictionaries, or generate inter-language word embeddings. There are many corpora available publicly; however, support for some languages is still limited. In this paper, the authors present a framework for collecting, organizing, and storing corpora. The solution was originally designed to obtain data for less-resourced languages, but it proved to work very well for the collection of hig
22

Azzat, Media, Karwan Jacksi, and Ismael Ali. "The Kurdish Language Corpus: State of the Art." Science Journal of University of Zakho 11, no. 1 (2023): 125–31. http://dx.doi.org/10.25271/sjuoz.2023.11.1.1123.

Abstract
The notable growth of the digital communities and different online news streams led to the growing availability of online natural language content. However not all natural languages have the enough attention of being made readable and comprehendible to machines. Among these less resourced and paid attention languages is the Kurdish language. Creating the machine-readable text is the first step toward applications of text mining and semantic web, such as translation, information retrieval and recommendation systems. With the de facto challenges in the Kurdish language, such as the scarcity of l
23

Wissik, Tanja. "Impact of automatic term extraction on terminology work." Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 31, no. 1 (2025): 110–35. https://doi.org/10.1075/term.00085.wis.

Abstract
A crucial task in any type of terminology work is identifying and extracting terms from relevant sources, which can be done manually or via (semi-)automatic term extraction processes. Given the recent advances in automatic term extraction (ATE) research, this paper explores the impact of ATE on terminology work in institutional settings (academic institutions, administrations, European institutions and international organizations) based on qualitative data. The analysis of 15 semi-structured expert interviews conducted in 2023 shows that the newest advances in research in ATE have not
24

Ferreira Cruz, André, Gil Rocha, and Henrique Lopes Cardoso. "Coreference Resolution: Toward End-to-End and Cross-Lingual Systems." Information 11, no. 2 (2020): 74. http://dx.doi.org/10.3390/info11020074.

Abstract
The task of coreference resolution has attracted considerable attention in the literature due to its importance in deep language understanding and its potential as a subtask in a variety of complex natural language processing problems. In this study, we outlined the field’s terminology, describe existing metrics, their differences and shortcomings, as well as the available corpora and external resources. We analyzed existing state-of-the-art models and approaches, and reviewed recent advances and trends in the field, namely end-to-end systems that jointly model different subtasks of coreferenc
25

Teferra, Solomon, Martha Yifiru, and Tanja Schultz. "DNN-based Multilingual Acoustic Modeling for Four Ethiopian Languages." SINET: Ethiopian Journal of Science 46, no. 3 (2024): 237–49. http://dx.doi.org/10.4314/sinet.v46i3.2.

Abstract
In this paper, we present the results of experiments conducted on multilingual acoustic modeling in the development of an Automatic Speech Recognition (ASR) system using speech data of phonetically much related Ethiopian languages (Amharic, Tigrigna, Oromo and Wolaytta) with multilingual (ML) mix and multitask approaches. The use of speech data from only phonetically much related languages brought improvement over results reported in a previous work that used 26 languages (including the four languages). A maximum Word Error Rate (WER) reduction from 25.03% (in the previous work) to 21.52% has
26

Lopes-Cardoso, Henrique, Tomás Freitas Osório, Luís Vilar Barbosa, et al. "Robust Complaint Processing in Portuguese." Information 12, no. 12 (2021): 525. http://dx.doi.org/10.3390/info12120525.

Abstract
The Natural Language Processing (NLP) community has witnessed huge improvements in the last years. However, most achievements are evaluated on benchmarked curated corpora, with little attention devoted to user-generated content and less-resourced languages. Despite the fact that recent approaches target the development of multi-lingual tools and models, they still underperform in languages such as Portuguese, for which linguistic resources do not abound. This paper exposes a set of challenges encountered when dealing with a real-world complex NLP problem, based on user-generated complaint data
27

Rijhwani, Shruti, Daisy Rosenblum, Antonios Anastasopoulos, and Graham Neubig. "Lexically Aware Semi-Supervised Learning for OCR Post-Correction." Transactions of the Association for Computational Linguistics 9 (2021): 1285–302. http://dx.doi.org/10.1162/tacl_a_00427.

Abstract
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Optical character recognition (OCR) can be used to produce digitized text, and previous work has demonstrated the utility of neural post-correction methods that improve the results of general-purpose OCR systems on recognition of less-well-resourced languages. However, these methods rely on manually curated post-correction data, which are relatively scarce compared to the non-annotated raw images that need to be digitized. In this paper, we present a semi-superv
28

Vivek A. Manwar. "Enhancement of Query Refinement for Marathi Language Information Retrieval." Communications on Applied Nonlinear Analysis 32, no. 9s (2025): 2487–505. https://doi.org/10.52783/cana.v32.4544.

Abstract
Introduction: The Information Retrieval (IR) enables users to access relevant data in a language different from their queries. The IR involves the capability to submit a query in one language and retrieve documents in the same or another language, like Marathi or English. This IR is accomplished by creating a system that compares a query in one language with data in the same or another language. Objectives: This study presents a novel approach to enhance query refinement for Marathi language information retrieval, focusing on addressing the language's unique linguistic complexities. Methods: p
29

Badawi, Soran S. "Using Multilingual Bidirectional Encoder Representations from Transformers on Medical Corpus for Kurdish Text Classification." ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY 11, no. 1 (2023): 10–15. http://dx.doi.org/10.14500/aro.11088.

Abstract
Technology has dominated a huge part of human life. Furthermore, technology users use language continuously to express feelings and sentiments about things. The science behind identifying human attitudes toward a particular product, service, or topic is one of the most active fields of research, and it is called sentiment analysis. While the English language is making real progress in sentiment analysis daily, other less-resourced languages, such as Kurdish, still suffer from fundamental issues and challenges in Natural Language Processing (NLP). This paper experiments with the recently publishe
30

Yeshambel, Tilahun, Josiane Mothe, and Yaregal Assabie. "Amharic Adhoc Information Retrieval System Based on Morphological Features." Applied Sciences 12, no. 3 (2022): 1294. http://dx.doi.org/10.3390/app12031294.

Abstract
Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need of accessing relevant information from huge corpora. Although IR systems function well for technologically advanced languages such as English, this is not the case for morphologically complex, under-resourced and less-studied languages such as Amharic. Amharic is a Semitic language characterized by a complex morphology where thousands of words are generated from a single root form through inflection and derivation. This has made the development of Amharic nat
31

Rananga, Seani, Bassey Isong, Abiodun Modupe, and Vukosi Marivate. "Misinformation Detection: A Review for High and Low-Resource Languages." Journal of Information Systems and Informatics 6, no. 4 (2024): 2892–922. https://doi.org/10.51519/journalisi.v6i4.931.

Abstract
The rapid spread of misinformation on platforms like Twitter, and Facebook, and in news headlines highlights the urgent need for effective ways to detect it. Currently, researchers are increasingly using machine learning (ML) and deep learning (DL) techniques to tackle misinformation detection (MID) because of their proven success. However, this task is still challenging due to the complexity of deceptive language, digital editing tools, and the lack of reliable linguistic resources for non-English languages. This paper provides a comprehensive analysis of relevant research, providing insights
32

Kumar, Aarti, and Sujoy Das. "Dealing with Relevance Ranking in Cross-Lingual Cross-Script Text Reuse." International Journal of Information Retrieval Research 6, no. 1 (2016): 16–35. http://dx.doi.org/10.4018/ijirr.2016010102.

Abstract
Proliferation of multilingual content on the web has paved way for text reuse to get cross-lingual and also cross script. Identifying cross language text reuse becomes tougher if one considers cross-script less resourced languages. This paper focuses on identifying text reuse between English-Hindi news articles and improving their relevance ranking using two phases (i) Heuristic retrieval phase for reducing search space and (ii) post processing phase for improving the relevance ranking. Dictionary based strategy of Cross-Language Information Retrieval is used for heuristic retrieval and Parse
33

Ataa Allah, Fadoua, and Siham Boulaknadel. "Morpho-Lexicon for standard Moroccan Amazigh." MATEC Web of Conferences 210 (2018): 04024. http://dx.doi.org/10.1051/matecconf/201821004024.

Abstract
Standardized resources are key components for the development of applications related to human language technology. Therefore, it is important to adopt it for designing lexical resources, especially for less commonly resourced languages such Amazigh. This language is spoken by many North African communities, including Morocco. Due to historical, geographical and sociolinguistic factors, the Amazigh language is characterized by the proliferation of many intervarieties, which has led to a complex morphology. This latter poses significant challenge to NLP tasks, especially that Amazigh language b
34

Liapis, Charalampos M., Konstantinos Kyritsis, Isidoros Perikos, Nikolaos Spatiotis, and Michael Paraskevas. "A Hybrid Ensemble Approach for Greek Text Classification Based on Multilingual Models." Big Data and Cognitive Computing 8, no. 10 (2024): 137. http://dx.doi.org/10.3390/bdcc8100137.

Abstract
The present study explores the field of text classification in the Greek language. A novel ensemble classification scheme based on generated embeddings from Greek text made by the multilingual capabilities of the E5 model is presented. Our approach incorporates partial transfer learning by using pre-trained models to extract embeddings, enabling the evaluation of classical classifiers on Greek data. Additionally, we enhance the predictive capability while maintaining the costs low by employing a soft voting combination scheme that exploits the strengths of XGBoost, K-nearest neighbors, and log
35

Tarmizi, Nursyahirah, Suhaila Saee, and Dayang Hanani Abang Ibrahim. "Towards Curbing Cyber-Bullying in Malaysia by Author Identification of Iban and Kadazandusun OSN Text Using Deep Learning." ASEAN Engineering Journal 13, no. 2 (2023): 145–57. http://dx.doi.org/10.11113/aej.v13.19171.

Abstract
Online Social Network (OSN) is frequently used to carry out cyber-criminal actions such as cyberbullying. As a developing country in Asia that keeps abreast of ICT advancement, Malaysia is no exception when it comes to cyberbullying. Author Identification (AI) task plays a vital role in social media forensic investigation (SMF) to unveil the genuine identity of the offender by analysing the text written in OSN by the candidate culprits. Several challenges in AI dealing with OSN text, including limited text length and informal language full of internet jargon and grammatical errors that further
36

Rackevičienė, Sigita, Liudmila Mockienė, Andrius Utka, and Aivaras Rokas. "Methodological Framework for the Development of an English-Lithuanian Cybersecurity Termbase." Studies about Languages, no. 39 (November 27, 2021): 85–92. http://dx.doi.org/10.5755/j01.sal.1.39.29156.

Abstract
The aim of the paper is to present a methodological framework for the development of an English-Lithuanian bilingual termbase in the cybersecurity domain, which can be applied as a model for other language pairs and other specialised domains. It is argued that the presented methodological approach can ensure creation of high-quality bilingual termbases even with limited available resources. The paper touches upon the methods and problems of dataset (corpora) compilation, terminology annotation, automatic bilingual term extraction (BiTE) and alignment, knowledge-rich context extraction, and lin
37

Idrees, Saman, and Hossein Hassani. "Exploiting Script Similarities to Compensate for the Large Amount of Data in Training Tesseract LSTM: Towards Kurdish OCR." Applied Sciences 11, no. 20 (2021): 9752. http://dx.doi.org/10.3390/app11209752.

Abstract
Applications based on Long-Short-Term Memory (LSTM) require large amounts of data for their training. Tesseract LSTM is a popular Optical Character Recognition (OCR) engine that has been trained and used in various languages. However, its training becomes obstructed when the target language is not resourceful. This research suggests a remedy for the problem of scant data in training Tesseract LSTM for a new language by exploiting a training dataset for a language with a similar script. The target of the experiment is Kurdish. It is a multi-dialect language and is considered less-resourced. We
38

Teixeira de Sousa, Lílian. "Sobre a constituição de corpora para línguas com poucos recursos." Revista Linguíʃtica 16, no. 1 (2020): 43–61. http://dx.doi.org/10.31513/linguistica.2020.v16n1a31709.

Abstract
The use of corpora in linguistic studies is quite old, whereas the field of Corpus Linguistics is relatively new, its origin being linked to the expansion of access to computers and, consequently, to Natural Language Processing (NLP). As the field gained influence in linguistic research, the concept of a corpus became more specific, and elements such as breadth and reference, in addition to machine readability and finite size, became fundamental to the composition of samples in the field. At the same time, however, there emerged corpora that were smaller and much m
39

Azubuike, Prince C., and Innocent I. Umeh. "Design and implementation of an automated web-based Igbo text analyzer using natural language processing (NLP) tools." World Journal of Advanced Research and Reviews 23, no. 3 (2024): 1036–45. http://dx.doi.org/10.30574/wjarr.2024.23.3.2691.

Abstract
Presently in the world, the Igbo language is one of the less-resourced languages because there are not many developed and easy-to-find digital resources for it. Digital resources such as Igbo text corpora, Igbo electronic dictionaries, Igbo morphological analyzers, and Igbo thesauri, which can analyze Igbo text documents, are very limited. This work aims to design and develop an automated Igbo text analyzer using Natural Language Processing (NLP) tools. The development of this web-based Igbo text analyzer involves the analysis of the lexical and grammatical characteristics of the Igbo language
40

Azubuike, Prince C., and Innocent I. Umeh. "Design and implementation of an automated web-based Igbo text analyzer using natural language processing (NLP) tools." World Journal of Advanced Research and Reviews 23, no. 3 (2024): 1036–45. https://doi.org/10.5281/zenodo.14938082.

41

Šmajdek, Uroš, Matjaž Zupanič, Maj Zirkelbach, and Meta Jazbinšek. "Adapting an English Corpus and a Question Answering System for Slovene." Slovenščina 2.0: empirical applied and interdisciplinary research 11, no. 1 (2023): 247–74. http://dx.doi.org/10.4312/slo2.0.2023.1.247-274.

Abstract
Developing effective question answering (QA) models for less-resourced languages like Slovene is challenging due to the lack of proper training data. Modern machine translation tools can address this issue, but this presents another challenge: the given answers must be found in their exact form within the given context since the model is trained to locate answers and not generate them. To address this challenge, we propose a method that embeds the answers within the context before translation and evaluate its effectiveness on the SQuAD 2.0 dataset translated using both eTranslation and Google
42

Trinley, Ngawang, Tenzin, Dirk Schmidt, Helios Hildt, and Tenzin Kaldan. "Taming the Wild Etext: Managing, Annotating, and Sharing Tibetan Corpora in Open Spaces." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 2 (2021): 1–23. http://dx.doi.org/10.1145/3418060.

Abstract
Digital text is quickly becoming essential to modern daily life. The article you are reading right now is born digital; unlike texts of the not-so-distant past, it may never be printed at all. Worldwide, the trend is clear: Digital text is on the way in, and print is on its way out. Year-by-year, more and more readers are turning to ebooks, internet news, and other forms of ereading, while generation by generation, print is becoming less and less relevant. Footnote 1: Pew research shows 50% of Americans have a dedicated ereading device, with yearly gains in ereadership [1]; industry research, too, sho
43

Eshetu, Abebawu, Getenesh Teshome, and Tewodros Abebe. "Learning Word and Sub-word Vectors for Amharic (Less Resourced Language)." International Journal of Advanced Engineering Research and Science 7, no. 8 (2020): 358–66. http://dx.doi.org/10.22161/ijaers.78.39.

44

Alexandris, Christina. "GenAI and Socially Responsible AI in Natural Language Processing Applications: A Linguistic Perspective." Proceedings of the AAAI Symposium Series 3, no. 1 (2024): 330–37. http://dx.doi.org/10.1609/aaaiss.v3i1.31230.

Abstract
It is a widely accepted fact that the processing of very large amounts of data with state-of-the-art Natural Language Processing (NLP) practices (i.e., Machine Learning (ML) and language-agnostic approaches) has resulted in a dramatic improvement in the speed and efficiency of systems and applications. However, these developments are accompanied by several challenges and difficulties that have been voiced within the last years. Specifically, in regard to NLP, evident improvement in the speed and efficiency of systems and applications with GenAI also entails some aspects that may be problematic, e
45

Coetzee, Stephen A., Astrid Schmulian, and Lizette Kotze. "Communication Apprehension of South African Accounting Students: The Effect of Culture and Language." Issues in Accounting Education 29, no. 4 (2014): 505–25. http://dx.doi.org/10.2308/iace-50850.

Abstract
Developing communication skills is an objective of many accounting education programs. Students' communication apprehension may hamper this. This study explores South African accounting students' communication apprehension and the association thereof with culture and home and instruction language. Data were collected using the Personal Report of Communication Apprehension (PRCA-24) and Written Communication Apprehension (WCA) self-report questionnaires. South Africa provides an example of the salience of race, given past racial segregation. Culture is, however, more complex than physi
46

Obosu, Gideon Kwesi, Irene Vanderpuye, Nana Afia Opoku-Asare, and Timothy Olufemi Adigun. "A Qualitative Inquiry into the Factors that Influence Deaf Children's Early Sign Language Acquisition among Deaf Children in Ghana." Sign Language Studies 23, no. 4 (2023): 527–54. http://dx.doi.org/10.1353/sls.2023.a905538.

Abstract
The linguistic and cognitive importance of early language exposure for deaf children is well reported in the literature. However, most such studies have been conducted in industrialized countries, with fewer conducted in developing and nonindustrialized countries such as Ghana. Therefore, hinged on the social interactionist theory of language development, this study explored the factors that influence early acquisition of sign language among deaf children from a low-resource setting in Ghana. Ten mothers of deaf children from these communities were purposively select
47

Choi, Carolyn Areum. "Transperipheral Educational Mobility: Less Privileged South Korean Young Adults Pursuing English Language Study in a Peripheral City in the Philippines." positions: asia critique 30, no. 2 (2022): 377–407. http://dx.doi.org/10.1215/10679847-9573396.

Abstract
The pursuit of overseas English language education by South Korean youth has resulted in a hierarchy of educational destinations, with migrants studying English in the Global North attaining higher cultural capital compared to those learning English in the Global South. This article examines the experiences of South Korean youth who pursue education in English language schools in the provincial Philippines. Using in-depth interviews and participant observation with South Korean educational migrants in the Philippines and South Korea, it outlines class and regional dynamics in a patter
48

Hajar, Anas. "Examining the Impact of Immediate Family Members on Gulf Arab EFL Students’ Strategic Language Learning and Development." RELC Journal 50, no. 2 (2017): 285–99. http://dx.doi.org/10.1177/0033688217716534.

Abstract
This article provides a qualitative inquiry into the influences of immediate family members (i.e. parents and siblings) on a group of Gulf Arab EFL students regarding their language learning experiences and strategy use in their Arab homelands. The participants came from financially comfortable families, with different levels of education. The data collected from a written narrative and four subsequent semi-structured interviews suggest that the occupation and educational attainment of the participants’ family figures (mostly parents) affected the amount and kind of support these families offe
49

Sepesy Maučec, Mirjam, Darinka Verdonik, and Gregor Donaj. "Sequence-to-Sequence Models and Their Evaluation for Spoken Language Normalization of Slovenian." Applied Sciences 14, no. 20 (2024): 9515. http://dx.doi.org/10.3390/app14209515.

Abstract
Sequence-to-sequence models have been applied to many challenging problems, including those in text and speech technologies. Normalization is one of them. It refers to transforming non-standard language forms into their standard counterparts. Non-standard language forms come from different written and spoken sources. This paper deals with one such source, namely speech from the less-resourced highly inflected Slovenian language. The paper explores speech corpora recently collected in public and private environments. We analyze the efficiencies of three sequence-to-sequence models for automatic
50

Yekple, Sampson L. K., Veronica Serwaa Ofosu, and Innocent Yao Vinyo. "Ending Literacy Poverty: The Role of Early Childhood Educators and Caregivers in Developing Oral Language." European Journal of Language and Culture Studies 1, no. 4 (2022): 1–8. http://dx.doi.org/10.24018/ejlang.2022.1.4.16.

Abstract
Oral literacy development is the basis for other language skills, lifelong learning, and acquisition of indigenous knowledge. The oral literacy skills of language lay a solid foundation for other skills. This paper aims to explore the teacher's role in facilitating oral literacy in the early grades. The paper is qualitative. Thematic, explanatory, and descriptive approaches were used. The population for the study was all primary schools in a deprived district of the Volta Region in Ghana. Fifty early grade classrooms were purposively selected in four circuits in the district for data collection. Ob