Journal articles on the topic 'Arabic dataset'

Consult the top 50 journal articles for your research on the topic 'Arabic dataset.'

1

Sarwati Rahayu, Sulis Sandiwarno, Erwin Dwika Putra, Marissa Utami, and Hadiguna Setiawan. "Model Sequential Resnet50 Untuk Pengenalan Tulisan Tangan Aksara Arab." JSAI (Journal Scientific and Applied Informatics) 6, no. 2 (2023): 234–41. http://dx.doi.org/10.36085/jsai.v6i2.5379.

Abstract:
Research on Arabic handwriting recognition is still limited, and public datasets of Arabic script remain scarce. Therefore, each study usually uses its own dataset. Recently, however, public datasets have become available, creating an opportunity to compare methods on the same dataset. This study aimed to determine which transfer learning model achieves the best accuracy for handwriting recognition in Arabic script. The results of the experiment using ResNet50 are as follows: training accuracy is 91.

2

Abdalla, Mahmoud I., Mohsen A. Rashwan, and Mohamed A. Elserafy. "Generating realistic Arabic handwriting dataset." International Journal of Engineering & Technology 8, no. 4 (2019): 460. http://dx.doi.org/10.14419/ijet.v8i4.29786.

Abstract:
In previous years, holistic approaches have shown satisfactory results in solving the problem of Arabic handwritten word recognition without segmenting words into letters. In this paper, we present an efficient system for generating a realistic Arabic handwriting dataset from ASCII input text. We carefully selected a simple word list that contains most Arabic letters in normal and ligature connection cases. To improve the performance of new letter reproduction, we developed our own normalization method that adapts its clustering action according to the created Arabic letter families. We enhanced

3

Altamimi, Mohammed, and Abdulaziz M. Alayba. "ANAD: Arabic news article dataset." Data in Brief 50 (October 2023): 109460. http://dx.doi.org/10.1016/j.dib.2023.109460.

4

Rajih Mohammed, Zaid, and Ahmed H. Aliwy. "English-Arabic Phonetic Dataset construction." BIO Web of Conferences 97 (2024): 00057. http://dx.doi.org/10.1051/bioconf/20249700057.

Abstract:
In the field of natural language processing, the effectiveness of a semantic similarity task is significantly influenced by the presence of an extensive corpus. While numerous monolingual corpora exist, predominantly in English, the availability of multilingual resources remains quite restricted. In this study, we present a semi-automated framework designed for generating a multilingual phonetic English-Arabic corpus, specifically tailored for application in multilingual phonetic and semantic similarity tasks. The proposed model consists of four phases: data gathering, preprocessing and

5

Alqifari, Reem, Hend Al-Khalifa, and Simon O’Keefe. "Arabic Temporal Common Sense Understanding." Computation 13, no. 1 (2024): 5. https://doi.org/10.3390/computation13010005.

Abstract:
Natural language understanding (NLU) includes temporal text understanding, which can be complex and encompasses temporal common sense understanding. There are many challenges in comprehending common sense within a text. Currently, there is a limited number of datasets containing temporal common sense in English and there is an absence of such datasets specifically for the Arabic language. In this study, an Arabic dataset was constructed based on an available English dataset. This dataset is considered a valuable resource for the Arabic community. Consequently, different multilingual pre-traine

6

Elteir, Marwa K. "Fine-Grained Arabic Post (Tweet) Geolocation Prediction Using Deep Learning Techniques." Information 16, no. 1 (2025): 65. https://doi.org/10.3390/info16010065.

Abstract:
Leveraging Twitter data for crisis management necessitates the accurate, fine-grained geolocation of tweets, which unfortunately is often lacking, with only 1–3% of tweets being geolocated. This work addresses the understudied problem of fine-grained geolocation prediction for Arabic tweets, focusing on the Kingdom of Saudi Arabia. The goal is to accurately assign tweets to one of thirteen provinces. Existing approaches for Arabic geolocation are limited in accuracy and often rely on basic machine learning techniques. Additionally, advancements in tweet geolocation for other languages often re

7

Turki, Hussain Mohammed, Essam Al Daoud, Ghassan Samara, et al. "Arabic fake news detection using hybrid contextual features." International Journal of Electrical and Computer Engineering (IJECE) 15, no. 1 (2025): 836. http://dx.doi.org/10.11591/ijece.v15i1.pp836-845.

Abstract:
Technology has advanced and social media users have grown dramatically in the last decade. Because social media makes information easily accessible, some people or organizations distribute false news for political or commercial gain. This news may influence elections and attitudes. Even though English fake news is widely detected and limited, Arabic fake news is hard to recognize owing to a lack of study and data collection. Wara Arabic bidirectional encoder representations from transformers (WaraBERT), a hybrid feature extraction approach, combines word level tokenization with two Arabic bidi

8

Mustafa, Dheya, Safaa M. Khabour, Mousa Al-kfairy, and Ahmed Shatnawi. "Leveraging sentiment analysis of food delivery services reviews using deep learning and word embedding." PeerJ Computer Science 11 (February 19, 2025): e2669. https://doi.org/10.7717/peerj-cs.2669.

Abstract:
Companies that deliver food (food delivery services, or FDS) try to use customer feedback to identify aspects where the customer experience could be improved. Consumer feedback on purchasing and receiving goods via online platforms is a crucial tool for learning about a company’s performance. Many English-language studies have been conducted on sentiment analysis (SA). Arabic is becoming one of the most extensively written languages on the World Wide Web, but because of its morphological and grammatical difficulty as well as the lack of openly accessible resources for Arabic SA, such as dictio

9

Shaker, Noor Haydar, and Ban N. Dhannoon. "Word embedding for detecting cyberbullying based on recurrent neural networks." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 1 (2024): 500. http://dx.doi.org/10.11591/ijai.v13.i1.pp500-508.

Abstract:
The phenomenon of cyberbullying has spread and has become one of the biggest problems facing users of social media sites and generated significant adverse effects on society and the victim in particular. Finding appropriate solutions to detect and reduce cyberbullying has become necessary to mitigate its negative impacts on society and the victim. Twitter comments on two datasets are used to detect cyberbullying, the first dataset was the Arabic cyberbullying dataset, and the second was the English cyberbullying dataset. Three different pre-trained global vectors (GloV

10

Shaker, Noor Haydar, and Ban N. Dhannoon. "Word embedding for detecting cyberbullying based on recurrent neural networks." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 1 (2024): 500–508. https://doi.org/10.11591/ijai.v13.i1.pp500-508.

Abstract:
The phenomenon of cyberbullying has spread and has become one of the biggest problems facing users of social media sites and generated significant adverse effects on society and the victim in particular. Finding appropriate solutions to detect and reduce cyberbullying has become necessary to mitigate its negative impacts on society and the victim. Twitter comments on two datasets are used to detect cyberbullying, the first dataset was the Arabic cyberbullying dataset, and the second was the English cyberbullying dataset. Three different pre-trained global vectors (GloVe) corpora with different

11

Alderazi, Fatima, Abdulelah Algosaibi, Mohammed Alabdullatif, Hafiz Farooq Ahmad, Ali Mustafa Qamar, and Abdulaziz Albarrak. "Generative artificial intelligence in topic-sentiment classification for Arabic text: a comparative study with possible future directions." PeerJ Computer Science 10 (July 10, 2024): e2081. http://dx.doi.org/10.7717/peerj-cs.2081.

Abstract:
Social media platforms have become essential for disseminating news and expressing individual sentiments on various life topics. Arabic, widely used in the Middle East, presents unique challenges for sentiment analysis due to its complexity and multiple dialects. Motivated by the need to address these challenges, this article develops methods to overcome the lack of topic-based labeling techniques, compares different approaches for preparing extensive, annotated datasets, and analyzes the efficacy of machine learning (ML), deep learning (DL), and large language models (LLMs) in classifying Ara

12

Turki, Hussain Mohammed, Essam Al Daoud, Ghassan Samara, et al. "Arabic fake news detection using hybrid contextual features." International Journal of Electrical and Computer Engineering (IJECE) 15, no. 1 (2025): 836–45. https://doi.org/10.11591/ijece.v15i1.pp836-845.

Abstract:
Technology has advanced and social media users have grown dramatically in the last decade. Because social media makes information easily accessible, some people or organizations distribute false news for political or commercial gain. This news may influence elections and attitudes. Even though English fake news is widely detected and limited, Arabic fake news is hard to recognize owing to a lack of study and data collection. Wara Arabic bidirectional encoder representations from transformers (WaraBERT), a hybrid feature extraction app

13

Tutar, Mehmet. "Comparison of Handwritten Recognition Methods on Arabic and Latin Characters." Journal of Studies in Science and Engineering 2, no. 3 (2022): 22–30. http://dx.doi.org/10.53898/josse2022232.

Abstract:
In this article, both machine learning techniques and deep learning methods were applied on the digit datasets created using the Arabic and Latin alphabets, and the performances of the methods were compared. Each method was tested with various parameters and the results were analyzed. In addition, with this study, the recognizability of handwritten numeral datasets created using different alphabets was also observed. For experiments, an Arabic alphabet handwritten digit dataset (60,000 training and 10,000 testings) and a Latin alphabet handwritten digit dataset (60,000 training and 10,000 test

14

Assad, Ali, Abdul Hadi M. Alaidi, Amjad Yousif Sahib, Haider TH Salim ALRikabi, and Ahmed Magdy. "Transformer-based automatic Arabic text diacritization." Sustainable Engineering and Innovation 6, no. 2 (2024): 285–96. https://doi.org/10.37868/sei.v6i2.id305.

Abstract:
In Arabic natural language processing (NLP), automatic text diacritization is a major obstacle, and progress has been slow when compared to other language processing tasks. Automatic diacritical marking of Arabic text is proposed in this work using the first transformer-based paradigm designed solely for this task. By taking advantage of the attention mechanism, our system is able to capture more of the innate patterns in Arabic, surpassing the performance of both rule-based alternatives and neural network techniques. The model trained with the Clean-50 dataset had a diacritic error rate (DER)

15

Assad, Ali, Abdul Hadi M. Alaidi, Amjad Yousif Sahib, Haider TH Salim ALRikabi, and Ahmed Magdy. "Transformer-based automatic Arabic text diacritization." Sustainable Engineering and Innovation 6, no. 2 (2024): 285–96. http://dx.doi.org/10.37868/sei.v6i2.id392.

Abstract:
In Arabic natural language processing (NLP), automatic text diacritization is a major obstacle, and progress has been slow when compared to other language processing tasks. Automatic diacritical marking of Arabic text is proposed in this work using the first transformer-based paradigm designed solely for this task. By taking advantage of the attention mechanism, our system is able to capture more of the innate patterns in Arabic, surpassing the performance of both rule-based alternatives and neural network techniques. The model trained with the Clean-50 dataset had a diacritic error rate (DER)

16

Lakshen, Guma, Valentina Janev, and Sanja Vranes. "Arabic linked drug dataset consolidating and publishing." Computer Science and Information Systems, no. 00 (2020): 47. http://dx.doi.org/10.2298/csis200510047l.

Abstract:
The paper examines the process of creating and publishing an Arabic Linked Drug Dataset based on open drug datasets from selected Arabic countries and discusses quality issues considered in the linked data lifecycle when establishing a semantic Data Lake in the pharmaceutical domain. Through representation of the data in an open machine-readable format, the approach provides an optimum solution for information and dissemination of data and for building specialized applications. Authors contribute to opening the drug datasets from Arabic countries, interlinking the data with diverse repositorie

17

Fuad, Ahlam, and Maha Al-Yahya. "AraConv: Developing an Arabic Task-Oriented Dialogue System Using Multi-Lingual Transformer Model mT5." Applied Sciences 12, no. 4 (2022): 1881. http://dx.doi.org/10.3390/app12041881.

Abstract:
Task-oriented dialogue systems (DS) are designed to help users perform daily activities using natural language. Task-oriented DS for English language have demonstrated promising performance outcomes; however, developing such systems to support Arabic remains a challenge. This challenge is mainly due to the lack of Arabic dialogue datasets. This study introduces the first Arabic end-to-end generative model for task-oriented DS (AraConv), which uses the multi-lingual transformer model mT5 with different settings. We also present an Arabic dialogue dataset (Arabic-TOD) and used it to train and te

18

Elsafty, Hossam, Bouthaina Abdou, Tobias Deußer, Maren Pielka, Christian Bauckhage, and Rafet Sifa. "ArDia: Improving Arabic Dialectal Language Classification Using a Novel Dataset." Proceedings of the International AAAI Conference on Web and Social Media 19 (June 7, 2025): 2413–22. https://doi.org/10.1609/icwsm.v19i1.35944.

Abstract:
Despite Arabic being one of the most widely spoken languages, there is a scarcity of available dialectal Arabic data. In this paper, we address this challenge by proposing a novel approach to data collection through the main use of video captions from TikTok, and other resources such as dictionaries and articles, resulting in the creation of the ArDia dataset. To the best of our knowledge, the ArDia dataset is the largest labeled dialectal Arabic dataset, containing over 900,000 examples, each labeled with its respective dialect. We further leverage this dataset to pretrain transformer-based m

19

Boutouta, Hanane, Abdelaziz Lakhfif, Ferial Senator, and Chahrazed Mediani. "A Transformer-based Hybrid Model for Implicit Emotion Recognition in Arabic Text." Engineering, Technology & Applied Science Research 15, no. 3 (2025): 23834–39. https://doi.org/10.48084/etasr.10261.

Abstract:
Implicit emotion recognition has emerged as an active area of research in modern Natural Language Processing (NLP). Unlike explicit emotions, which are directly expressed through emotional words, implicit emotions are inferred from the surrounding context, making their detection more challenging. While most research in Arabic NLP has focused on recognizing explicit emotions, the study of implicit emotions remains largely unexplored, primarily due to its unique linguistic and morphological characteristics. The current study addresses this gap by compiling an Arabic dataset for the implicit emot

20

Alyami, Sarah N., and Sunday O. Olatunji. "Application of Support Vector Machine for Arabic Sentiment Classification Using Twitter-Based Dataset." Journal of Information & Knowledge Management 19, no. 01 (2020): 2040018. http://dx.doi.org/10.1142/s0219649220400183.

Abstract:
Sentiment classification is the process of classifying emotions and opinions in texts. In this study, the problem of Arabic sentiment analysis was addressed. A support vector machine (SVM) model was proposed to classify opinions in Arabic micro-texts as being positive or negative. To evaluate the performance of the SVM model, a dataset was built from tweets discussing several social issues in Saudi Arabia. These issues include changes that were implemented by the country as part of a newly established vision, known as Saudi Arabia Vision 2030. The constructed dataset was manually annotated acc

21

Aftan, Sulaiman, and Habib Shah. "Using the AraBERT Model for Customer Satisfaction Classification of Telecom Sectors in Saudi Arabia." Brain Sciences 13, no. 1 (2023): 147. http://dx.doi.org/10.3390/brainsci13010147.

Abstract:
Customer satisfaction and loyalty are essential for every business. Feedback prediction and social media classification are crucial and play a key role in accurately identifying customer satisfaction. This paper presents sentiment analysis-based customer feedback prediction based on Twitter Arabic datasets of telecommunications companies in Saudi Arabia. The human brain, which contains billions of neurons, provides feedback based on the current and past experience provided by the services and other related stakeholders. Artificial Intelligent (AI) based methods, parallel to human brain process

22

Kaseb, Gehad S., and Mona F. Ahmed. "Extended-ATSD: Arabic Tweets Sentiment Dataset." Journal of Engineering and Applied Sciences 14, no. 14 (2019): 4780–85. http://dx.doi.org/10.36478/jeasci.2019.4780.4785.

23

Latif, Ghazanfar, Nazeeruddin Mohammad, Jaafar Alghazo, Roaa AlKhalaf, and Rawan AlKhalaf. "ArASL: Arabic Alphabets Sign Language Dataset." Data in Brief 23 (April 2019): 103777. http://dx.doi.org/10.1016/j.dib.2019.103777.

24

Lataifeh, Mohammed, and Ashraf Elnagar. "Ar-DAD: Arabic diversified audio dataset." Data in Brief 33 (December 2020): 106503. http://dx.doi.org/10.1016/j.dib.2020.106503.

25

Baniata, Laith H., and Sangwoo Kang. "Transformer Text Classification Model for Arabic Dialects That Utilizes Inductive Transfer." Mathematics 11, no. 24 (2023): 4960. http://dx.doi.org/10.3390/math11244960.

Abstract:
In the realm of the five-category classification endeavor, there has been limited exploration of applied techniques for classifying Arabic text. These methods have primarily leaned on single-task learning, incorporating manually crafted features that lack robust sentence representations. Recently, the Transformer paradigm has emerged as a highly promising alternative. However, when these models are trained using single-task learning, they often face challenges in achieving outstanding performance and generating robust latent feature representations, especially when dealing with small datasets.

26

Khudhair, Miaad Raisan. "Classification of Arabic Geographical Research Papers Using Machine Learning Techniques: A Comparative Analysis of TF-IDF and Word2Vec." Journal of Information Systems Engineering and Management 10, no. 27s (2025): 45–57. https://doi.org/10.52783/jisem.v10i27s.4377.

Abstract:
The classification of Arabic geographical research papers presents a unique challenge due to linguistic complexities and the absence of standardized datasets. In this study, we introduce a novel approach by creating a new dataset, comprising Arabic texts extracted from geographical research papers including research files, abstracts and geographical categories (human or physical geography). After preprocessing and text cleaning, TF-IDF and Word2Vec were employed as feature extraction techniques. Four machine learning models were tested: Naïve Bayes, Logistic Regression, Support Vector Machine

27

Bin Durayhim, Anfal, Amani Al-Ajlan, Isra Al-Turaiki, and Najwa Altwaijry. "Towards Accurate Children’s Arabic Handwriting Recognition via Deep Learning." Applied Sciences 13, no. 3 (2023): 1692. http://dx.doi.org/10.3390/app13031692.

Abstract:
Automatic handwriting recognition has received considerable attention over the past three decades. Handwriting recognition systems are useful for a wide range of applications. Much research has been conducted to address the problem in Latin languages. However, less research has focused on the Arabic language, especially concerning recognizing children’s Arabic handwriting. This task is essential as the demand for educational applications to practice writing and spelling Arabic letters is increasing. Thus, the development of Arabic handwriting recognition systems and applications for children i

28

ALBayari, Reem, and Sherief Abdallah. "Instagram-Based Benchmark Dataset for Cyberbullying Detection in Arabic Text." Data 7, no. 7 (2022): 83. http://dx.doi.org/10.3390/data7070083.

Abstract:
(1) Background: the ability to use social media to communicate without revealing one’s real identity has created an attractive setting for cyberbullying. Several studies targeted social media to collect their datasets with the aim of automatically detecting offensive language. However, the majority of the datasets were in English, not in Arabic. Even among the few Arabic datasets that were collected, none focused on Instagram, despite its being a major social media platform in the Arab world. (2) Methods: we use the official Instagram APIs to collect our dataset. To consider the dataset as a benchmark, w

29

Mohammad, Adel Hamdan. "Arabic Text Classification: A Review." Modern Applied Science 13, no. 5 (2019): 88. http://dx.doi.org/10.5539/mas.v13n5p88.

Abstract:
Text classification is an important topic. The number of electronic documents available online is massive. Text classification aims to classify documents into a set of predefined categories. Far more research has been conducted on English datasets than on Arabic datasets. This research can serve as a reference for researchers who work with Arabic datasets. It applies the most well-known text classification algorithms to an Arabic dataset. In addition, the dataset used in this research is large enough in comparison w

30

Naser-Karajah, Eman, and Nabil Arman. "Arabic Lexical Substitution: AraLexSubD Dataset and AraLexSub Pipeline." Data 9, no. 8 (2024): 98. http://dx.doi.org/10.3390/data9080098.

Abstract:
Lexical substitution aims to generate a list of equivalent substitutions (i.e., synonyms) to a sentence’s target word or phrase while preserving the sentence’s meaning to improve writing, enhance language understanding, improve natural language processing models, and handle ambiguity. This task has recently attracted much attention in many languages. Despite the richness of Arabic vocabulary, limited research has been performed on the lexical substitution task due to the lack of annotated data. To bridge this gap, we present the first Arabic lexical substitution benchmark dataset AraLexSubD fo

31

Al Shamsi, Arwa A., and Sherief Abdallah. "Sentiment Analysis of Emirati Dialects." Big Data and Cognitive Computing 6, no. 2 (2022): 57. http://dx.doi.org/10.3390/bdcc6020057.

Abstract:
Recently, extensive studies and research in the Arabic Natural Language Processing (ANLP) field have been conducted for text classification and sentiment analysis. Moreover, the number of studies that target Arabic dialects has also increased. In this research paper, we constructed the first manually annotated dataset of the Emirati dialect for the Instagram platform. The constructed dataset consisted of more than 70,000 comments, mostly written in the Emirati dialect. We annotated the comments in the dataset based on text polarity, dividing them into positive, negative, and neutral categories

32

Almalki, Jameel. "A machine learning-based approach for sentiment analysis on distance learning from Arabic Tweets." PeerJ Computer Science 8 (July 26, 2022): e1047. http://dx.doi.org/10.7717/peerj-cs.1047.

Abstract:
Social media platforms such as Twitter, YouTube, Instagram and Facebook are leading sources of large datasets nowadays. Twitter’s data is one of the most reliable due to its privacy policy. Tweets have been used for sentiment analysis and to identify meaningful information within the dataset. Our study focused on the distance learning domain in Saudi Arabia by analyzing Arabic tweets about distance learning. This work proposes a model for analyzing people’s feedback using a Twitter dataset in the distance learning domain. The proposed model is based on the Apache Spark product to manage the la

33

Bilal, Zaid Saad, Amir Gargouri, Hanaa F. Mahmood, and Hassene Mnif. "Comparison of Collective Diverse Arabic Sign Language Dataset." Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications 15, no. 4 (2024): 133–50. https://doi.org/10.58346/jowua.2024.i4.009.

Abstract:
Machine learning researchers from all around the world continue to work out the best ways to collect data efficiently. Data collecting has recently emerged as a key concern for two primary reasons. Despite the fact that machine learning is making significant progress, there may not be enough labelled data for some new applications. Furthermore, deep learning methods have the benefit of automatically creating features, which is not the case with traditional machine learning methods. With this, model design becomes more affordable, although more labelled data may be required. Particularly, the c

34

Almaqtari, Hani, Feng Zeng, and Ammar Mohammed. "Enhancing Arabic Sentiment Analysis of Consumer Reviews: Machine Learning and Deep Learning Methods Based on NLP." Algorithms 17, no. 11 (2024): 495. http://dx.doi.org/10.3390/a17110495.

Abstract:
Sentiment analysis utilizes Natural Language Processing (NLP) techniques to extract opinions from text, which is critical for businesses looking to refine strategies and better understand customer feedback. Understanding people’s sentiments about products through emotional tone analysis is paramount. However, analyzing sentiment in Arabic and its dialects poses challenges due to the language’s intricate morphology, right-to-left script, and nuanced emotional expressions. To address this, this study introduces the Arb-MCNN-Bi Model, which integrates the strengths of the transformer-based AraBER

35

Youssef, Nahla Ibrahim, and Nadia Abd-Alsabour. "A REVIEW ON ARABIC HANDWRITING RECOGNITION." Journal of Southwest Jiaotong University 57, no. 6 (2022): 745–64. http://dx.doi.org/10.35741/issn.0258-2724.57.6.66.

Abstract:
Handwriting recognition is considered a very hard area of research, especially for Arabic, because of its ligatures, cursive nature, diacritics, and overlapping. Although many studies have been conducted on Arabic recognition, this field still has many unsolved problems. This work aims to provide a comprehensive review of various strategies for handling Arabic handwriting recognition. Furthermore, it details handwriting recognition, general recognition, Arabic recognition, its characteristics, and the difficulties it faces. Additionally, we discuss online and offline Arabic recognition and oth

36

Muaad, Abdullah Y., Hanumanthappa Jayappa, Mugahed A. Al-antari, and Sungyoung Lee. "ArCAR: A Novel Deep Learning Computer-Aided Recognition for Character-Level Arabic Text Representation and Recognition." Algorithms 14, no. 7 (2021): 216. http://dx.doi.org/10.3390/a14070216.

Abstract:
Arabic text classification is a process to simultaneously categorize the different contextual Arabic contents into a proper category. In this paper, a novel deep learning Arabic text computer-aided recognition (ArCAR) is proposed to represent and recognize Arabic text at the character level. The input Arabic text is quantized in the form of 1D vectors for each Arabic character to represent a 2D array for the ArCAR system. The ArCAR system is validated over 5-fold cross-validation tests for two applications: Arabic text document classification and Arabic sentiment analysis. For document classif

37

Almutairi, Sara, and Fahad Alotaibi. "A Comparative Analysis for Arabic Sentiment Analysis Models In E-Marketing Using Deep Learning Techniques." Journal of Engineering and Applied Sciences 10, no. 1 (2023): 19. http://dx.doi.org/10.5455/jeas.2023050102.

Abstract:
The Internet has a huge amount of information when it comes to analysis, much of which is valuable and significant. Arabic Sentiment Analysis (SA) is a method responsible for analyzing people’s thoughts, feelings, and responses to a variety of products and services on social networking and commercial sites. Several researchers utilize sentiment analysis to determine the opinions of customers in various areas, including e-marketing, business, and other fields. Deep learning (DL) is a useful technology for developing sentiment analysis models to improve e-marketing operations. There are a few st

38

Gamal, Donia, Marco Alfonse, El-Sayed M. El-Horbaty, and Abdel-Badeeh M. Salem. "Twitter Benchmark Dataset for Arabic Sentiment Analysis." International Journal of Modern Education and Computer Science 11, no. 1 (2019): 33–38. http://dx.doi.org/10.5815/ijmecs.2019.01.04.

39

Zahir, Jihad. "IADD: An integrated Arabic dialect identification dataset." Data in Brief 40 (February 2022): 107777. http://dx.doi.org/10.1016/j.dib.2021.107777.

40

Saleh Al-Sheikh, Idris, Masnizah Mohd, and Lia Warlina. "A Review of Arabic Text Recognition Dataset." Asia-Pacific Journal of Information Technology and Multimedia 09, no. 01 (2020): 69–81. http://dx.doi.org/10.17576/apjitm-2020-0901-06.

41

Mostafa, Mohamed A., and Ahmad Almogren. "VERA-ARAB: unveiling the Arabic tweets credibility by constructing balanced news dataset for veracity analysis." PeerJ Computer Science 10 (October 30, 2024): e2432. http://dx.doi.org/10.7717/peerj-cs.2432.

Abstract:
The proliferation of fake news on social media platforms necessitates the development of reliable datasets for effective fake news detection and veracity analysis. In this article, we introduce a veracity dataset of Arabic tweets called “VERA-ARAB”, a pioneering large-scale dataset designed to enhance fake news detection in Arabic tweets. VERA-ARAB is a balanced, multi-domain, and multi-dialectal dataset, containing both fake and true news, meticulously verified by fact-checking experts from Misbar. Comprising approximately 20,000 tweets from 13,000 distinct users and covering 884 different cl
42

Hamed Abd, Dhafar, Ahmed T. Sadiq, and Ayad R. Abbas. "PAAD: POLITICAL ARABIC ARTICLES DATASET FOR AUTOMATIC TEXT CATEGORIZATION." Iraqi Journal for Computers and Informatics 46, no. 1 (2020): 1–10. http://dx.doi.org/10.25195/ijci.v46i1.246.

Abstract:
Nowadays, text classification and sentiment analysis are considered among the most popular Natural Language Processing (NLP) tasks. This kind of technique plays a significant role in human activities and has an impact on daily behaviours. Each article in fields such as politics and business represents different opinions according to the writer's tendency. A huge amount of data will be acquired through that differentiation. The capability to manage the political orientation of an online article automatically. Therefore, there is no corpus for political categorization was directed towards th
43

Alhefdhi, Khawlah, Abdulmalik Alsalman, and Safi Faizullah. "Toward Building a Domain-Based Dataset for Arabic Handwritten Text Recognition." Electronics 14, no. 12 (2025): 2461. https://doi.org/10.3390/electronics14122461.

Abstract:
The problem of automatic recognition of handwritten text has recently been widely discussed in the research community. Handwritten text recognition is considered a challenging task for cursive scripts, such as Arabic-language scripts, due to their complex properties. Although the demand for automatic text recognition is growing, especially to assist in digitizing archival documents, limited datasets are available for Arabic handwritten text compared to other languages. In this paper, we present novel work on building the Real Estate and Judicial Documents dataset (REJD dataset), which aims to
44

Ibrahim, Haneen Siraj, Narjis Mezaal Shati, and AbdulRahman A. Alsewari. "A Transfer Learning Approach for Arabic Image Captions." Al-Mustansiriyah Journal of Science 35, no. 3 (2024): 81–90. http://dx.doi.org/10.23851/mjs.v35i3.1485.

Abstract:
Background: Arabic image captioning (AIC) is the automatic generation of text descriptions in the Arabic language for images. It applies a transfer learning approach in deep learning to enhance computer vision and natural language processing. Many datasets exist in English, in contrast to other languages, and Arab researchers unanimously agree that there is a lack of Arabic databases available in this field. Objective: This paper presents the improvement and processing of the available Arabic textual database using Google spreadsheets for translation and creation of AR. Flicker8k2023 da
45

Elhassan, Nasrin, Giuseppe Varone, Rami Ahmed, et al. "Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning." Computers 12, no. 6 (2023): 126. http://dx.doi.org/10.3390/computers12060126.

Abstract:
Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing user opinions and applying these to guide choices, making it one of the most popular areas of research in the field of natural language processing. Despite the fact that several languages, including English, have been the subjects of several studies, not much has been conducted in the area of the Arabic language. The morphological complexities a
46

Omran, Thuraya, Baraa Sharef, Crina Grosan, and Yongmin Li. "Sentiment Analysis of Multilingual Dataset of Bahraini Dialects, Arabic, and English." Data 8, no. 4 (2023): 68. http://dx.doi.org/10.3390/data8040068.

Abstract:
Sentiment analysis is an application of natural language processing (NLP) that requires a machine learning algorithm and a dataset. In some cases, the dataset availability is scarce, particularly with Arabic dialects, precisely the Bahraini ones, which necessitates using an approach such as translation, where a rich source language is exploited to create the target language dataset. In this study, a dataset of Amazon product reviews in Bahraini dialects is presented. This dataset was generated using two cascading stages of translation—a machine translation followed by a manual one. Machine tra
47

Afzal, Tania, Sadaf Abdul Rauf, Muhammad Ghulam Abbas Malik, and Muhammad Imran. "Fine-Tuning QurSim on Monolingual and Multilingual Models for Semantic Search." Information 16, no. 2 (2025): 84. https://doi.org/10.3390/info16020084.

Abstract:
Transformers have made a significant breakthrough in natural language processing. These models are trained on large datasets and can handle multiple tasks. We compare monolingual and multilingual transformer models for semantic relatedness and verse retrieval. We leveraged data from the original QurSim dataset (Arabic) and used authentic multi-author translations in 22 languages to create a multilingual QurSim dataset, which we released for the research community. We evaluated the performance of monolingual and multilingual LLMs for Arabic and our results show that monolingual LLMs give better
48

AlMasaud, Alanod, and Heyam H. Al-Baity. "AraMAMS: Arabic Multi-Aspect, Multi-Sentiment Restaurants Reviews Corpus for Aspect-Based Sentiment Analysis." Sustainability 15, no. 16 (2023): 12268. http://dx.doi.org/10.3390/su151612268.

Abstract:
The abundance of data on the internet makes analysis a must. Aspect-based sentiment analysis helps extract valuable information from textual data. Because of limited Arabic resources, this paper enriches the Arabic dataset landscape by creating AraMA, the first and largest Arabic multi-aspect corpus. AraMA comprises 10,750 Google Maps reviews for restaurants in Riyadh, Saudi Arabia. It covers four aspect categories—food, environment, service, and price—along with four sentiment polarities: positive, negative, neutral, and conflict. All AraMA reviews are labeled with at least two aspect categor
49

Masruroh, Siti Ummi, Muhammad Fikri Syahid, Firman Munthaha, Asep Taufik Muharram, and Rizka Amalia Putri. "Deep Convolutional Neural Networks Transfer Learning Comparison on Arabic Handwriting Recognition System." JOIV: International Journal on Informatics Visualization 7, no. 2 (2023): 330. http://dx.doi.org/10.30630/joiv.7.2.1605.

Abstract:
Around 27 languages and more than 420 million people worldwide use Arabic letters. That makes the Arabic language one of the most used languages. However, the Arabic language has a challenge, namely the difference in letters based on their position. Arabic handwriting recognition is important for various applications, such as education and communication. One example is during a pandemic when most education has turned digital, making recognizing students' Arabic handwriting difficult. This paper aims to create a model that can recognize Arabic handwriting by comparing several CNN architectures
50

Baniata, Laith H., and Sangwoo Kang. "Switch-Transformer Sentiment Analysis Model for Arabic Dialects That Utilizes a Mixture of Experts Mechanism." Mathematics 12, no. 2 (2024): 242. http://dx.doi.org/10.3390/math12020242.

Abstract:
In recent years, models such as the transformer have demonstrated impressive capabilities in the realm of natural language processing. However, these models are known for their complexity and the substantial training they require. Furthermore, the self-attention mechanism within the transformer, designed to capture semantic relationships among words in sequences, faces challenges when dealing with short sequences. This limitation hinders its effectiveness in five-polarity Arabic sentiment analysis (SA) tasks. The switch-transformer model has surfaced as a potential substitute. Nevertheless, wh