Journal articles on the topic 'Term Frequency-Inverse Document Frequency vectorization'

Author: Grafiati

Published: 7 June 2025

Last updated: 16 July 2025

Consult the top 50 journal articles for your research on the topic 'Term Frequency-Inverse Document Frequency vectorization.'

1

Shafah, Ali, Ahmed Suleiman, and Samira Alshafah. "Impact Feature Vectorization Methods on Arabic Large Data Using Logistic Regression Classification." University of Zawia Journal of Engineering Sciences and Technology 1, no. 1 (2024): 22–29. http://dx.doi.org/10.26629/uzjest.2023.03.

Abstract:

The process of assigning text documents to a predetermined set of categories is known as text categorization. The objective of this study is to present experimental assessments of various feature vectorization methods for the purpose of categorizing a large Arabic corpus using a logistic regression classifier: N-Gram, Bag of Words, and Term Frequency–Inverse Document Frequency. A corpus of around 111,000 Arabic documents, split into five categories (news, sports, culture, economics, and varied), was used. Each method's experimental findings were assessed using
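
The difference between Bag of Words and TF-IDF weighting compared in this study can be illustrated with a minimal Python sketch. It assumes simple whitespace tokenization and the unsmoothed weighting tf(t, d) * log(N / df(t)), where tf is the term count divided by document length. The toy English documents are hypothetical stand-ins for the Arabic corpus, which would need its own normalization; this is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; no stemming or script normalization.
    return text.lower().split()


def bag_of_words(docs):
    # Bag of Words: raw term counts per document.
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc_counts in counts for term in doc_counts)
    weights = []
    for doc_counts in counts:
        length = sum(doc_counts.values())
        # tf = count / document length; idf = log(N / df), so a term found in every document gets 0.
        weights.append({t: (c / length) * math.log(n_docs / df[t])
                        for t, c in doc_counts.items()})
    return weights


# Hypothetical toy documents (not from the paper's corpus).
docs = [
    "the match ended in a draw",
    "the market fell today",
    "the match was on the news",
]
bow = bag_of_words(docs)
weights = tf_idf(docs)
print(bow[2]["the"], weights[0]["the"])
print(round(weights[0]["draw"], 3), round(weights[0]["match"], 3))
```

Because "the" occurs in every document, its TF-IDF weight drops to zero even though its Bag-of-Words count in the third document is 2, and "draw" (one document) outweighs "match" (two documents).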

2

M, Aishwarya Lakshmi. "Movie Similarity from Plot Summaries." International Journal of Scientific Research in Engineering and Management 08, no. 05 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem33647.

Abstract:

This project focuses on developing a Python application to analyze and measure the similarity between movie plot summaries. The goal is to provide a tool that can assist in identifying similarities between movies based on their storyline, enabling users to discover related movies or recommend similar ones. The project utilizes natural language processing (NLP) techniques, particularly text preprocessing, vectorization, and similarity metrics, to achieve its objectives. First, it preprocesses the plot summaries by removing stop words, punctuation, and performing stemming or lemmatization to norm
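
The preprocessing step described in this abstract (lowercasing, punctuation removal, stop-word filtering) might look like the following Python sketch. The stop-word list is a tiny hypothetical stand-in for a curated list such as NLTK's, stemming and lemmatization are deliberately left out, and the sample summary is invented.

```python
import string

# Hypothetical, deliberately tiny stop-word list; real projects use a curated list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "his", "her"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(summary):
    """Lowercase, strip punctuation, and drop stop words from a plot summary."""
    cleaned = summary.lower().translate(_STRIP_PUNCT)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


tokens = preprocess("A retired spy returns to London, and the past follows him.")
print(tokens)
```

The resulting token lists would then be vectorized (for example with TF-IDF) before similarities between summaries are computed.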

3

Bhargavi, A. D. "Comparative Study of Static and Contextual Text Vectorization for Sentiment Analysis." International Journal for Research in Applied Science and Engineering Technology 13, no. 7 (2025): 484–88. https://doi.org/10.22214/ijraset.2025.73045.

Abstract:

Sentiment analysis, a core task in Natural Language Processing (NLP), relies heavily on effective text representation techniques to capture semantic and syntactic nuances. This study presents a comparative analysis of widely-used vectorization methods—Bag of Words (BoW), Term Frequency–Inverse Document Frequency (TF-IDF), Word2Vec, GloVe, BERT, and RoBERTa—in the context of sentiment classification. Using the IMDb movie reviews dataset, each method is evaluated based on classification performance, using accuracy and F1-score as primary metrics. Results demonstrate that while deep contextual em

4

Aminu, Bunyaminu Khalid, Anupa Sinha, and Ahmad Mustapha. "PhishGuard: A Machine Learning Framework for Windows-Specific Phishing Detection." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 81–89. https://doi.org/10.22214/ijraset.2025.70104.

Abstract:

Phishing remains one of the most prevalent and evolving cybersecurity threats, exploiting human vulnerabilities through deceptive digital communication. This study proposes a dynamic, Windows-specific phishing detection model leveraging Random Forest machine learning techniques. By integrating Term Frequency–Inverse Document Frequency (TF-IDF) vectorization with structured email features, the model classifies phishing and legitimate emails with high accuracy. Using secondary data and publicly available datasets, the model achieved a classification accuracy of 98.31% and demonstrated b

5

Winahyu, Sri Kusuma, Fawwaz Zaini Ahmad, Achril Zalmansyah, et al. "Sentence Classification Using Machine Learning and Word Embedding: An Innovation in Indonesian Language Learning." Journal of Language Teaching and Research 16, no. 4 (2025): 1225–39. https://doi.org/10.17507/jltr.1604.17.

Abstract:

In applied linguistics, writing assessment examines language learning. There are various genres in writing, but the evaluation always includes a syntactic component or sentence structure. This research focuses on classifying sentence structure in the Indonesian language using the Random Forest Classifier algorithm on five different experimental models, which are trained using different vectorization techniques, including bag of words (BoW), hashing, Term Frequency-Inverse Document Frequency (TF-IDF), CBoW, and skipgram vectorizers. The results showed that the accuracy of the models varied signifi

6

Rahman, Abdur, Abu Nayem, and Saeed Siddik. "Non-Functional Requirements Classification Using Machine Learning Algorithms." International Journal of Intelligent Systems and Applications 15, no. 3 (2023): 56–69. http://dx.doi.org/10.5815/ijisa.2023.03.05.

Abstract:

Non-functional requirements define the quality attributes of a software application and need to be identified in the early stages of the software development life cycle. Researchers have proposed automatic classification of software non-functional requirements using several Machine Learning (ML) algorithms combined with various vectorization techniques. However, the best combination for non-functional requirement classification still needs to be clarified. In this paper, we examined whether different combinations of feature extraction techniques and ML algorithms varied in the non-functio

7

Anuradha, Surabhi, Pothabathula Naga Jyothi, Surabhi Sivakumar, and Martha Sheshikala. "RecommendRift: a leap forward in user experience with transfer learning on netflix recommendations." Indonesian Journal of Electrical Engineering and Computer Science 36, no. 2 (2024): 1218. http://dx.doi.org/10.11591/ijeecs.v36.i2.pp1218-1225.

Abstract:

In today’s fast-paced lifestyle, streaming movies and series on platforms like Netflix is a valued recreational activity. However, users often spend considerable time searching for the right content and receive irrelevant recommendations, particularly when facing the “cold start problem” for new users. This challenge arises from existing recommender systems relying on factors like casting, title, and genre, using term frequency-inverse document frequency (TF-IDF) for vectorization, which prioritizes word frequency over semantic meaning. To address this, an innovative recommender system conside

8

Tao, Chang, Shaoming Zheng, Shuhong Wang, et al. "On Defect Grading for the Relay Protection Devices Based on TF-IDF Assignment and Simple Classifiers." Journal of Physics: Conference Series 2433, no. 1 (2023): 012023. http://dx.doi.org/10.1088/1742-6596/2433/1/012023.

Abstract:

Accurate grading of relay protection device (RPD) defects can improve the maintenance and reliability of RPDs and so help ensure the safety of the power grid. Based on the text records of RPD defects in a regional power grid and a defect text dictionary, this paper analyses a defect grading method that combines Term Frequency-Inverse Document Frequency (TF-IDF) assignment with simple classifiers. The details are as follows: firstly, the construction of the RPD defect dictionary is introduced. Secondly, the vectorized text of RPD defects is formed; combine

9

Lu, Jiaxin. "Text vectorization in sentiment analysis: A comparative study of TF-IDF and Word2Vec from Amazon Fine Food Reviews." ITM Web of Conferences 70 (2025): 03001. https://doi.org/10.1051/itmconf/20257003001.

Abstract:

Sentiment analysis is a practical tool for marketing and branding teams. Companies can collect and analyze opinions or reviews from social media platforms, blog posts, and numerous other forums. This may help them identify positive feedback to reinforce strengths or negative emotions to make improvements. This research compares two text vectorization methods in opinion mining, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec, using the Amazon Fine Food Reviews dataset. This study will use these two methods to vectorize preprocessed text data and also input the vectorized d

10

Anuradha, Surabhi, Pothabathula Naga Jyothi, Surabhi Sivakumar, and Martha Sheshikala. "RecommendRift: a leap forward in user experience with transfer learning on netflix recommendations." Indonesian Journal of Electrical Engineering and Computer Science 36, no. 2 (2024): 1218–25. https://doi.org/10.11591/ijeecs.v36.i2.pp1218-1225.

Abstract:

In today’s fast-paced lifestyle, streaming movies and series on platforms like Netflix is a valued recreational activity. However, users often spend considerable time searching for the right content and receive irrelevant recommendations, particularly when facing the “cold start problem” for new users. This challenge arises from existing recommender systems relying on factors like casting, title, and genre, using term frequency-inverse document frequency (TF-IDF) for vectorization, which prioritizes word frequency over semantic meaning. To address this, an innovative re

11

Muhammad Akmal Ahmad Nawawi and Tajul Rosli Razak. "Encouraging Recycling in Bangi Selatan Through a Content-Based Filtering Web Application." Journal of Computing Research and Innovation 10, no. 1 (2025): 218–26. https://doi.org/10.24191/jcrinn.v10i1.510.

Abstract:

This study addresses the challenges faced by residents of Bangi Selatan in adopting 3R (Reduce, Reuse, Recycle) practices, primarily due to a lack of interest in conservation efforts and insufficient awareness of recycling’s importance. To address these challenges, we presented a web application that enhances recycling adoption by delivering personalized content recommendations. The key contributions of this study include the development of a novel recommendation system based on content-based filtering (CBF) with improved accuracy through a modified Term Frequency-Inverse Document Frequency (T

12

Subrhamanyam, Kolusu, Aduri DharaniSri, Devarakonda HemaSri, Anumanedi Yamini, Chakarajamula Denith Siva Sai, and Bandela Jaswanth. "A Hybrid Approach for Emotion-Driven Game Recommendations Using Text, Voice and Image Recognition." Industrial Engineering Journal 54, no. 02 (2025): 81–89. https://doi.org/10.36893/iej.2025.v52i2.009.

Abstract:

This implementation presents a game recommendation system that utilizes natural language processing (NLP) techniques to provide personalized game suggestions based on user preferences. The system processes a dataset of games containing descriptions and emotional tones to determine relevant recommendations. It employs TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to transform textual data into numerical representations, enabling meaningful comparisons between game content and user input. The cosine similarity metric is then used to assess the closeness of games to the given p
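
A content-based recommender of this kind, with TF-IDF vectors compared by cosine similarity, can be sketched in plain Python as below. Assumptions: whitespace tokenization, idf fitted on the game descriptions only with the smoothed formula log((1 + N) / (1 + df)) + 1, and query words unseen in the catalogue ignored. The game descriptions and the helper names (`fit_idf`, `vectorize`, `recommend`) are hypothetical, and the paper's emotional-tone features and voice/image inputs are not modelled.

```python
import math
from collections import Counter


def fit_idf(texts):
    n = len(texts)
    df = Counter(term for text in texts for term in set(text.lower().split()))
    # Smoothed idf: log((1 + N) / (1 + df)) + 1.
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def vectorize(text, idf):
    counts = Counter(text.lower().split())
    # Sparse TF-IDF vector as a dict; words missing from the fitted vocabulary are dropped.
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(descriptions, query, top_k=2):
    idf = fit_idf(descriptions)
    query_vec = vectorize(query, idf)
    scores = [cosine(query_vec, vectorize(d, idf)) for d in descriptions]
    # Indices of the top_k most similar descriptions; ties keep catalogue order.
    return sorted(range(len(descriptions)), key=lambda i: scores[i], reverse=True)[:top_k]


games = [
    "calm puzzle game with relaxing music",
    "fast action shooter with intense combat",
    "relaxing farming game with calm seasons",
]
print(recommend(games, "something calm and relaxing"))
```

The action shooter shares no terms with the query, so its cosine score is 0 and it is not recommended. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would typically replace the hand-written helpers.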

13

Surianto, Dewi Fatmarani, and Dewi Fatmawati Surianto. "Enhancing K-Means Clustering for Journal Articles using TF-IDF and LDA Feature Extraction." Brilliance: Research of Artificial Intelligence 4, no. 2 (2025): 964–72. https://doi.org/10.47709/brilliance.v4i2.5547.

Abstract:

Clustering is a fundamental technique in data analysis, particularly in unsupervised learning, to group data with similar characteristics. However, the effectiveness of the K-Means algorithm in text clustering heavily depends on proper feature extraction. This study proposes an enhanced feature extraction approach by integrating Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA) to improve clustering performance on journal article datasets. The dataset consists of 427 journal article abstracts collected from Google Scholar. The preprocessing steps include

14

Kucuk, Ekrem, Ipek Cicek, Zeynep Kucukakcali, and Cihan Yetis. "Comparative analysis of machine learning algorithms for biomedical text document classification: A case study on cancer-related publications." Medicine Science | International Medical Journal 13, no. 1 (2024): 171. http://dx.doi.org/10.5455/medscience.2023.10.209.

Abstract:

Biomedical text document classification is an essential task within Natural Language Processing (NLP), with applications ranging from sentiment analysis to authorship identification. Despite advancements in traditional machine-learning algorithms like Support Vector Machines (SVM) and Logistic Regression, challenges such as data sparsity and high dimensionality persist. Recent years have seen a surge in the use of deep learning models to mitigate these issues. This study aims to conduct a comparative analysis of various machine-learning algorithms for classifying biomedical text documents. The

15

Malik, Tariq, Najma Hanif, Ahsen Tahir, et al. "Crowd Control, Planning, and Prediction Using Sentiment Analysis: An Alert System for City Authorities." Applied Sciences 13, no. 3 (2023): 1592. http://dx.doi.org/10.3390/app13031592.

Abstract:

Modern means of communication, economic crises, and political decisions play imperative roles in reshaping political and administrative systems throughout the world. Twitter, a micro-blogging website, has gained paramount importance in terms of public opinion-sharing. Manual intelligence of law enforcement agencies (i.e., in changing situations) cannot cope in real time. Thus, to address this problem, we built an alert system for government authorities in the province of Punjab, Pakistan. The alert system gathers real-time data from Twitter in English and Roman Urdu about forthcoming gathering

16

Sivaiah, B., Grishma, S. Tharun, and A. Shashi. "Drug Recommendation System on Sentiment Analysis of Drug Reviews by Using Machine learning." International Journal for Research in Applied Science and Engineering Technology 12, no. 3 (2024): 2019–23. http://dx.doi.org/10.22214/ijraset.2024.59247.

Abstract:

With the healthcare system facing increased challenges due to the COVID-19 pandemic, there's a big demand for new ideas to help doctors and nurses. This paper suggests a new way of using computers to help doctors decide which medicines to give to patients. By using smart computer programs, we can make it easier for healthcare workers to handle their workload and provide better care for patients. By analyzing patient reviews, we employ sentiment analysis using advanced vectorization methods like Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF), and Manual Feature Analy

17

Virumeshwaran, Muthu, and R. Thirumahal. "TF-IDF Vectorization and Clustering for Extractive Text Summarization." March 2024 6, no. 1 (2024): 96–111. http://dx.doi.org/10.36548/jitdw.2024.1.008.

Abstract:

Extractive document summarization is a vital technique for condensing large volumes of text while retaining key information. This research introduces a dynamic feature space mapping approach to enhance extractive document summarization, aiming to succinctly encapsulate key information from extensive text volumes. The proposed method involves extracting various document properties like term frequency, sentence length, and position to comprehensively describe content. By employing a mapping function, these features are projected into a dynamic feature space, enhancing summarization efficiency an

18

Bounabi, Mariem, Karim Elmoutaouakil, and Khalid Satori. "A new neutrosophic TF-IDF term weighting for text mining tasks: text classification use case." International Journal of Web Information Systems 17, no. 3 (2021): 229–49. http://dx.doi.org/10.1108/ijwis-11-2020-0067.

Abstract:

Purpose: This paper aims to present a new term weighting approach for text classification as a text mining task. The original method, neutrosophic term frequency – inverse term frequency (NTF-IDF), is an extended version of the popular fuzzy TF-IDF (FTF-IDF) and uses neutrosophic reasoning to analyze and generate weights for terms in natural languages. The paper also proposes a comparative study between the popular FTF-IDF and NTF-IDF and their impacts on different machine learning (ML) classifiers for document categorization goals. Design/methodology/approach: After preprocessing textual dat

19

Hanić, Sanja, Marina Bagić Babac, Gordan Gledec, and Marko Horvat. "Comparing Machine Learning Models for Sentiment Analysis and Rating Prediction of Vegan and Vegetarian Restaurant Reviews." Computers 13, no. 10 (2024): 248. http://dx.doi.org/10.3390/computers13100248.

Abstract:

The paper investigates the relationship between written reviews and numerical ratings of vegan and vegetarian restaurants, aiming to develop a predictive model that accurately determines numerical ratings based on review content. The dataset was obtained by scraping reviews from November 2022 until January 2023 from the TripAdvisor website. The study applies multidimensional scaling and clustering using the KNN algorithm to visually represent the textual data. Sentiment analysis and rating predictions are conducted using neural networks, support vector machines (SVM), random forest, Naïve Baye

20

Khan, Sara, and Saurabh Pal. "User Interface Bug Classification Model Using ML and NLP Techniques: A Comparative Performance Analysis of ML Models." International Journal of Experimental Research and Review 45, Spl Vol (2024): 56–69. https://doi.org/10.52756/ijerr.2024.v45spl.005.

Abstract:

Analyzing user interface (UI) bugs is an important step taken by testers and developers to assess the usability of the software product. UI bug classification helps in understanding the nature and cause of software failures. Manually classifying thousands of bugs is an inefficient and tedious job for both testers and developers. The objective of this research is to develop a classification model for User Interface (UI) related bugs using supervised Machine Learning (ML) algorithms and Natural Language Processing (NLP) techniques, and to assess the effect of different sampling and feature vec

21

Y., Sri Navya, Pranathi K., Srija G., and Hifsa Naaz Syeda. "Spam Detection Using Machine Learning: A Logistic Regression Approach." Advancement of Computer Technology and its Applications 8, no. 3 (2025): 1–10. https://doi.org/10.5281/zenodo.15093547.

Abstract:

Spam emails pose a significant challenge to digital communication, leading to security threats and reduced productivity. This paper proposes a machine learning-based strategy for spam detection using Logistic Regression with TF-IDF vectorization. The dataset is prepared by handling missing values and normalizing labels. A TF-IDF model with bigram inclusion is implemented for feature extraction, followed by a balanced Logistic Regression classifier to address class imbalance. Experimental results indicate promising accuracy, demonstrating the effectiveness of the proposed method. Future enha
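
Two details from this abstract, bigram inclusion and a class-balanced classifier, can be made concrete with a small Python sketch. The helper names are hypothetical, and reading "balanced" as scikit-learn's `class_weight="balanced"` heuristic, n_samples / (n_classes * class_count), is an assumption. In scikit-learn itself the same effects would typically come from `TfidfVectorizer(ngram_range=(1, 2))` and `LogisticRegression(class_weight="balanced")`.

```python
from collections import Counter


def unigrams_and_bigrams(text):
    """Return the word unigrams followed by adjacent-word bigrams of a message."""
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


features = unigrams_and_bigrams("Claim your free prize")
# Hypothetical imbalanced label set: 6 legitimate ("ham") vs 2 spam messages.
weights = balanced_class_weights(["ham"] * 6 + ["spam"] * 2)
print(features)
print(weights)
```

With six ham and two spam messages, the minority spam class receives three times the weight of the majority class, so each misclassified spam message costs more during training.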

22

Israt Jahan, Md Nazmul Hasan, Syed Nurul Islam, et al. "Advanced machine learning techniques for fake news detection: A comprehensive analysis." Magna Scientia Advanced Research and Reviews 12, no. 2 (2024): 203–12. https://doi.org/10.30574/msarr.2024.12.2.0198.

Abstract:

The rise of fake news has become a significant global concern, undermining public trust and information integrity. This study explores the application of advanced machine learning algorithms for detecting fake news, leveraging a balanced dataset of real and fake news articles. Through rigorous preprocessing, including text cleaning and Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, the study enhances data quality and model performance. Five machine learning models—Random Forest, Support Vector Machine (SVM), Neural Networks, Logistic Regression, and Naïve Bayes—are systemati

23

Bilgen, Yusuf, and Mahmut Kaya. "EGMA: Ensemble Learning-Based Hybrid Model Approach for Spam Detection." Applied Sciences 14, no. 21 (2024): 9669. http://dx.doi.org/10.3390/app14219669.

Abstract:

Spam messages have emerged as a significant issue in digital communication, adversely affecting users’ mental health, personal safety, and network resources. Traditional spam detection methods often suffer from low detection rates and high false positives, underscoring the need for more effective solutions. This paper proposes the EGMA model, an ensemble learning-based hybrid approach for spam detection in SMS messages, which integrates gated recurrent unit (GRU), multilayer perceptron (MLP), and hybrid autoencoder models utilizing a majority voting algorithm. The EGMA model enhances performan

24

T, Madhu, Monica M, and Shyma S. "URL-Based Phishing Detection." International Scientific Journal of Engineering and Management 04, no. 01 (2025): 1–6. https://doi.org/10.55041/isjem02220.

Abstract:

Phishing attacks, which deceive users into revealing sensitive information by mimicking legitimate websites, pose a growing threat in the digital age. To address this challenge, we propose a machine learning-based system for detecting phishing URLs. The system uses logistic regression in conjunction with TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to analyze and classify URLs as either legitimate or phishing. By identifying suspicious patterns in URL structures, such as unusual domain names, special characters, or deceptive keywords, the model effectively predicts whether

25

Wang, Jiaqi. "Comparative Analysis of Machine Learning and Deep Learning Models for Text Emotion Classification in Federated Learning." Applied and Computational Engineering 155, no. 1 (2025): 220–27. https://doi.org/10.54254/2755-2721/2025.gl23568.

Abstract:

Text sentiment analysis is an important aspect of natural language processing (NLP), playing an essential role in understanding public opinion, enhancing customer experience, and informing data-driven decisions in sectors such as business and policy-making. This study aims to systematically compare the performance of traditional machine learning models (Support Vector Machine (SVM) and Logistic Regression) with Bidirectional Encoder Representations from Transformers (BERT) with a federated learning framework. To represent text features, SVM and Logistic Regression are implemented using Term Fr

26

Pacol, Caren A., and Thelma D. Palaoag. "Enhancing Sentiment Analysis of Textual Feedback in the Student-Faculty Evaluation using Machine Learning Techniques." European Journal of Engineering Science and Technology 4, no. 1 (2021): 27–34. http://dx.doi.org/10.33422/ejest.v4i1.604.

Abstract:

Sentiment analysis has been an interesting and popular research area, encouraging researchers and practitioners to adopt it in various fields such as government, health care, and education. In education, instruction evaluation is one of the activities that sentiment analysis has served. Although educational institutions commonly evaluate their teachers’ performance periodically, students’ comments, which are rich in insights, are not easily taken into account because of a lack of automated text analytics methods. In this study, supervised machine learning algorithms wer

27

Chaithra, G. M. Lingaraju, and S. Jagannatha. "Automatic Web Page Classification System with Improved Accuracy." Webology 18, no. 2 (2021): 225–42. http://dx.doi.org/10.14704/web/v18i2/web18318.

Abstract:

Nowadays, the Internet contains a wide variety of online documents, making it impossible to find useful information about a given subject without also retrieving irrelevant pages. Web document and page recognition software is useful in a variety of fields, including news, medicine and fitness, research, and information technology. To enhance search capability, a large number of web page classification methods have been proposed, especially for news web pages. Furthermore, existing classification approaches seek to distinguish news web pages while still reducing the high dimensionality of features

28

Graham, S. Scott, Savannah Shifflet, Maaz Amjad, and Kasey Claborn. "An interpretable machine learning framework for opioid overdose surveillance from emergency medical services records." PLOS ONE 19, no. 1 (2024): e0292170. http://dx.doi.org/10.1371/journal.pone.0292170.

Abstract:

The goal of this study is to develop and validate a lightweight, interpretable machine learning (ML) classifier to identify opioid overdoses in emergency medical services (EMS) records. We conducted a comparative assessment of three feature engineering approaches designed for use with unstructured narrative data. Opioid overdose annotations were provided by two harm reduction paramedics and two supporting annotators trained to reliably match expert annotations. Candidate feature engineering techniques included term frequency-inverse document frequency (TF-IDF), a highly performant approach to

29

Murato, Demetrius Milton, Bruno Samways dos Santos, and Rafael Henrique Palma Lima. "Clustering and Analysis of Tweets Related to Petrobras." Cadernos do IME - Série Informática 49 (August 6, 2024): 113–31. http://dx.doi.org/10.12957/cadinf.2024.82401.

Abstract:

This study aimed to cluster and analyze tweets associated with Petrobras, exploring their meaning and user profiles on social media to understand their impact on financial markets. The research applied a workflow including data collection from Twitter's API (now X), preprocessing of tweets using Python libraries, word vectorization via Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), Principal Component Analysis (PCA) to reduce matrix dimensionality, and the K-means clustering technique. A total of 840 preprocessed tweets were clustered and analyzed for patterns
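
The TF-IDF and K-means stages of this workflow can be sketched without external libraries, as below. Assumptions: whitespace tokenization, smoothed idf, L2-normalized document vectors, and a naive initialization that uses the first k documents as starting centroids (k-means++ would be more robust). The PCA step is omitted, the helper names are hypothetical, and the four tweets are invented rather than taken from the study's 840-tweet dataset.

```python
import math
from collections import Counter


def tfidf_matrix(texts):
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Dense row over the shared vocabulary, using smoothed idf.
        row = [counts[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])  # L2-normalize each document vector
    return rows


def kmeans(points, k, iterations=20):
    # Naive initialization (assumption): the first k points become the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


tweets = [
    "petrobras shares rise on the stock market",
    "fuel price hike angers drivers",
    "stock market investors buy petrobras shares",
    "drivers protest the fuel price hike",
]
labels = kmeans(tfidf_matrix(tweets), k=2)
print(labels)
```

The two market-related tweets and the two fuel-price tweets end up in separate clusters. With only a handful of documents the naive initialization works, but on real data the result can depend strongly on the starting centroids.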

30

Abubakr, Hawraz Abdalla, and Kamaran Faraj. "Hybrid E-Recommendation System for Multi-Shop Environment." UHD Journal of Science and Technology 9, no. 1 (2025): 123–34. https://doi.org/10.21928/uhdjst.v9n1y2025.pp123-134.

Abstract:

In the Kurdistan Regional Government, most computer shops and markets conduct their marketing offline and do not have electronic systems. Nevertheless, customers live in a digital age; they often face challenges in finding products among these markets and shops. The most common question that customers ask is which shop they should purchase from. Therefore, data from five laptop stores and ratings for markets were collected to build an integrated recommender system to help customers find products and select the best store. Our proposed system is a hybrid e-recommendation system that combines ma

31

Rahmadian, Adhi. "Public Sentiment Towards Mandatory Halal Certification: A Large Language Model (LLM) Approach." Likuid Jurnal Ekonomi Industri Halal 4, no. 2 (2024): 1–15. http://dx.doi.org/10.15575/likuid.v4i2.35185.

Abstract:

This study analyzes public sentiment towards mandatory halal certification in Indonesia, as mandated by Law No. 33/2014 and its revision in Government Regulation No. 39/2021. Using the Large Language Model (LLM) approach, sentiment analysis was conducted on a dataset consisting of 320 samples of headlines from various electronic media platforms, published between 2019 and 2023. The LLM model, employing the RoBERTa architecture, was trained on an Indonesian language dataset and optimized for sentiment classification tasks. Data preprocessing included web scraping, data cleansing, and text vecto

32

Uhryn, Dmytro I., Artem O. Karachevtsev, Yurii Ya Tomka, Mykyta M. Zakharov, and Yuliia L. Troianovska. "Information system for analyzing public sentiment in web platforms based on machine learning." Herald of Advanced Information Technology 7, no. 2 (2024): 199–212. http://dx.doi.org/10.15276/hait.07.2024.14.

Abstract:

The systems for studying public sentiment in web platforms are analyzed. Various tools and methods for effectively determining the mood in textual data from web platforms are described, including the formalization of the social graph and the content graph. The process of classifying comments, which includes the systematization and categorization of statements, is investigated. Based on the studied dataset, information on customer reviews and hotel ratings in Europe from the booking.com web platform is selected. Taking into account the requirements of the information system and the results of th

33

Allam, Hesham, Chris Davison, Faisal Kalota, Edward Lazaros, and David Hua. "AI-Driven Mental Health Surveillance: Identifying Suicidal Ideation Through Machine Learning Techniques." Big Data and Cognitive Computing 9, no. 1 (2025): 16. https://doi.org/10.3390/bdcc9010016.

Abstract:

As suicide rates increase globally, there is a growing need for effective, data-driven methods in mental health monitoring. This study leverages advanced artificial intelligence (AI), particularly natural language processing (NLP) and machine learning (ML), to identify suicidal ideation from Twitter data. A predictive model was developed to process social media posts in real time, using NLP and sentiment analysis to detect textual and emotional cues associated with distress. The model aims to identify potential suicide risks accurately, while minimizing false positives, offering a practical to

34

Olorunshola, Oluwaseyi Ezekiel, Ikuponiyi Oluwapelumi Ampitan, Fatimah Adamu-Fika, and Adeniran Kolade Ademuwagun. "An Enhanced K-NN Algorithm Leveraging BERT Techniques for Resume Parsing System." Asian Journal of Research in Computer Science 18, no. 7 (2025): 49–59. https://doi.org/10.9734/ajrcos/2025/v18i7719.

Abstract:

The increasing volume of job applications has created significant challenges for organizations in efficiently screening and ranking candidate resumes. Manual and keyword-based automated systems often struggle with accuracy and contextual understanding. The study introduced an experimental design that develops a hybrid ensemble model for resume parsing and ranking, combining k-nearest neighbors (KNN) and Bidirectional Encoder Representations from Transformers (BERT). The enhancement lies in BERT's ability to generate deep contextual embeddings that are integrated into KNN’s distance-based clas

35

Airlangga, Gregorius. "Spam Detection on YouTube Comments Using Advanced Machine Learning Models: A Comparative Study." Brilliance: Research of Artificial Intelligence 4, no. 2 (2024): 500–508. https://doi.org/10.47709/brilliance.v4i2.4670.

Abstract:

The exponential growth of user-generated content on platforms like YouTube has led to an increase in spam comments, which negatively affect the user experience and content moderation efforts. This research presents a comprehensive comparative study of various machine learning models for detecting spam comments on YouTube. The study evaluates a range of traditional and ensemble models, including Linear Support Vector Classifier (LinearSVC), RandomForest, LightGBM, XGBoost, and a VotingClassifier, with the goal of identifying the most effective approach for automated spam detection. The dataset

36

Kolpashnikova, Kamila, Laurence R. Harris, and Shital Desai. "Fear of falling: Scoping review and topic analysis using natural language processing." PLOS ONE 18, no. 10 (2023): e0293554. http://dx.doi.org/10.1371/journal.pone.0293554.

Abstract:

Fear of falling (FoF) is a major concern among older adults and is associated with negative outcomes, such as decreased quality of life and increased risk of falls. Despite several systematic reviews conducted on various specific domains of FoF and its related interventions, the research area has only been minimally covered by scoping reviews, and a comprehensive scoping review mapping the range and scope of the research area is still lacking. This review aims to provide such a comprehensive investigation of the existing literature and identify main topics, gaps in the literature, and potentia

37

Shah, Yaser Ali, Um-e-Aimen, Rida Bushra, Amaad Khalil, Saad Ali Shahbaz, and Mashab Ali Javed. "Cognitive Therapy and Routine Recommendation System (CTRRS): An AI-Driven Approach for Mental Health." VFAST Transactions on Software Engineering 12, no. 4 (2024): 282–301. https://doi.org/10.21015/vtse.v12i4.1976.

Abstract:

Depression detection and management is an important research field nowadays. In this research work, Cognitive Therapy and Routine Recommendation System (CTRRS) is proposed. It automates the process of detecting depression and provides personalized mental health recommendations using a Random Forest model for healthcare activities and a Long Short-Term Memory (LSTM) model for sentiment analysis. The LSTM architecture includes dense layers, bidirectional LSTM layers, and embedding layers, with term frequency-inverse document frequency (TF-IDF) vectorization and early stopping to prevent overfitt

38

Niaz, Awais Amir, Rehan Ashraf, Toqeer Mahmood, C. M. Nadeem Faisal, and Muhammad Mobeen Abid. "An efficient smart phone application for wheat crop diseases detection using advanced machine learning." PLOS ONE 20, no. 1 (2025): e0312768. https://doi.org/10.1371/journal.pone.0312768.

Abstract:

Globally, agriculture holds significant importance for human food, economic activities, and employment opportunities. Wheat stands out as the most cultivated crop in the farming sector; however, its annual production faces considerable challenges from various diseases. Timely and accurate identification of these wheat plant diseases is crucial to mitigate damage and enhance overall yield. Pakistan stands among the leading crop producers due to favorable weather and rich soil for production. However, traditional agricultural practices persist, and there is insufficient emphasis on leveraging te

39

Chavan, Devang, and Shrihari Padatare. "Explainable AI for News Classification." International Journal for Research in Applied Science and Engineering Technology 12, no. 11 (2024): 2400–2408. https://doi.org/10.22214/ijraset.2024.65670.

Abstract:

The proliferation of news content across digital platforms necessitates robust and interpretable machine learning models to classify news into predefined categories effectively. This study investigates the integration of Explainable AI (XAI) techniques within the context of traditional machine learning models, including Naive Bayes, Logistic Regression, and Support Vector Machines (SVM), to achieve interpretable and accurate news classification. Utilizing the News Category Dataset, we preprocess the data to focus on the top 15 categories while addressing class imbalance challenges. M

40

Obot, Okure U., Peter Obike, and Imaobong James. "Automated Marking System for Essay Questions." Journal of Engineering Research and Reports 26, no. 5 (2024): 107–26. http://dx.doi.org/10.9734/jerr/2024/v26i51139.

Abstract:

The stress of marking assessment scripts of many candidates often results in fatigue that could lead to low productivity and reduced consistency. In most cases, candidates use words, phrases and sentences that are synonyms or related in meaning to those stated in the marking scheme; however, examiners rely solely on the exact words specified in the marking scheme. This often leads to inconsistent grading and in most cases, candidates are disadvantaged. This study seeks to address these inconsistencies during assessment by evaluating the marked answer scripts and the marking scheme of Introduct

41

S. Roja, et al. "Performance of Machine Learning Models in Predicting Sentiments of Post-Covid Patients." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 10 (2023): 2324–29. http://dx.doi.org/10.17762/ijritcc.v11i10.8953.

Abstract:

With the widespread use of social media platforms, sentiment analysis of user-generated content has become a crucial task in understanding public opinion and trends. In this paper, we compare the performance of three popular machine learning models, namely Random Forest, Support Vector Machine (SVM), and Logistic Regression, in predicting sentiments of post-COVID patients on social media tweets. The study utilizes a dataset of labeled tweets representing positive, negative, and neutral sentiments. The preprocessing of textual data involves tokenization, stop-word removal, and conversion to low
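
The preprocessing steps this abstract lists, tokenization and stop-word removal, can be sketched in Python as follows. The tiny stop-word list, the regex tokenizer, and lowercasing before filtering are assumptions made for illustration rather than the authors' settings, and `preprocess` is a hypothetical helper.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption; real studies use fuller lists).
STOP_WORDS = {"a", "an", "the", "i", "am", "is", "are", "was", "and", "or", "to", "of", "after"}

def preprocess(tweet: str) -> list[str]:
    """Hypothetical helper: lowercase, tokenize, then drop stop words."""
    lowered = tweet.lower()                   # lowercase first so stop-word matching is case-insensitive
    tokens = re.findall(r"[a-z']+", lowered)  # simple regex tokenizer: runs of letters and apostrophes
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(preprocess("I am feeling MUCH better after Covid!"))
# ['feeling', 'much', 'better', 'covid']
```

Lowercasing before the stop-word check keeps capitalized forms such as "The" from slipping through; the resulting token lists are what a TF-IDF vectorizer would then weight.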

42

Srikanth, T., B. Sreya, B. Keerthi, and B. Sindhuja. "Detection and Prediction of Future Mental Disorder from Social Media Data Using Machine Learning, Ensemble Learning, and Large Language Models." Journal of Nonlinear Analysis and Optimization 15, no. 02 (2024): 186–91. https://doi.org/10.36893/jnao.2024.v15i12.049.

Abstract:

The increasing use of social media platforms has led to an exponential rise in data related to individuals' mental health, providing valuable insights for the detection of mental disorders. This project explores the use of Machine Learning (ML) techniques, particularly Random Forest and Decision Tree algorithms, to detect potential mental health issues from social media data. By analyzing the textual content shared by users, the system aims to predict whether a person might be experiencing a mental health disorder based on their posts and interactions. In this study, we preprocess the textual

43

Melveetil, Visakh Chandran. "An Optimal Multi-Modal Approach for Stock Market Price Forecasting with Fused Sentiment Analysis for Real Time Data." International Journal for Research in Applied Science and Engineering Technology 12, no. 9 (2024): 1605–22. http://dx.doi.org/10.22214/ijraset.2024.64417.

Abstract:

This study presents an innovative fusion-based methodology that integrates real-time stock market technical indicators with news sentiment analysis from financial news feeds to enhance stock selection decisions. The proposed framework employs a Bidirectional Long Short-Term Memory (Bi-LSTM) model for forecasting stock prices and a Deep Neural Network (DNN) used in conjunction with a transformer-based model for sentiment classification, both optimized through the incorporation of real-time datasets. To further refine feature selection, Artificial Bee Colony (ABC) and Firefly algorithms

44

Fetahi, Endrit, Mentor Hamiti, Arsim Susuri, Jaumin Ajdari, and Xhemal Zenuni. "AI-Based Hate Speech Detection in Albanian Social Media: New Dataset and Mobile Web Application Integration." International Journal of Interactive Mobile Technologies (iJIM) 18, no. 24 (2024): 190–208. https://doi.org/10.3991/ijim.v18i24.50851.

Abstract:

This paper aims to advance AI-based hate speech (HS) detection in the Albanian language, which is resource-limited in natural language processing (NLP). Addressing the challenge of limited data, we developed a human-annotated dataset of over 11,000 comments, carefully curated from various Albanian social media platforms, containing a substantial number of HS instances. The dataset was annotated using a detailed two-layer taxonomy to capture the complex dimensions of HS. To ensure high-quality annotations, three expert annotators applied a majority voting system, achieving a substantial Fleiss’

45

Yulita, Winda, Meida Cahyo Untoro, Mugi Praseptiawan, Ilham Firman Ashari, Aidil Afriansyah, and Ahmad Naim Bin Che Pee. "Automatic Scoring Using Term Frequency Inverse Document Frequency Document Frequency and Cosine Similarity." Scientific Journal of Informatics 10, no. 2 (2023): 93–104. http://dx.doi.org/10.15294/sji.v10i2.42209.

Abstract:

Purpose: In the learning process, most of the tests to assess learning achievement have been carried out by providing questions in the form of short answers or essay questions. The variety of answers given by students means that a teacher has to focus on reading them. The quality of this scoring process is difficult to guarantee if it is done manually. In addition, each class is taught by a different teacher, which can lead to unequal grades obtained by students due to the influence of differences in teacher experience. Therefore, the purpose of this study is to develop an assessment of the answers. Automated s

46

Mohammed, Mohannad T., and Omar Fitian Rashid. "Document retrieval using term term frequency inverse sentence frequency weighting scheme." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1478. http://dx.doi.org/10.11591/ijeecs.v31.i3.pp1478-1485.

Abstract:

The need for an efficient method to find the most appropriate document corresponding to a particular search query has become crucial due to the exponential growth in the number of papers that are now readily available to us on the web. The vector space model (VSM), a perfect model used in “information retrieval”, represents these words as a vector in space and gives them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). In this research, work has been proposed to retrieve the most relevant document focused on representing documents a
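
The weighting named in this entry's title, term frequency–inverse sentence frequency (TF-ISF), can be sketched as an IDF computed over sentences instead of documents. The abstract does not give the authors' formula, so the Python sketch below assumes isf(t) = log(S / sf(t)), where S is the number of sentences in the collection and sf(t) is the number of sentences containing t, and ranks documents in a vector space model by cosine similarity to the query. All function names and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter

def sentences(doc: str) -> list[list[str]]:
    """Split on sentence-ending punctuation, then into lowercase word tokens."""
    parts = re.split(r"[.!?]+", doc)
    return [re.findall(r"[a-z]+", p.lower()) for p in parts if p.strip()]

def isf_table(docs: list[str]) -> dict[str, float]:
    """Assumed ISF: log(S / sf(t)); each sentence counts at most once per term."""
    all_sents = [s for d in docs for s in sentences(d)]
    sf = Counter(t for s in all_sents for t in set(s))
    total = len(all_sents)
    return {t: math.log(total / n) for t, n in sf.items()}

def tf_isf_vector(text: str, isf: dict[str, float]) -> dict[str, float]:
    """Raw term frequency times ISF; terms unseen in the collection get weight 0."""
    tf = Counter(t for s in sentences(text) for t in s)
    return {t: c * isf.get(t, 0.0) for t, c in tf.items()}

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank(query: str, docs: list[str]) -> list[int]:
    """Return document indices, most similar to the query first."""
    isf = isf_table(docs)
    q = tf_isf_vector(query, isf)
    scores = [cosine(q, tf_isf_vector(d, isf)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

docs = [
    "Wheat rust spreads fast. Farmers spray fungicide on wheat.",
    "Search engines rank pages. A query is matched against an index.",
    "The index stores terms. Each query term has a weight.",
]
print(rank("query weight index", docs))  # [2, 1, 0]
```

Only `isf_table` differs from an ordinary TF-IDF pipeline: a term loses weight according to how many sentences, rather than documents, it occurs in.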

47

R, Vanitha. "An Improved Vectorization-Based Emotion Detection Using Tuned Inverse Document Frequency Approach." International Journal of Electronics and Communication Engineering 11, no. 3 (2024): 106–14. http://dx.doi.org/10.14445/23488549/ijece-v11i3p111.

48

Al-Obaydy, Wasseem N. Ibrahem, Hala A. Hashim, Yassen AbdelKhaleq Najm, and Ahmed Adeeb Jalal. "Document classification using term frequency-inverse document frequency and K-means clustering." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (2022): 1517. http://dx.doi.org/10.11591/ijeecs.v27.i3.pp1517-1524.

Abstract:

Increased advancement in a variety of study subjects and information technologies has increased the number of published research articles. However, researchers face difficulties and devote a significant amount of time to locating scientific research publications relevant to their domain of expertise. In this article, a document classification approach is presented to cluster the text documents of research articles into expressive groups that encompass a similar scientific field. The main focus and scopes of target groups were adopted in designing the proposed method, each group include
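
This entry's title pairs TF-IDF with K-means clustering. The abstract does not state the authors' settings, so the plain-Python sketch below assumes idf(t) = log(N / df(t)), L2-normalized document vectors, squared Euclidean distance, and the first k documents as initial centroids (Lloyd's algorithm). The function names and toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Dense, L2-normalized TF-IDF rows over a sorted vocabulary (assumed idf = log(N / df))."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)
        row = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing an all-zero row by 0
        rows.append([x / norm for x in row])
    return vocab, rows

def sq_dist(p: list[float], q: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Lloyd's algorithm, seeded with the first k points (a simplifying assumption)."""
    centroids = [p[:] for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return labels

docs = [
    "neural network training data".split(),
    "protein gene cell biology".split(),
    "deep neural network layers".split(),
    "gene expression cell protein".split(),
]
_, X = tfidf_matrix(docs)
print(kmeans(X, k=2))  # [0, 1, 0, 1]
```

Seeding with the first k documents keeps the example deterministic; k-means is sensitive to initialization, and practical implementations typically use randomized seeding such as k-means++ with several restarts.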

49

Widianto, Adi, Eka Pebriyanto, Fitriyanti Fitriyanti, and Marna Marna. "Document Similarity Using Term Frequency-Inverse Document Frequency Representation and Cosine Similarity." Journal of Dinda : Data Science, Information Technology, and Data Analytics 4, no. 2 (2024): 149–53. http://dx.doi.org/10.20895/dinda.v4i2.1589.

Abstract:

Document similarity is a fundamental task in natural language processing and information retrieval, with applications ranging from plagiarism detection to recommendation systems. In this study, we leverage the term frequency-inverse document frequency (TF-IDF) to represent documents in a high-dimensional vector space, capturing their unique content while mitigating the influence of common terms. Subsequently, we employ the cosine similarity metric to measure the similarity between pairs of documents, which assesses the angle between their respective TF-IDF vectors. To evaluate the effectivenes
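
The pipeline this abstract describes, TF-IDF vectors compared by the cosine of the angle between them, cos(u, v) = u·v / (‖u‖‖v‖), can be sketched in plain Python as follows. Whitespace tokenization and the unsmoothed idf(t) = log(N / df(t)) are assumptions rather than the authors' stated choices, and the helper names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: raw counts times idf(t) = log(N / df(t))."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()} for toks in tokenized]

def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """cos(u, v) = u.v / (|u| |v|); defined as 0.0 when either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "plagiarism detection compares documents",
    "plagiarism detection compares documents plagiarism detection compares documents",
    "recommendation systems suggest items",
]
vecs = tfidf_vectors(docs)
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine_similarity(vecs[i], vecs[j]), 3))
# 0 1 1.0
# 0 2 0.0
# 1 2 0.0
```

Because cosine similarity measures only the angle, the second document, which repeats the first twice, scores 1.0 against it despite its doubled length. With this unsmoothed idf, a term that occurs in every document gets weight zero; scikit-learn's `TfidfVectorizer` uses a smoothed idf by default (`smooth_idf=True`).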

50

Al-Obaydy, Wasseem N. Ibrahem, Hala A. Hashim, Yassen AbdulKhaleq Najm, and Ahmed Adeeb Jalal. "Document classification using term frequency-inverse document frequency and K-means clustering." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (2022): 1517–24. https://doi.org/10.11591/ijeecs.v27.i3.pp1517-1524.