Journal articles on the topic 'K-Nearest Neighbors, Classification and Regression Tree, Logistic Regression, Support Vector Machine, Random Forest, and Bogura'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'K-Nearest Neighbors, Classification and Regression Tree, Logistic Regression, Support Vector Machine, Random Forest, and Bogura.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Mostafizur Rahman, Md, and M. Sayedur Rahman. "PREDICTING RAINFALL BASED ON MACHINE LEARNING ALGORITHM: AN EVIDENCE FROM BOGURA DISTRICT, BANGLADESH." International Journal of Advanced Research 10, no. 08 (2022): 850–58. http://dx.doi.org/10.21474/ijar01/15243.

Full text
Abstract:
Accurately and promptly predicting climatic variables is one of the most challenging tasks for researchers. Scientists have tried numerous methods for forecasting environmental data and have found inconsistent performance across them. Recently, machine learning tools have come to be considered a robust technique for predicting climatic variables, because these tools extract hidden relationships from the data and can predict more accurately than existing methods. In this paper we compare the forecasting performance of various machine learning algorithms such as Classification and Regression Trees (CART), Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (K-NN) and Random Forest (RF) for the Bogura district in Bangladesh. Weekly rainfall-related time series data, such as temperature, humidity, wind speed, sunshine, minimum temperature and maximum temperature, for the period January 1971 to December 2015 were considered. The model evaluation criteria precision, recall, F-measure and overall accuracy confirm that the Random Forest algorithm gives the best forecasting performance, and a cross-validation approach producing graphical model comparisons also confirms that Random Forest is the most suitable algorithm for predicting rainfall in the Bogura district, Bangladesh, during the study period.
APA, Harvard, Vancouver, ISO, and other styles
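The article itself does not reproduce code, but a minimal scikit-learn sketch of the kind of comparison it describes could look as follows; the synthetic data stands in for the weekly weather features and rain/no-rain labels, and all model settings are assumptions rather than the authors' configuration.

```python
# Minimal sketch of the classifier comparison described above; the synthetic data
# is a placeholder for the weekly weather predictors and rainfall labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier          # CART
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)

models = {
    "CART": DecisionTreeClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "K-NN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=0),
}
scoring = ["accuracy", "precision", "recall", "f1"]

for name, model in models.items():
    cv = cross_validate(model, X, y, cv=10, scoring=scoring)
    print(name, {m: cv[f"test_{m}"].mean().round(3) for m in scoring})
```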
2

Ehsan, Muhammad. "Comparison of the Predictive Models of Human Activity Recognition (HAR) in Smartphones." UMT Artificial Intelligence Review 1, no. 2 (2021): 27–35. http://dx.doi.org/10.32350/air.0102.03.

Full text
Abstract:
This report compared the performance of different classification algorithms such as decision tree, K-Nearest Neighbour (KNN), logistic regression, Support Vector Machine (SVM) and random forest. The dataset comprised smartphones’ accelerometer and gyroscope readings of the participants while performing different activities, such as walking, walking downstairs, walking upstairs, standing, sitting, and laying. Different machine learning algorithms were applied to this dataset for classification and their accuracy rates were compared. KNN and SVM were found to be the most accurate of all.
 KEYWORDS— decision tree, Human Activity Recognition (HAR), K-Nearest Neighbour (KNN), logistic regression, random forest, Support Vector Machine (SVM)
APA, Harvard, Vancouver, ISO, and other styles
3

Angula, Taapopi John, and Valerianus Hashiyana. "Detection of Structured Query Language Injection Attacks Using Machine Learning Techniques." International Journal of Computer Science and Information Technology 15, no. 4 (2023): 13–26. http://dx.doi.org/10.5121/ijcsit.2023.15402.

Full text
Abstract:
This paper presents a comparative analysis of various machine learning classification models for structured query language injection prevention. The objective is to identify the best-performing model in terms of accuracy on a given dataset. The study utilizes popular classifiers such as Logistic Regression, Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine. Based on the tests used to evaluate the performance of the classifiers, Naïve Bayes achieves the highest detection accuracy. The results show a 97.06% detection rate for Naïve Bayes, followed by Logistic Regression (0.9610), Support Vector Machine (0.9586), Random Forest (0.9530), Decision Tree (0.9069), and K-Nearest Neighbors (0.6937). The code snippet provided demonstrates the implementation and evaluation of these models.
APA, Harvard, Vancouver, ISO, and other styles
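The code snippet mentioned in the abstract is not reproduced on this page; the following is a rough, hypothetical sketch of how such query-text classification might be set up with a Naïve Bayes model, where the TF-IDF character-n-gram features and the tiny inline dataset are assumptions, not the paper's actual pipeline.

```python
# Hedged sketch of query-string classification in the spirit of the study above;
# the feature choice and the inline examples are assumptions, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

queries = ["SELECT name FROM users WHERE id = 4",
           "' OR '1'='1' --",
           "SELECT * FROM orders WHERE total > 100",
           "admin'; DROP TABLE users; --"]
labels = [0, 1, 0, 1]   # 0 = benign, 1 = injection attempt

clf = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                    MultinomialNB())
clf.fit(queries, labels)
print(clf.predict(["' OR 1=1 --"]))
```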
4

Azeez, N. A., S. S. Oladele, and O. Ologe. "Identification of pharming in communication networks using ensemble learning." Nigerian Journal of Technological Development 19, no. 2 (2022): 172–80. http://dx.doi.org/10.4314/njtd.v19i2.10.

Full text
Abstract:
Pharming scams are carried out by exploiting the DNS as the main weapon, while phishing attacks employ spoofed websites that appear legitimate to internet users. Phishing makes use of baits such as fake links, but pharming compromises the DNS server to redirect internet users to a fake, simulated website. Given the many challenges pharming poses, leaving websites, personal emails, and social media accounts vulnerable, the use of and reliance on the internet call for caution. Against this backdrop, this work aims at enhancing pharming detection strategies by adopting machine learning classification algorithms. To further obtain the best classification results, an ensemble learning approach was adopted. The algorithms used include K-Nearest Neighbors (KNN), Decision Tree, Random Forest, Gaussian Naive Bayes, Logistic Regression, Support Vector Machine, Adaptive Boosting, Gradient Boosting, and Extra Trees Classifier. During testing, the classifiers were evaluated against five popular metrics: accuracy, recall, precision, F1 score, and log loss. The results demonstrate the performance of all algorithms used, as well as their relationships. The ensemble model that included Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, Gradient Boosting Classifier, AdaBoost Classifier, Extra Trees Classifier, and Random Forest produced the best results after evaluation on the two datasets. The Random Forest classifier showed the better performance, with mean accuracies of 0.932 and 0.939 on the respective datasets, compared to 0.476 and 0.519 obtained for Naive Bayes.
APA, Harvard, Vancouver, ISO, and other styles
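One plausible way to express an ensemble of the base classifiers listed in this abstract is scikit-learn's VotingClassifier; the sketch below is illustrative only, with synthetic data and hard voting assumed rather than taken from the paper.

```python
# Illustrative voting ensemble over several of the base classifiers named above;
# the synthetic data and the hard-voting choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC()),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
])
print("mean CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```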
5

Vardhan, L. VN Sasi, and Mrs G. Kumari. "Using Machine Learning Classifiers, Analyze and Predict Cardiovascular Disease." International Journal for Research in Applied Science and Engineering Technology 10, no. 11 (2022): 1220–28. http://dx.doi.org/10.22214/ijraset.2022.47384.

Full text
Abstract:
A myocardial infarction, indigestion, or even death can occur as a result of the several illnesses known as heart disease, including restricted or blocked veins. Depending on the extent of the patient's symptoms, the condition is anticipated by a supervised classification classifier. This research investigates how machine learning tree classifiers perform on heart disease prediction. Pattern recognition tree classifiers are analyzed using Random Forest, Decision Tree, Logistic Regression, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) based on their correctness and AUC scores. With an execution time of 1.32 seconds, a better precision of 85%, and a coefficient of determination (R² score) of 0.8739, the Random Forest classifier proved the most effective in this investigation of coronary heart disease detection.
APA, Harvard, Vancouver, ISO, and other styles
6

Yao, Jian-Rong, and Jia-Rui Chen. "A New Hybrid Support Vector Machine Ensemble Classification Model for Credit Scoring." Journal of Information Technology Research 12, no. 1 (2019): 77–88. http://dx.doi.org/10.4018/jitr.2019010106.

Full text
Abstract:
Credit scoring plays an important role in the financial industry. Different methods are employed in the field of credit scoring, such as traditional logistic regression, discriminant analysis, and linear regression; methods used in the field of machine learning include neural networks, k-nearest neighbors, genetic algorithms, support vector machines (SVM), decision trees, and so on. SVM has demonstrated good performance in classification. This paper proposes a new hybrid RF-SVM ensemble model, which uses random forest to select important variables and employs ensemble methods (bagging and boosting) to aggregate single base models (SVM) into a robust classifier. The experimental results suggest that this new model achieves an effective improvement and has promising potential in the field of credit scoring.
APA, Harvard, Vancouver, ISO, and other styles
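A rough sketch of the general idea described here (random-forest-based feature selection feeding a bagged SVM) is given below; the synthetic data, the SelectFromModel threshold, and all hyperparameters are placeholder assumptions, not the authors' exact RF-SVM model.

```python
# Rough sketch of RF-based feature selection followed by a bagged SVM, in the
# spirit of the hybrid model above; all settings are placeholder assumptions.
# Note: BaggingClassifier uses the `estimator` parameter in scikit-learn >= 1.2.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)

model = make_pipeline(
    SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0)),
    BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=0),
)
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```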
7

Al-Imran, Md, Salma Akter, Md Abu Sufian Mozumder, et al. "EVALUATING MACHINE LEARNING ALGORITHMS FOR BREAST CANCER DETECTION: A STUDY ON ACCURACY AND PREDICTIVE PERFORMANCE." American Journal of Engineering and Technology 6, no. 9 (2024): 22–33. http://dx.doi.org/10.37547/tajet/volume06issue09-04.

Full text
Abstract:
This study evaluates several machine learning algorithms—Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree (C4.5), and k-Nearest Neighbors (KNN)—for breast cancer detection using the Breast Cancer Wisconsin Diagnostic dataset. We implemented comprehensive pre-processing and model evaluation with Scikit-learn in Python. Our findings show that SVM achieved the highest accuracy, with 99.9% on the training set and 98.50% on the testing set, indicating superior performance in handling high-dimensional data. Random Forest also performed well, with accuracies of 98.5% and 98.20%, respectively. Logistic Regression and Decision Tree models provided reliable predictions when tuned, while KNN was less effective. SVM and Random Forest are recommended for clinical decision support systems due to their high accuracy and robustness.
APA, Harvard, Vancouver, ISO, and other styles
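The Breast Cancer Wisconsin Diagnostic data used in this study ships with scikit-learn as load_breast_cancer, so a minimal sketch of the train/test evaluation the abstract describes might look like this; the scaling step and SVM settings are assumptions.

```python
# Minimal sketch of a train/test evaluation on the Wisconsin Diagnostic data
# bundled with scikit-learn; the scaler and default SVM settings are assumed.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
```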
8

Siraj, Mohammed Siddiq, Mohammed Wahaj Haqqani, and Khaja Mizbahuddin Quadry. "A novel credit card fraud detection using supervised machine learning model." International Journal of Multidisciplinary Research and Growth Evaluation 5, no. 1 (2024): 313–24. http://dx.doi.org/10.54660/.ijmrge.2024.5.1.313-324.

Full text
Abstract:
Financial fraud, especially in credit card transactions, is a growing concern. To tackle this, data mining techniques are used to automatically analyze large and complex financial datasets. Detecting credit card fraud is tricky because the patterns of normal and fraudulent behavior keep changing, and data about fraud is much less common than data about legitimate transactions. Several techniques were tried on a dataset from European cardholders, including Decision Tree, Random Forest, SVC, XGBoost, K-Nearest Neighbors, and Logistic Regression. The dataset had information from 284,786 credit card transactions. To address the challenges, six advanced data mining approaches (Logistic Regression, K-Nearest Neighbors, Support Vector Classifiers, Decision Tree, Random Forests, and XGBoost) are evaluated. A comparative analysis is conducted to identify the best-performing model.
APA, Harvard, Vancouver, ISO, and other styles
9

Mishra, Ashwani, and Sanjeev Gangwar. "Lung Cancer Detection and Classification using Machine Learning Algorithms." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 6s (2023): 277–82. http://dx.doi.org/10.17762/ijritcc.v11i6s.6920.

Full text
Abstract:
Lung cancer is a clump of cells in the lung that multiply uncontrollably and improperly. Lung cancer is among the deadliest diseases, and its cure should be a primary focus of scientific research. Although it cannot be prevented, the danger can be lessened, so a patient's chance of survival depends on the early identification of lung cancer. Several machine learning methods, such as Support Vector Machine, Logistic Regression, Artificial Neural Networks, and Naive Bayes, have been used for the investigation and prognosis of lung cancer. In this paper, lung cancer prediction is performed by gathering the dataset from the survey and applying machine learning methods such as Support Vector Machine, Naive Bayes, K-Nearest Neighbors, Decision Tree, and Random Forest. The results reveal that the Decision Tree attained the maximum accuracy of 100% compared to the others.
APA, Harvard, Vancouver, ISO, and other styles
10

Sudhan Reddy, K. Madhu. "Comparative Analysis Of Liver Diseases By Using Machine Learning." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–7. https://doi.org/10.55041/isjem03486.

Full text
Abstract:
Liver diseases constitute a major public health concern worldwide, often leading to life-threatening conditions if not diagnosed and treated in time. Conventional diagnostic methods rely heavily on clinical expertise and laboratory tests, which can be time-consuming and may not always yield accurate early detection. With the growing availability of healthcare data, machine learning (ML) techniques have emerged as powerful tools for disease prediction and classification. This paper presents a comparative analysis of liver disease prediction using multiple ML algorithms, including Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The study utilizes the Indian Liver Patient Dataset (ILPD) and applies various preprocessing and feature selection techniques to optimize model performance. Results are evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics. The analysis reveals that ensemble methods such as Random Forest outperform other models in both accuracy and robustness, offering a promising direction for automated liver disease diagnostics. Keywords: liver, K-Nearest Neighbors, machine learning
APA, Harvard, Vancouver, ISO, and other styles
11

Lee, Cheng-Wen, Mao-Wen Fu, Chin-Chuan Wang, and Muh Irfandy Azis. "Evaluating Machine Learning Algorithms for Financial Fraud Detection: Insights from Indonesia." Mathematics 13, no. 4 (2025): 600. https://doi.org/10.3390/math13040600.

Full text
Abstract:
The study utilized Multiple Linear Regression along with advanced classification algorithms such as Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree, and Random Forest, to detect financial statement fraud. Model performance was evaluated using key metrics, including precision, recall, accuracy, and F1-Score. The analysis also identified significant indicators of fraud, such as Accounts Receivable Turnover, Days Outstanding Accounts Receivable, Days Payables Outstanding, Logarithm of Gross Profit, Gross Profit Margin, Inventory to Sales Ratio, and Total Asset Turnover. Among the models, Random Forest emerged as the most effective algorithm, consistently outperforming others on both training and testing datasets. Logistic Regression and SVM demonstrated strong reliability, whereas KNN and Decision Tree faced overfitting challenges, limiting their practical application. These findings emphasize the critical need for enhanced fraud detection frameworks, leveraging machine learning algorithms like Random Forest to identify fraud patterns effectively. The study highlights the importance of strengthening internal controls, implementing targeted fraud detection measures, and promoting regulatory improvements to enhance transparency and financial accountability.
APA, Harvard, Vancouver, ISO, and other styles
12

Efrizoni, Lusiana, Sarjon Defit, Muhammad Tajuddin, and Anthony Anggrawan. "Komparasi Ekstraksi Fitur dalam Klasifikasi Teks Multilabel Menggunakan Algoritma Machine Learning." MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer 21, no. 3 (2022): 653–66. http://dx.doi.org/10.30812/matrik.v21i3.1851.

Full text
Abstract:
Feature extraction and text classification algorithms are important parts of text classification work and have a direct impact on classification results. Traditional machine learning algorithms such as Naïve Bayes, Support Vector Machines, Decision Tree, K-Nearest Neighbors, Random Forest, and Logistic Regression have been successful at text classification with feature extraction methods such as Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), Documents to Vector (Doc2Vec), and Word to Vector (Word2Vec). However, how best to use word vectors to represent text in machine-learning-based text classification remains a difficult point in current Natural Language Processing work. This paper aims to compare the performance of feature extraction methods such as BoW, TF-IDF, Doc2Vec, and Word2Vec for text classification using machine learning algorithms. The dataset used consists of 1000 samples from tribunnews.com, with data splits of 50:50, 70:30, 80:20, and 90:10. The experimental results show that the Naïve Bayes algorithm achieves the highest accuracy with TF-IDF feature extraction at 87% and with BoW at 83%. For Doc2Vec feature extraction, the highest accuracy is 81% with the SVM algorithm. Meanwhile, Word2Vec feature extraction with the machine learning algorithms (i.e., Naïve Bayes, Support Vector Machines, Decision Tree, K-Nearest Neighbors, Random Forest, Logistic Regression) yields model accuracies below 50%. This indicates that Word2Vec is less than optimal when used with these machine learning algorithms, particularly on the tribunnews.com dataset.
APA, Harvard, Vancouver, ISO, and other styles
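As a toy illustration of the BoW versus TF-IDF comparison discussed above, the sketch below scores both feature extractors with a Naïve Bayes classifier on an invented six-document corpus; Doc2Vec and Word2Vec, which the paper also evaluates, would require separate embedding models and are omitted.

```python
# Toy BoW vs. TF-IDF comparison with Naive Bayes; the inline corpus and the
# two hypothetical classes (economy vs. sport) are placeholder assumptions.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["stock prices rise on strong earnings", "team wins the national final",
        "central bank adjusts interest rates", "star striker signs new contract",
        "inflation report moves the markets", "coach praises young midfielder"]
labels = [0, 1, 0, 1, 0, 1]   # 0 = economy, 1 = sport

for name, vec in [("BoW", CountVectorizer()), ("TF-IDF", TfidfVectorizer())]:
    pipe = make_pipeline(vec, MultinomialNB())
    print(name, cross_val_score(pipe, docs, labels, cv=3).mean())
```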
13

Koppula, Manasa. "PREDICTIVE MAINTENANCE TO REDUCE MACHINE DOWNTIME IN FACTORIES USING MACHINE LEARNING ALGORITHMS." International Journal of Advanced Research in Computer Science 16, no. 2 (2025): 71–77. https://doi.org/10.26483/ijarcs.v16i2.7224.

Full text
Abstract:
Accurate machine failure detection allows manufacturers to estimate potential machine deterioration and avoid machine downtime caused by unexpected performance issues. Predictive maintenance with the use of machine learning algorithms can anticipate machine faults and focus maintenance efforts to solve machine downtime problems. To anticipate machine breakdowns and minimize downtime, this work applies a variety of machine learning methods, such as Random Forest, Decision Tree, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Gradient Boosting, and Logistic Regression. Based on the performance measurements, the Random Forest model has shown high levels of accuracy, precision, recall, and F-score. The order of accuracy of the machine learning models is: Random Forest > Decision Tree > Gradient Boosting Classifier and SVM > Logistic Regression and KNN. This work emphasizes that, through various machine learning models, machine manufacturers could optimize machine maintenance and prolong the life of machines.
APA, Harvard, Vancouver, ISO, and other styles
14

Zhang, Zirui, and Zixuan Li. "Evaluation Methods for Breast Cancer Prediction in Machine Learning Field." SHS Web of Conferences 144 (2022): 03010. http://dx.doi.org/10.1051/shsconf/202214403010.

Full text
Abstract:
Breast cancer is the most common malignant tumor found in women, and there is no cure for advanced breast cancer. Early detection and treatment can effectively improve patient survival. This paper uses five machine learning classification models, namely Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and the K-Nearest Neighbors algorithm (KNN). The training data for the five models are provided by the Wisconsin Breast Cancer Dataset (WBCD). By evaluating and comparing the performance of the five models in accuracy, F1-score, ROC curve, and PR curve, the study finds that LR has the best performance.
APA, Harvard, Vancouver, ISO, and other styles
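A short sketch of the evaluation metrics named in this abstract (accuracy, F1-score, and scalar summaries of the ROC and PR curves) is shown below for a single Logistic Regression model on the WBCD data bundled with scikit-learn; the split and model settings are assumptions.

```python
# Sketch of the evaluation metrics named above for one Logistic Regression model;
# the train/test split and max_iter setting are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             average_precision_score)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
pred = clf.predict(X_te)

print("accuracy:", accuracy_score(y_te, pred))
print("F1-score:", f1_score(y_te, pred))
print("ROC AUC :", roc_auc_score(y_te, proba))            # summarizes the ROC curve
print("PR AUC  :", average_precision_score(y_te, proba))  # summarizes the PR curve
```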
15

Bashir, Ahmad, Ullah Burhan, Sardar Fouzia, Junaid Hazrat, and Zaman Khan Gul. "A supervised classification phenotyping approach using machine learning for patients diagnosed with primary breast cancer." i-manager's Journal on Computer Science 11, no. 1 (2023): 1. http://dx.doi.org/10.26634/jcom.11.1.19374.

Full text
Abstract:
This paper presents a methodology for the early detection and diagnosis of breast cancer using the Wisconsin dataset. The methodology involves four main steps, including data collection, preprocessing, feature selection, and classification. Fine needle aspiration technique is used to extract the ultrasound image features of breast cancer, and preprocessing is performed to eliminate outliers, null values, and noise. Redundant parameters are removed during the feature selection process to improve accuracy. Six machine learning algorithms, including Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Random Forest, Decision Tree, and Gaussian Naive Bayes, are employed for the classification of the breast cancer dataset. Support Vector Machine and K-Nearest Neighbor achieved the highest accuracy, with Logistic Regression, Gaussian Naive Bayes, Random Forest, and Decision Tree having lower accuracy scores. The proposed methodology could aid in the timely detection and diagnosis of breast cancer, and help doctors in selecting the optimal clinical treatment plan for their patients. Further work will be carried out to investigate the effectiveness of additional preprocessing algorithms in improving the classification accuracy of the breast cancer dataset.
APA, Harvard, Vancouver, ISO, and other styles
16

Houfani, Djihane, Sihem Slatnia, Okba Kazar, Noureddine Zerhouni, Hamza Saouli, and Ikram Remadna. "Breast cancer classification using machine learning techniques: a comparative study." Medical Technologies Journal 4, no. 2 (2020): 535–44. http://dx.doi.org/10.26415/2572-004x-vol4iss2p535-544.

Full text
Abstract:
Background: The second leading deadliest disease affecting women worldwide, after lung cancer, is breast cancer. Traditional approaches for breast cancer diagnosis suffer from time consumption and some human errors in classification. To deal with these problems, many research works based on machine learning techniques have been proposed. These approaches have shown their effectiveness in data classification in many fields, especially in healthcare. Methods: In this cross-sectional study, we conducted a practical comparison between the most used machine learning algorithms in the literature. We applied kernel and linear support vector machines, random forest, decision tree, multi-layer perceptron, logistic regression, and k-nearest neighbors for breast cancer tumor classification. The dataset used is the Wisconsin Diagnostic Breast Cancer dataset. Results: After comparing the efficiency of the machine learning algorithms, we noticed that the multilayer perceptron and logistic regression gave the best results, with an accuracy of 98% for breast cancer classification. Conclusion: Machine learning approaches are extensively used in medical prediction and decision support systems. This study showed that the multilayer perceptron and logistic regression algorithms perform well (good accuracy, specificity, and sensitivity) compared to the other evaluated algorithms.
APA, Harvard, Vancouver, ISO, and other styles
17

Li, Huatao. "Machine Learning-based Voting Classifier for Improving Sentiment Analysis on Twitter Data." Transactions on Computer Science and Intelligent Systems Research 5 (August 12, 2024): 1–9. http://dx.doi.org/10.62051/nfkz3035.

Full text
Abstract:
As the number of individuals sharing their thoughts on Twitter continues to grow, comprehending the underlying sentiment behind these tweets becomes increasingly crucial for researchers. To identify the optimal model capable of accurately distinguishing tweet sentiment, the author uses a dataset published in 2022, containing tweet texts annotated with corresponding sentiments. Six basic machine learning classification methods are used for model training: Logistic Regression, Naïve Bayes Classifier, Support Vector Classifier, Decision Tree Classifier, Random Forest Classifier, and K-Nearest Neighbors Classifier. Subsequently, the author assesses the trained models. Through validation, the author finds that the Logistic Regression, Support Vector Classifier, and Random Forest Classifier achieve the highest accuracy and F1-score, and the differences between these three models are small. To improve on these results, the author combines the best three models in a voting ensemble to build a new model. This model's accuracy and F1-score are better than those of all the basic models, both reaching 71.6%. The research shows the differences between each model and the best model when distinguishing between positive, neutral, and negative tweets.
APA, Harvard, Vancouver, ISO, and other styles
18

Panchal, Ritik. "Comparing Breast Cancer Prediction Models." International Journal for Research in Applied Science and Engineering Technology 12, no. 3 (2024): 2703–13. http://dx.doi.org/10.22214/ijraset.2024.59447.

Full text
Abstract:
In this research study, five machine learning algorithms—Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree (C4.5), and K-Nearest Neighbors (KNN)—were applied to the Breast Cancer Wisconsin Diagnostic dataset. The subsequent results underwent a thorough performance evaluation and comparison among these diverse classifiers. The primary objective was to predict and diagnose breast cancer using machine learning algorithms, determining the most effective approach based on factors such as the confusion matrix, accuracy, and precision. Notably, the findings highlight that the Support Vector Machine outperformed all other classifiers, achieving the highest accuracy at 97.2%.
APA, Harvard, Vancouver, ISO, and other styles
19

Habbat, Nassera, Houda Anoun, and Larbi Hassouni. "Sentiment Analysis and Topic Modeling on Arabic Twitter Data during Covid-19 Pandemic." Indonesian Journal of Innovation and Applied Sciences (IJIAS) 2, no. 1 (2022): 60–67. http://dx.doi.org/10.47540/ijias.v2i1.432.

Full text
Abstract:
Twitter Sentiment Analysis is the task of detecting opinions and sentiments in tweets using different algorithms. In our research work, we conducted a study to analyze and compare different Machine Learning Algorithms (MLAs) for the classification task, and hence we collected 37,875 Moroccan tweets posted during the COVID-19 pandemic, from 01 March 2020 to 28 June 2020. The analysis was done using six classification algorithms (Naive Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Decision Tree, Random Forest classifier), considering Accuracy, Recall, Precision, and F-score as evaluation parameters. We then applied topic modeling over the three classified tweet categories (negative, positive, and neutral) using Latent Dirichlet Allocation (LDA), which is among the most effective approaches for extracting discussed topics. As a result, the logistic regression classifier gave the best predictions of sentiments, with an accuracy of 68.80%.
APA, Harvard, Vancouver, ISO, and other styles
20

Airlangga, Gregorius. "Comparative Analysis of Machine Learning Algorithms for Multi-Class Tree Species Classification Using Airborne LiDAR Data." Brilliance: Research of Artificial Intelligence 4, no. 1 (2024): 32–37. http://dx.doi.org/10.47709/brilliance.v4i1.3673.

Full text
Abstract:
Forests hold vital ecological significance, and the ability to accurately classify tree species is integral to conservation and management practices. This research investigates the application of machine learning techniques to airborne Light Detection and Ranging (LiDAR) data for the multi-class classification of tree species, specifically Alder, Aspen, Birch, Fir, Pine, Spruce, and Tilia. High-density LiDAR data from varied forest landscapes were subjected to a rigorous preprocessing and noise reduction protocol, followed by feature extraction to discern structural characteristics indicative of species identity. We assessed the performance of six machine learning models: Logistic Regression, Decision Tree, Random Forest, Support Vector Classifier (SVC), k-Nearest Neighbors (KNN), and Gradient Boosting. The analysis was based on metrics of accuracy, precision, recall, and F1 score. Logistic Regression and Random Forest models outperformed others, achieving accuracies of 0.81, precision of 0.80, recall of 0.81, and an F1 score of 0.80. In contrast, the KNN algorithm had the lowest accuracy of 0.60, precision and recall of 0.60, and an F1 score of 0.59. These results demonstrate the robustness of Logistic Regression and Random Forest for classifying complex LiDAR datasets. The study underscores the potential of these models to support ecological monitoring, enhance forest management, and aid in biodiversity conservation. Future research directions include the fusion of LiDAR data with other environmental variables, application of deep learning for improved feature extraction, and validation of the models across broader species and geographical ranges. This research marks a significant step towards leveraging advanced machine learning to interpret and utilize LiDAR data for environmental and ecological applications.
APA, Harvard, Vancouver, ISO, and other styles
21

Rohini, Ashok Gamane, and Vaibhav Dabhade. "FINDING FAKE SOCIAL MEDIA ACCOUNT USING MACHINE LEARNING." Journal of the Maharaja Sayajirao University of Baroda 59, no. 1 (I) (2025): 284–303. https://doi.org/10.5281/zenodo.15251857.

Full text
Abstract:
The widespread use of social media has resulted in a surge of fake accounts, posing serious risks to individuals, organizations, and society at large. Identifying fake accounts effectively is essential to preserving the integrity and credibility of social media platforms. This study introduces a machine learning-based approach to detect fake social media accounts. We employed five machine learning algorithms—Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Random Forest, Logistic Regression, and Artificial Neural Networks (ANN)—to classify accounts as fake or genuine. The dataset used in this study consisted of features extracted from social media profiles, such as user behavior, profile details, and network characteristics. Experimental results revealed that the ANN algorithm outperformed the others, achieving a high accuracy of 95.6% in detecting fake accounts. The proposed approach offers significant benefits for social media platforms by enabling more efficient detection and prevention of fake accounts. Furthermore, the findings of this study can guide the development of advanced fake account detection systems, contributing to a safer and more reliable online environment. Keywords: fake account detection, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Random Forest, Logistic Regression, Artificial Neural Networks (ANN), classification, behavior analysis.
APA, Harvard, Vancouver, ISO, and other styles
22

Susandri, Susandri, Sarjon Defit, and Muhammad Tajuddin. "SENTIMENT LABELING AND TEXT CLASSIFICATION MACHINE LEARNING FOR WHATSAPP GROUP." JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) 9, no. 1 (2023): 119–25. http://dx.doi.org/10.33480/jitk.v9i1.4201.

Full text
Abstract:
The use of WhatsApp Groups (WAG) for communication is increasing nowadays. WAG communication data can be analyzed from various perspectives. However, this data is imported in the form of unstructured text files. The aim of this research is to explore the potential use of the SentiWordNet lexicon for labeling the positive, negative, or neutral sentiment of WAG data from "Alumni94" and to train and test machine learning text classification models on it. The training and testing were conducted on six models, namely Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors (KNN), Linear Support Vector Machine (SVM), and Artificial Neural Network. The labeling results indicate that neutral sentiment is the majority with 7588 samples, followed by 324 negative and 1617 positive samples. Among all the models, Random Forest showed better precision and recall, i.e., 83% and 64%. On the other hand, Decision Tree had slightly lower precision and recall, i.e., 80% and 66%, but exhibited a better F-measure of 71%. The accuracy evaluation results of the Random Forest and Decision Tree models showed significant performance compared to the others, achieving an accuracy of 89% in classifying new messages. This research demonstrates the potential use of the SentiWordNet lexicon and machine learning in sentiment analysis of WAG data using the Random Forest and Decision Tree models.
APA, Harvard, Vancouver, ISO, and other styles
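Lexicon-based labeling with SentiWordNet is commonly done through NLTK's sentiwordnet corpus; the sketch below shows one crude way such scoring could work, with the thresholding rule, first-sense heuristic, and whitespace tokenisation all being assumptions rather than the paper's procedure.

```python
# Rough sketch of lexicon-based labeling with SentiWordNet via NLTK; the scoring
# rule and tokenisation are assumptions, not the study's exact procedure.
import nltk
nltk.download("sentiwordnet", quiet=True)
nltk.download("wordnet", quiet=True)
from nltk.corpus import sentiwordnet as swn

def label_message(text):
    """Crude lexicon score: sum of (pos - neg) for the first sense of each word."""
    score = 0.0
    for token in text.lower().split():
        synsets = list(swn.senti_synsets(token))
        if synsets:  # use the first sense as a simple approximation
            score += synsets[0].pos_score() - synsets[0].neg_score()
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(label_message("congratulations great news for the alumni"))
print(label_message("sad to hear about the terrible accident"))
```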
23

S. Peerbashab, Y. Mohammed Iqbal, Praveen K.P, M. Mohamed Surputheen, and A. Saleem Raja. "Diabetes Prediction using Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, Logistic Regression Classifiers." JOURNAL OF ADVANCED APPLIED SCIENTIFIC RESEARCH 5, no. 4 (2023): 42–54. http://dx.doi.org/10.46947/joaasr542023680.

Full text
Abstract:
One of the world's deadliest diseases is diabetes. It also gives rise to various other problems, e.g., heart failure, visual impairment, kidney illnesses, and so forth. In such cases, patients are expected to visit a hospital to get a consultation with doctors and their reports, investing time and money every time they visit. Yet, with the development of AI techniques, we now have the flexibility to find a solution to this problem. We have developed an advanced framework for handling data that can determine whether or not a patient has diabetes. In addition, being able to foresee the onset of the disease is crucial for patients. Data mining has the ability to extract concealed information from an enormous amount of diabetes-related data. The most important outcome of this research is the establishment of a theoretical framework that can reliably predict a patient's level of risk for developing diabetes. We utilized existing categorization methods such as DT (Decision Tree), RF (Random Forest), SVM (Support Vector Machine), LR (Logistic Regression), and K-NN (K-Nearest Neighbors) for predicting the severity of Type-II diabetes in patients. We obtained an accuracy of 99% for Random Forest, 98.40% for Decision Tree, 78.54% for Logistic Regression, 77.94% for SVM (using an RBF-kernel SVM), and 77.64% for KNN.
APA, Harvard, Vancouver, ISO, and other styles
24

Santana, Iris Viana dos Santos, Álvaro Sobrinho, Leandro Dias da Silva, and Angelo Perkusich. "Machine Learning for COVID-19 and Influenza Classification during Coexisting Outbreaks." Applied Sciences 13, no. 20 (2023): 11518. http://dx.doi.org/10.3390/app132011518.

Full text
Abstract:
This study compares the performance of machine learning models for selecting COVID-19 and influenza tests during coexisting outbreaks in Brazil, avoiding the waste of resources in healthcare units. We used COVID-19 and influenza datasets from Brazil to train the Decision Tree (DT), Multilayer Perceptron (MLP), Gradient Boosting Machine (GBM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), K-Nearest Neighbors, Support Vector Machine (SVM), and Logistic Regression algorithms. Moreover, we tested the models using the 10-fold cross-validation method to increase confidence in the results. During the experiments, the GBM, DT, RF, XGBoost, and SVM models showed the best performances, with similar results. The high performance of tree-based models is relevant for the classification of COVID-19 and influenza because they are usually easier to interpret, positively impacting the decision-making of health professionals.
APA, Harvard, Vancouver, ISO, and other styles
25

Mondal, Shudipti Rani, Aafreen, and Rakesh Pal. "PREDICTION OF BREAST CANCER USING MACHINE LEARNING." International Journal of Innovative Research in Advanced Engineering 8, no. 3 (2021): 28–33. http://dx.doi.org/10.26562/ijirae.2021.v0803.001.

Full text
Abstract:
Early detection and estimation of the cancer type are essential in cancer research in order to support and supervise patients. Many research teams from the biomedical and bioinformatics fields have been encouraged to study and evaluate the use of machine learning (ML) methods because of the relevance of classifying cancer patients into high- or low-risk clusters. To predict breast cancer, the logistic regression method and many other classifiers have been proposed to generate deep predictions about breast cancer data in a new environment. This paper discusses the various approaches to data mining using classification to create deep predictions that can be applied to breast cancer data. In addition, by testing the dataset on different classifiers, this analysis identifies the best model, i.e., the one that delivers high efficiency. In this paper, the Breast Cancer dataset from the UCI Machine Learning Repository, with 699 instances and 11 attributes, is used. First, the dataset is pre-processed, visualized, and fed to different classifiers such as Logistic Regression, Support Vector Classifier, K-Nearest Neighbour, Decision Tree, and Random Forest. 10-fold cross-validation is implemented and testing is carried out in order to create and validate new models. The analysis shows that Logistic Regression generates the best predictions of all the classifiers and yields the best model, delivering strong and precise outcomes, followed by the other methods: Support Vector Classifier, K-Nearest Neighbour, Decision Tree, and Random Forest. Most models were less reliable compared to the logistic regression approach.
APA, Harvard, Vancouver, ISO, and other styles
26

Aldo Januansyah H., Muhammad Fikry, and Yesy Afrillia. "Machine Learning Algorithms Comparison for Gender Identification." Proceedings of Malikussaleh International Conference on Multidisciplinary Studies (MICoMS) 4 (December 18, 2024): 00007. https://doi.org/10.29103/micoms.v4i.885.

Full text
Abstract:
In this study, we present a comprehensive analysis of gender identification methods utilising eight distinct classification models: K-Nearest Neighbors (KNN), Naive Bayes, Decision Tree, Random Forest, Logistic Regression, XGBoost, Support Vector Machine (SVM), and Neural Network. Gender identification is a critical task with significant applications in marketing, social analysis, and security systems, necessitating the exploration of various methodologies to achieve optimal performance. The dataset employed in this research underwent normalisation using the Min-Max scaling technique, which enhances the performance of classification models by ensuring that all features contribute equally, particularly when the data exhibits varying ranges of values. The results reveal that the K-Nearest Neighbors (KNN) model significantly outperformed the other models, achieving an impressive accuracy of 0.9758 with a support of 951, underscoring the effectiveness of the KNN algorithm in gender identification tasks and establishing it as a reliable choice for applications requiring high accuracy. Furthermore, the study emphasises the critical importance of selecting appropriate models in machine learning tasks and the substantial impact of data normalisation on model performance. Overall, this research provides valuable insights into the KNN algorithm, demonstrating its ease of implementation and exceptional effectiveness in achieving high precision in gender identification tasks, with implications for future research and practical applications across various fields. Keywords: classification models; data normalisation; gender identification; K-Nearest Neighbours; machine learning.
APA, Harvard, Vancouver, ISO, and other styles
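A minimal sketch of the Min-Max-scaled KNN pipeline highlighted in this entry is given below; the synthetic features merely stand in for the paper's gender dataset, and the neighbour count is an assumption.

```python
# Sketch of Min-Max scaling feeding a KNN classifier, as highlighted above;
# scaling matters here because KNN is distance-based. Data is a synthetic stand-in.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```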
27

Gautam, Sudarshan Kumar, Sanjeevan Shrestha, Subhadra Joshi, and Jeshan Pokharel. "Evaluating Machine Learning Algorithms for Forest Cover Extraction in Kailali, Nepal." Journal on Geoinformatics, Nepal 24 (May 28, 2025): 33–40. https://doi.org/10.3126/njg.v24i1.79346.

Full text
Abstract:
Forest cover mapping plays a critical role in environmental monitoring, biodiversity conservation, and sustainable land-use planning, especially in ecologically diverse regions like Nepal. This study evaluates the performance of ten supervised machine learning classifiers for forest cover extraction in the Kailali District using Sentinel-2 satellite imagery. The classifiers assessed include Random Forest, Support Vector Classifier, Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors, Decision Tree, Gaussian Naïve Bayes, AdaBoost, Quadratic Discriminant Analysis, and Gaussian Process Classifier. Feature engineering involved the derivation of 17 vegetation and water indices alongside key spectral bands, followed by correlation analysis to optimize input variables. Ground truth data were collected through field surveys and high-resolution imagery to ensure accurate model training and validation. Classifier performance was evaluated using k-fold cross-validation and standard metrics, including accuracy, precision, recall, and F1-score. Among the models, Random Forest and Gaussian Process achieved the highest classification accuracies of 91.37% and 91.31%, respectively. The study demonstrates the effectiveness of machine learning techniques in forest cover classification and provides valuable insights for enhancing remote sensing-based monitoring frameworks in support of sustainable forest management in Nepal.
APA, Harvard, Vancouver, ISO, and other styles
28

Dahal, Narayan Prasad, and Subarna Shakya. "A Comparative Analysis of Prediction of Student Results Using Decision Trees and Random Forest." Journal of Trends in Computer Science and Smart Technology 4, no. 3 (2022): 113–25. http://dx.doi.org/10.36548/jtcsst.2022.3.001.

Full text
Abstract:
Many types of research are based on students' past data for predicting their performance. A lot of data mining techniques for analyzing the data have been used so far. This research project predicts the higher secondary students' results based on their academic background, family details, and previous examination results using three decision tree algorithms: ID3, C4.5 (J48), and CART (Classification and Regression Tree) with other classification algorithms: Random Forest (RF), K-nearest Neighbors (KNN), Support Vector Machine (SVM) and Artificial Neural Network (ANN). The research project analyzes the performance and accuracy based on the results obtained. It also identifies some common differences based on achieved output and previous research work.
APA, Harvard, Vancouver, ISO, and other styles
29

Airlangga, Gregorius. "Comparative Analysis of Machine Learning Models for Tree Species Classification from UAV LiDAR Data." Buletin Ilmiah Sarjana Teknik Elektro 6, no. 1 (2024): 54–62. https://doi.org/10.12928/biste.v6i1.10059.

Full text
Abstract:
Forest ecosystems play a pivotal role in maintaining global biodiversity and climate balance. The precise identification of tree species via remote sensing technologies is vital for effective ecological surveillance and forest stewardship. This research conducts a comparative analysis of various machine learning algorithms for the binary classification of tree species utilizing LiDAR data captured by Unmanned Aerial Vehicles (UAVs). We analyzed a dataset featuring 192 trees from a diverse forest, employing models such as Logistic Regression, Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), Gradient Boosting, and Decision Trees. These models were assessed on their accuracy, precision, recall, and F1-scores to ascertain their efficacy. Our findings reveal that Logistic Regression and SVM were superior, achieving precision and recall scores up to 0.96, indicating their robust predictive capability. In contrast, KNN underperformed, suggesting the need for parameter refinement. Although ensemble methods demonstrated resilience, they were more prone to overfitting in comparison to the more straightforward Logistic Regression and SVM models. Preliminary data preprocessing and feature engineering techniques are discussed, enhancing the models' performance. This work enriches the domain of remote sensing and ecological monitoring by offering an in-depth evaluation of machine learning models for tree species classification, underscoring their advantages and constraints. It underscores the transformative potential of machine learning in refining ecological analysis precision, thereby aiding in the pursuit of sustainable forest management. Future research directions could include model refinement through advanced feature selection or the exploration of novel machine learning algorithms for improved classification accuracy.
APA, Harvard, Vancouver, ISO, and other styles
30

Engl, Fabian, and Frank Herrmann. "Machine Learning based Approach on Employee Attrition Prediction with an Emphasize on predicting Leaving Reasons." Anwendungen und Konzepte der Wirtschaftsinformatik, no. 18 (December 28, 2023): 11. http://dx.doi.org/10.26034/lu.akwi.2023.4488.

Full text
Abstract:
Using Vitesco Technologies as an example, this article examines whether machine learning models are suitable for detecting employee attrition at an early stage, with the aim of uncovering underlying reasons for leaving. Nine different machine learning algorithms were examined: K-nearest-neighbors, Naive Bayes, logistic regression, a support vector machine, a neural network, a random forest, adaptive boosting, and two gradient boosting models. A three-way-holdout validation method was implemented to assess the quality of the results and measure both the f-score and the degree of model generalization. Initially, it was found that tree-based methods are best suited for classifying employees. A multiclass classification approach showed that under certain conditions it is even possible to predict the underlying leaving reasons.
APA, Harvard, Vancouver, ISO, and other styles
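The three-way holdout validation mentioned here can be sketched with two chained train/validation/test splits, as below; the 60/20/20 proportions, synthetic data, random-forest model, and toy hyperparameter grid are all illustrative assumptions.

```python
# Sketch of a three-way holdout: tune on the validation set, score once on the
# held-out test set; proportions, data, and model choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                                random_state=0)

best_model, best_f1 = None, -1.0
for n in (50, 200):  # toy hyperparameter search on the validation set
    model = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    f1 = f1_score(y_val, model.predict(X_val))
    if f1 > best_f1:
        best_model, best_f1 = model, f1

print("held-out test F1:", f1_score(y_test, best_model.predict(X_test)))
```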
31

Brati, Esmeralda, Alma Braimllari, and Ardit Gjeçi. "Machine Learning Applications for Predicting High-Cost Claims Using Insurance Data." Data 10, no. 6 (2025): 90. https://doi.org/10.3390/data10060090.

Full text
Abstract:
Insurance is essential for financial risk protection, but claim management is complex and requires accurate classification and forecasting strategies. This study aimed to empirically evaluate the performance of classification algorithms, including Logistic Regression, Decision Tree, Random Forest, XGBoost, K-Nearest Neighbors, Support Vector Machine, and Naïve Bayes, in predicting high insurance claims. The research analyses the variables of claims, vehicles, and insured parties that influence the classification of high-cost claims. This investigation utilizes a dataset comprising 802 observations of bodily injury claims from the motor liability portfolio of a private insurance company in Albania, covering the period from 2018 to 2024. In order to evaluate and compare the performance of the models, we employed evaluation criteria including classification accuracy (CA), area under the curve (AUC), confusion matrix, and error rates. We found that Random Forest performs best, achieving the highest classification accuracy (CA = 0.8867, AUC = 0.9437) with the lowest error rates, followed by the XGBoost model. At the same time, logistic regression demonstrated the weakest performance. Key predictive factors in high claim classification include claim type, deferred period, vehicle brand, and age of driver. These findings highlight the potential of machine learning models in improving claim classification and risk assessment and in refining underwriting policy.
APA, Harvard, Vancouver, ISO, and other styles
32

Harshit, Mathur, and Surana Aditya. "Glass Classification based on Machine Learning Algorithms." International Journal of Innovative Technology and Exploring Engineering (IJITEE) 9, no. 11 (2020): 139–42. https://doi.org/10.35940/ijitee.H6819.0991120.

Full text
Abstract:
The glass industry is considered one of the most important industries in the world. Glass is used everywhere, from water bottles to X-ray and gamma-ray protection. It is a non-crystalline, amorphous solid that is most often transparent. Glass has many uses, and during the investigation of a crime scene, investigators need to know what type of glass is present at the scene. To find out the type of glass, we use an online dataset and machine learning to solve the above problem. We use ML algorithms such as the Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest, and Logistic Regression algorithms. Comparing all the algorithms, Random Forest performed best in glass classification.
APA, Harvard, Vancouver, ISO, and other styles
33

Omoruwou, Felix, Arnold Adimabua Ojugo, and Solomon Ebuka Ilodigwe. "Strategic Feature Selection for Enhanced Scorch Prediction in Flexible Polyurethane Form Manufacturing." Journal of Computing Theories and Applications 2, no. 1 (2024): 126–37. http://dx.doi.org/10.62411/jcta.9539.

Full text
Abstract:
The occurrence of scorch during the production of flexible polyurethane is a significant issue that negatively impacts foam products' resilience and generally jeopardizes their integrity. The likelihood of foam product failure can be decreased by optimizing production variables based on machine learning algorithms used to predict the occurrence of scorch. Investigating this technology is required because prevention is the best approach to dealing with this problem. Hence, machine learning algorithms were trained to predict the occurrence of scorch using the thermodynamic profile of polyurethane foam, which is made up of recorded production variables. A variety of heuristic algorithms were trained and assessed for how well they performed, namely XGBoost, Decision Tree, Random Forest, K-Nearest Neighbors, Naive Bayes, Support Vector Machines, and Logistic Regression. The XGBoost ensemble was found to perform best, outperforming the others with an accuracy of 98.3% (i.e., 0.983), followed by logistic regression, decision tree, random forest, K-nearest neighbors, and naïve Bayes, which yielded training accuracies of 88.1%, 66.7%, 84.2%, 87.5%, and 67.5%, respectively. XGBoost was finally applied, yielding two distinct cases of (non-)occurrence of scorch. The ensemble demonstrates that it is quite capable and is an effective way to predict the occurrence of scorch.
APA, Harvard, Vancouver, ISO, and other styles
34

Yadav, Sudha, Harkesh Sehrawat, Vivek Jaglan, et al. "A Novel Effective Forecasting Model Developed Using Ensemble Machine Learning For Early Prognosis of Asthma Attack and Risk Grade Analysis." Scalable Computing: Practice and Experience 26, no. 1 (2025): 398–414. https://doi.org/10.12694/scpe.v26i1.3758.

Full text
Abstract:
Research curiosity is increasing the interest of clinicians and researchers in combining medical science with artificial intelligence to develop cost-effective predictive models for asthma exacerbation. To obtain the classification results, widely known ensemble machine learning methods central to artificial intelligence techniques are investigated, and a novel predictive model is developed using the CatBoost classifier, which produced comparatively improved outcomes in predicting the occurrence of asthma and the asthma risk grade. The proposed model's results are compared with those of other classifiers: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression, AdaBoost classifier, Gradient Boosting classifier, Random Forest, and Decision Tree. The model achieved classification accuracy as high as 93% on the datasets selected for building the early prognosis model of asthma disease, using only 20% of the features in the reduced feature set.
APA, Harvard, Vancouver, ISO, and other styles
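A hedged sketch of fitting a CatBoost classifier of the kind used in this study, alongside a simple logistic-regression baseline, is shown below; the synthetic data and hyperparameters are placeholders and the catboost package is assumed to be installed.

```python
# Illustrative CatBoost fit with a logistic-regression baseline; the synthetic data
# and all hyperparameters are placeholders, not the study's asthma dataset or settings.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=20, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

cat = CatBoostClassifier(iterations=300, depth=4, verbose=0, random_seed=0)
cat.fit(X_tr, y_tr)
base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("CatBoost accuracy:", accuracy_score(y_te, cat.predict(X_te)))
print("baseline accuracy:", accuracy_score(y_te, base.predict(X_te)))
```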
35

Ria Suci Nurhalizah, Hadi Jayusman, and Purwatiningsih. "Exploration of Machine Learning Methods in Medical Disease Prediction: A Systematic Literature Review." Journal of Advanced Health Informatics Research 1, no. 3 (2024): 157–74. https://doi.org/10.59247/jahir.v1i3.174.

Full text
Abstract:
Exploration of Machine Learning methods in the systematic literature shows successful applications in disease diagnosis, disease prediction, and treatment planning. This review only includes discussions of classification methods, consisting of Support Vector Machine (SVM), Naïve Bayes, Nearest Neighbors, and Neural Network (NN); regression methods, consisting of Decision Tree, Linear Regression, Random Forest ensemble methods, and Neural Network (NN); clustering, consisting of K-Means Clustering, Artificial Neural Network (ANN), Gaussian Mixture, and Neural Network (NN); and dimensionality reduction, consisting of Principal Component Analysis (PCA). In the context of healthcare, sustainability, ethics, and data security are key factors. This research uses a Systematic Literature Review (SLR) to explore Machine Learning methods in the medical context and recommends Support Vector Machine, Random Forest, and Neural Networks as effective methods, exploring 300 papers and selecting 57 papers for discussion of machine learning methods in medical disease prediction. Method selection should be tailored to the dataset characteristics and disease prediction goals, while prioritizing
APA, Harvard, Vancouver, ISO, and other styles
36

Jayidan, Zirji, Amril Mutoi Siregar, Sutan Faisal, and Hanny Hikmayanti. "IMPROVING HEART DISEASE PREDICTION ACCURACY USING PRINCIPAL COMPONENT ANALYSIS (PCA) IN MACHINE LEARNING ALGORITHMS." Jurnal Teknik Informatika (Jutif) 5, no. 3 (2024): 821–30. https://doi.org/10.52436/1.jutif.2024.5.3.2047.

Full text
Abstract:
This study aims to improve the accuracy of heart disease prediction using Principal Component Analysis (PCA) for feature extraction and various machine learning algorithms. The dataset consists of 334 rows with 49 attributes, 5 classes and 31 target diagnoses. The five algorithms used were K-nearest neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT). Results show that algorithms using PCA achieve high accuracy, especially RF, LR, and DT with accuracy up to 1.00. This research highlights the potential of PCA-based machine learning models in early diagnosis of heart disease.
APA, Harvard, Vancouver, ISO, and other styles
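The PCA-then-classifier pipeline this abstract describes can be sketched as below; the component count, the synthetic stand-in data (sized to mirror the reported 334 rows and 49 attributes), and the random-forest choice are illustrative assumptions.

```python
# Sketch of a PCA-plus-classifier pipeline in the spirit of the study above;
# the component count, synthetic data, and model choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=334, n_features=49, n_informative=10,
                           random_state=0)
model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      RandomForestClassifier(random_state=0))
print("10-fold CV accuracy:", cross_val_score(model, X, y, cv=10).mean())
```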
37

Dymora, Paweł, Mirosław Mazurek, and Łukasz Smyła. "A Comparative Analysis of Selected Data Mining Algorithms and Programming Languages." Journal of Education, Technology and Computer Science 5, no. 35 (2024): 69–83. https://doi.org/10.15584/jetacomps.2024.5.7.

Full text
Abstract:
This paper evaluates the performance of ten selected data mining algorithms in the context of classification and regression, and the relative effectiveness of two popular programming languages used in data science: Python and R. The algorithms included in the study were the Naive Bayes Classifier, K-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree, Random Forest, Gradient Boosting Machine (GBM), Logistic Regression, Linear Regression, Ridge Regression, and LASSO Regression. The study aimed to evaluate how the various algorithms perform in classification and regression tasks in the context of a specific problem, in this case fraud detection. The performance of the algorithms was evaluated based on key metrics such as accuracy, execution time, the difference between the best and worst results, and the mean square error (MSE). Moreover, learning tools such as R and Python enable students not only to perform multidimensional data analysis, but also to predict future trends and changes. The ability to work with data, modelling, and visualisation are key competences in the context of many areas of modern life and support the making of accurate business decisions.
APA, Harvard, Vancouver, ISO, and other styles
38

Ayepeku, Olukayode Felix. "Analysis and Visualization of Breast Cancer Prediction through Machine Learning Models." SISTEMASI 13, no. 3 (2024): 1178. http://dx.doi.org/10.32520/stmsi.v13i3.4100.

Full text
Abstract:
This research presents an in-depth exploration of breast cancer prediction through the application of machine learning models, specifically focusing on Logistic Regression, K-Nearest Neighbors, Support Vector Classifier, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, AdaBoost Classifier, and XGBoost Classifier. The study utilizes a comprehensive dataset comprising clinical features extracted from Kaggle. Various algorithms are employed, and a meticulous analysis of precision, recall, F1-score, and accuracy is conducted to assess model performance. Through advanced visualization techniques and statistical analysis, the research provides insights into the effectiveness of machine learning models in predicting breast cancer. The outcomes of this study aim to contribute valuable knowledge to the field of medical diagnostics, emphasizing the importance of machine learning methodologies in enhancing breast cancer prediction and classification.
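A hedged sketch of the per-model precision/recall/F1/accuracy comparison described above; scikit-learn's bundled Wisconsin breast-cancer data is used as a stand-in for the Kaggle dataset, and hyperparameters are left at defaults rather than the study's tuned values.

```python
# Sketch only: fit several classifiers and print precision, recall, F1 and
# accuracy via classification_report (stand-in data, default hyperparameters).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, model.predict(X_te), digits=3))
```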
APA, Harvard, Vancouver, ISO, and other styles
39

Deshpande, Chinmay Vinod, and Sayed A Naveed. "ECG Classification Using Machine Learning." International Journal of Science and Healthcare Research 9, no. 1 (2024): 357–67. http://dx.doi.org/10.52403/ijshr.20240146.

Full text
Abstract:
In contemporary healthcare, Electrocardiography (ECG) plays a crucial role in the diagnosis and monitoring of heart conditions. This paper introduces an automated system that processes ECG records, with a focus on extracting essential parameters. The data were sourced from multiple databases, including the MIT-BIH Arrhythmia Database. The evaluation phase involved the assessment of machine learning models, specifically Logistic Regression, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), for the purpose of classifying ECG records. A noteworthy aspect of this research lies in its approach to classifying dataset records, thereby enabling the detection of a wide range of cardiac conditions, such as Normal Sinus, Tachycardia, Bradycardia, First-Degree Heart Block, Long QT Syndrome, ST Elevation, and ST Depression. The automated system presented in this paper offers significant support for efficient heart health assessment, which in turn facilitates timely interventions and well-informed decisions, potentially contributing to a reduction in the burden of cardiac conditions. This research contributes a comprehensive system for the processing of ECG records, which promises to aid medical practitioners and researchers in enhancing patient care and advancing early arrhythmia detection. Keywords: Electrocardiography, SVM, KNN, Cardiac Parameters, Machine Learning, Arrhythmia, Logistic Regression, Random Forest.
APA, Harvard, Vancouver, ISO, and other styles
40

Nuraeni, Nia, and Puji Astuti. "Pendekatan Machine Learning untuk Deteksi Dini Kanker Paru-Paru: Mengoptimalkan Sensitivitas dan Akurasi." Jurnal Informatika Polinema 11, no. 3 (2025): 339–46. https://doi.org/10.33795/jip.v11i3.7011.

Full text
Abstract:
Lung cancer is one of the leading causes of cancer deaths worldwide. Accurate early detection is essential to improve patients' chances of recovery. This study compares the performance of six data mining algorithms, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree Classifier (DTC), and K-Nearest Neighbors (K-NN), in predicting lung cancer. The evaluation results show that DTC achieves the highest sensitivity (0.558) and the best F1-score (0.543), making it more effective at detecting as many positive cases as possible. Random Forest achieves high accuracy (90%), indicating a balance between overall correct predictions and performance across all aspects. Other models, such as Logistic Regression and SVM, also achieve high accuracy (89%) but with lower sensitivity than the Decision Tree Classifier. Based on these results, models with high sensitivity are better suited for initial screening so that cancer cases are not missed, while models with high accuracy can be used for more balanced diagnosis. Machine learning approaches can therefore serve as a potential aid in the early detection of lung cancer.
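The sensitivity-versus-accuracy trade-off highlighted above can be made concrete with a small sketch: compare candidate models on recall (sensitivity), accuracy, and F1, and prefer the highest-recall model for screening. The imbalanced synthetic data and the two candidate models are assumptions for illustration only.

```python
# Hedged sketch of the screening trade-off described above: compare models on
# recall (sensitivity), accuracy and F1, then pick by recall for screening use.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic stand-in for the lung-cancer data (positives are rare).
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

candidates = {
    "DecisionTree": DecisionTreeClassifier(class_weight="balanced", random_state=1),
    "RandomForest": RandomForestClassifier(random_state=1),
}

results = {}
for name, model in candidates.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = {
        "sensitivity": recall_score(y_te, pred),
        "accuracy": accuracy_score(y_te, pred),
        "f1": f1_score(y_te, pred),
    }
    print(name, results[name])

# For early screening, favour the model with the highest sensitivity.
best_for_screening = max(results, key=lambda m: results[m]["sensitivity"])
print("Preferred screening model:", best_for_screening)
```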
APA, Harvard, Vancouver, ISO, and other styles
41

Bhakare, Pradip. "Brain Stroke Prediction Using Machine Learning Algorithm." International Journal of Scientific Research in Engineering and Management 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem46323.

Full text
Abstract:
Brain stroke is a serious medical condition that may result in severe neurological damage or even death if left undetected and untreated. Predicting the risk of stroke at an early stage can substantially improve clinical decision-making and preventive care. The proposed work uses machine learning to predict the risk of brain stroke from patient health data. The dataset includes related features such as age, hypertension, heart disease, smoking status, BMI, glucose level, and work type. Different machine learning algorithms such as Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) were trained and evaluated against performance measures such as accuracy, precision, recall, and F1-score. The highest prediction accuracy was achieved by the Random Forest model, implying that it can accurately classify cases for this problem. This research demonstrates the potential of machine learning to enhance stroke diagnosis and assist healthcare decision-making. Keywords: Brain Stroke, Stroke Prediction, Random Forest, Health Data, Classification, Early Detection, Health Care Analytics.
APA, Harvard, Vancouver, ISO, and other styles
42

Salem Alzboon, Mowafaq, Mohammad Subhi Al-Batah, Muhyeeddin Alqaraleh, Ahmad Abuashour, and Ahmad Fuad Hamadah Bader. "Early Diagnosis of Diabetes: A Comparison of Machine Learning Methods." International Journal of Online and Biomedical Engineering (iJOE) 19, no. 15 (2023): 144–65. http://dx.doi.org/10.3991/ijoe.v19i15.42417.

Full text
Abstract:
Detection and management of diabetes at an early stage is essential since it is rapidly becoming a global health crisis in many countries. Predictions of diabetes using machine learning algorithms have been promising. In this work, we use data collected from the Pima Indians to assess the performance of multiple machine-learning approaches to diabetes prediction. Ages, body mass indexes, and glucose levels for 768 patients are included in the data set. The methods evaluated are Logistic Regression, Decision Tree, Random Forest, k-Nearest Neighbors, Naive Bayes, Support Vector Machine, Gradient Boosting, and Neural Network. The findings indicate that the Logistic Regression and Neural Network models perform the best on most criteria when considering all classes together. The SVM, Random Forest, and Naive Bayes models also receive moderate to high scores, suggesting their strength as classification models. However, the kNN and Tree models show poorer scores on most criteria across all classes, making them less favorable choices for this dataset. The SGD, AdaBoost, and CN2 rule inducer models perform the poorest when comparing all models using a weighted average of class scores. The results of the study suggest that machine learning algorithms may help predict the onset of diabetes and detect the disease at an early stage.
APA, Harvard, Vancouver, ISO, and other styles
43

Md Alif Sheakh, Mst. Sazia Tahosin, Lima Akter, et al. "Comparative analysis of machine learning algorithms for ECG-based heart attack prediction: A study using Bangladeshi patient data." World Journal of Advanced Research and Reviews 23, no. 3 (2024): 2572–84. http://dx.doi.org/10.30574/wjarr.2024.23.3.2928.

Full text
Abstract:
This study aims to identify the most accurate machine learning algorithm for predicting heart attacks using demographic data, physiological measurements, and electrocardiogram (ECG) results. We utilized a dataset of 4,000 patient records, combining data from DMCH and Kaggle. Our methodology involved comprehensive data preprocessing, including ECG noise removal and feature selection using the Boruta algorithm. We implemented and compared six machine learning algorithms: Decision Tree, Random Forest, Logistic Regression, Support Vector Machine, XGBoost, and K-Nearest Neighbors. The results demonstrate that our proposed method can accurately predict heart attacks with high sensitivity and specificity. Among the tested algorithms, Random Forest achieved the highest accuracy of 87%, with well-balanced precision (0.86), recall (0.85), and F1-score (0.87). K-Nearest Neighbors and XGBoost also showed strong performance, with accuracies of 81% and 80% respectively. This study contributes to the field by utilizing a large, diverse dataset and providing a comprehensive comparison of multiple algorithms. Our findings suggest the potential for integrating machine learning, particularly Random Forest models, into clinical practice for early heart attack risk assessment, representing a significant step towards improving cardiovascular care through advanced data analysis techniques.
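Assuming the feature selection step refers to the Boruta wrapper method, a hedged sketch of that part of the pipeline is shown below using the third-party `boruta` package around a Random Forest, then scoring a classifier on the confirmed features; the synthetic feature table stands in for the preprocessed ECG data, which is not reproduced here.

```python
# Hedged sketch of Boruta-style feature selection feeding a Random Forest,
# in the spirit of the pipeline the abstract outlines (requires the third-party
# `boruta` package; the ECG feature table here is a synthetic placeholder).
from boruta import BorutaPy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=30, n_informative=8, random_state=0)

rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5, random_state=0)
selector = BorutaPy(rf, n_estimators="auto", random_state=0)
selector.fit(X, y)                      # expects NumPy arrays

X_selected = selector.transform(X)      # keep only confirmed features
print("features kept:", X_selected.shape[1])

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(clf, X_selected, y, cv=5).mean())
```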
APA, Harvard, Vancouver, ISO, and other styles
44

Md Alif Sheakh, Mst. Sazia Tahosin, Lima Akter, et al. "Comparative analysis of machine learning algorithms for ECG-based heart attack prediction: A study using Bangladeshi patient data." World Journal of Advanced Research and Reviews 23, no. 3 (2024): 2572–84. https://doi.org/10.5281/zenodo.14970377.

Full text
Abstract:
This study aims to identify the most accurate machine learning algorithm for predicting heart attacks using demographic data, physiological measurements, and electrocardiogram (ECG) results. We utilized a dataset of 4,000 patient records, combining data from DMCH and Kaggle. Our methodology involved comprehensive data preprocessing, including ECG noise removal and feature selection using the Boruta algorithm. We implemented and compared six machine learning algorithms: Decision Tree, Random Forest, Logistic Regression, Support Vector Machine, XGBoost, and K-Nearest Neighbors. The results demonstrate that our proposed method can accurately predict heart attacks with high sensitivity and specificity. Among the tested algorithms, Random Forest achieved the highest accuracy of 87%, with well-balanced precision (0.86), recall (0.85), and F1-score (0.87). K-Nearest Neighbors and XGBoost also showed strong performance, with accuracies of 81% and 80% respectively. This study contributes to the field by utilizing a large, diverse dataset and providing a comprehensive comparison of multiple algorithms. Our findings suggest the potential for integrating machine learning, particularly Random Forest models, into clinical practice for early heart attack risk assessment, representing a significant step towards improving cardiovascular care through advanced data analysis techniques.
APA, Harvard, Vancouver, ISO, and other styles
45

P, Subham, Aryan Sinha, Adarsh Kumar, Shantilata Palei, and Puspanjali Mohapatra. "MACHINE LEARNING TECHNIQUES FOR DIABETES CLASSIFICATION." ICTACT Journal on Data Science and Machine Learning 5, no. 4 (2024): 672–79. https://doi.org/10.21917/ijdsml.2024.0140.

Full text
Abstract:
Driven by the explosion in the generation of biomedical data and its complexity, machine learning approaches have been found to be extremely compelling for the detection, diagnosis, and medical decision-making of diseases. The objective of this paper is to investigate the efficiency of various machine learning algorithms in the analysis of a very common and fatal disease like diabetes. These algorithms not only classify diabetic patients into different categories but also advise whether a diabetic patient suffering from other associated diseases (originating from diabetes) requires immediate medical attention. The two datasets used in the study are the Pima Indians Diabetes dataset and the Diabetes 130-US Hospitals dataset for the years 1999-2008. The machine learning algorithms used in the study include Logistic Regression, K-Nearest Neighbors, XGBoost, Decision Tree, Random Forest, Support Vector Machines, and Neural Network based MLP classifiers. The efficacy of the models is tested on the basis of classification accuracy and F1 score. The results are analyzed and compared, demonstrating that the Logistic Regression model outperforms the other models on the Pima Indians Diabetes data, whereas the Neural Network based MLP classifier outperforms the other models on the Diabetes 130-US Hospitals data.
APA, Harvard, Vancouver, ISO, and other styles
46

Nguyễn, Văn Thủy. "Using Machine Learning models to predict the on-time graduation status of students." Tạp chí Khoa học và Đào tạo Ngân hàng, no. 255 (August 2023): 52–64. http://dx.doi.org/10.59276/tckhdt.2023.08.2506.

Full text
Abstract:
The study aims to perform optimal machine learning model selection to predict the on-time graduation status of students. Using a dataset of students majoring in the Banking faculty at the Banking Academy during the period 2010-2020 and machine learning models such as Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, XGBoost, and CatBoost, the study chose Random Forest as the optimal model. The research identified two attributes, academic processing information and the Grade Point Average (GPA) of semesters 1 through 4, as having a strong impact on whether students graduate on time or late, and proposed recommendations to help the school improve the graduation rate of students.
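One way such influential attributes can be surfaced, shown here only as an illustration, is through a fitted Random Forest's feature importances. The column names, synthetic data, and target construction below are hypothetical stand-ins for the study's academic-processing and semester-GPA fields.

```python
# Illustrative only: how Random Forest feature importances can point to the
# attributes driving on-time graduation; column names are hypothetical
# stand-ins for the study's "academic processing" and semester-GPA fields.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "academic_processing": rng.integers(0, 2, n),
    "gpa_sem1": rng.normal(3.0, 0.5, n),
    "gpa_sem2": rng.normal(3.0, 0.5, n),
    "gpa_sem3": rng.normal(3.0, 0.5, n),
    "gpa_sem4": rng.normal(3.0, 0.5, n),
    "credits_registered": rng.integers(12, 22, n),
})
# Synthetic target loosely tied to GPA and academic-processing status.
on_time = ((df[["gpa_sem1", "gpa_sem2", "gpa_sem3", "gpa_sem4"]].mean(axis=1) > 2.8)
           & (df["academic_processing"] == 0)).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(df, on_time)
importances = pd.Series(rf.feature_importances_, index=df.columns).sort_values(ascending=False)
print(importances)
```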
APA, Harvard, Vancouver, ISO, and other styles
47

Novianto, Anton, and Mila Desi Anasanti. "Autism Spectrum Disorder (ASD) Identification Using Feature-Based Machine Learning Classification Model." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 17, no. 3 (2023): 259. http://dx.doi.org/10.22146/ijccs.83585.

Full text
Abstract:
Autism Spectrum Disorder (ASD) is a developmental disorder that impairs the development of behaviors, communication, and learning abilities. Early detection of ASD helps patients get better training to communicate and interact with others. In this study, we identified ASD and non-ASD individuals using machine learning (ML) approaches. We used Gaussian naive Bayes (NB), k-nearest neighbors (KNN), random forest (RF), logistic regression (LR), support vector machine (SVM) with a linear basis function, and decision tree (DT). We preprocessed the data using imputation methods, namely linear regression, Mice forest, and MissForest. We selected the important features using the Simultaneous Perturbation Feature Selection and Ranking (SpFSR) technique from all 21 ASD features of three combined datasets (N=1,100 individuals) from the University of California Irvine (UCI) repository. We evaluated the discrimination, calibration, and clinical utility of the methods using a stratified 10-fold cross-validation approach. We achieved the highest accuracy by using SVM with the 10 most important selected features. We observed the integration of imputation using linear regression, SpFSR, and SVM to be the most effective model, with an accuracy rate of 100%, outperforming previous studies in ASD prediction.
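A hedged sketch of the evaluation pattern described above: impute missing values, select 10 features, then score a linear-kernel SVM with stratified 10-fold cross-validation. SpFSR has no standard scikit-learn port, so SelectKBest stands in for it, IterativeImputer (linear by default) stands in for regression-based imputation, and the 21-feature table is synthetic.

```python
# Sketch only: impute -> select 10 features -> linear SVM, scored with
# stratified 10-fold CV; SelectKBest and IterativeImputer are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import IterativeImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 21-feature ASD screening table, with missing values.
X, y = make_classification(n_samples=1100, n_features=21, n_informative=10, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

pipe = Pipeline([
    ("impute", IterativeImputer(random_state=0)),
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("svm", SVC(kernel="linear")),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
print("10-fold accuracy:", cross_val_score(pipe, X, y, cv=cv).mean())
```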
APA, Harvard, Vancouver, ISO, and other styles
48

Grzesiak, Wilhelm, Daniel Zaborski, Marcin Pluciński, Magdalena Jędrzejczak-Silicka, Renata Pilarczyk, and Piotr Sablik. "The Use of Selected Machine Learning Methods in Dairy Cattle Farming: A Review." Animals 15, no. 14 (2025): 2033. https://doi.org/10.3390/ani15142033.

Full text
Abstract:
The aim of this review was to present selected machine learning (ML) algorithms used in dairy cattle farming in recent years (2020–2024). A description of ML methods (linear and logistic regression, classification and regression trees, chi-squared automatic interaction detection, random forest, AdaBoost, support vector machines, k-nearest neighbors, naive Bayes classifier, multivariate adaptive regression splines, artificial neural networks, including deep neural networks and convolutional neural networks, as well as Gaussian mixture models and cluster analysis), with some examples of their application in various aspects of dairy cattle breeding and husbandry, is provided. In addition, the stages of model construction and implementation, as well as the performance indicators for regression and classification models, are described. Finally, time trends in the popularity of ML methods in dairy cattle farming are briefly discussed.
APA, Harvard, Vancouver, ISO, and other styles
49

D., Suma, Narendra V. G., Raviraja Holla M., and Darshan Holla M. "Morphological features for multi-model rice grain classification." International Journal of Electrical and Computer Engineering (IJECE) 15, no. 3 (2025): 3212. https://doi.org/10.11591/ijece.v15i3.pp3212-3225.

Full text
Abstract:
In the realm of agriculture and food processing, the automated classification of rice grains holds significant importance. The diverse varieties of rice available demand a systematic approach to categorization. This study tackles this challenge by employing diverse machine learning models, including support vector machine (SVM), random forest (RF), logistic regression (LR), decision tree (DT), Gaussian naive Bayes (GNB), and k-nearest neighbors (K-NN). The dataset, sourced from Kaggle, features five distinct rice types: Arborio, Basmati, Ipsala, Jasmine, and Karacadag. After the images undergo preprocessing, a set of 13 distinct morphological features is extracted. These features ensure a comprehensive representation of rice grains for accurate classification. This study aims to create an intelligent system for efficient and precise rice grain classification, contributing to optimizing agricultural and food industry processes. Among the models, K-NN demonstrated the highest classification accuracy at 97.80%, surpassing random forest (97.51%), DT (97.48%), GNB (96.99%), SVM (96.85%), and LR (96.05%). Our proposed K-NN-based classification model achieves an accuracy of 97.8%, demonstrating competitive performance and outclassing several state-of-the-art methods such as artificial neural network (ANN) and modified visual geometry group16 (VGG16) while maintaining simplicity and computational efficiency. This underscores the effectiveness of K-NN and RF in enhancing the precision of rice variety classification.
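A minimal sketch of the K-NN classification stage described above, assuming the 13 morphological features have already been extracted into a table (the image-processing step is omitted and a synthetic five-class table stands in for the rice data); the scaling step is included because K-NN is distance-based.

```python
# Minimal sketch of the K-NN classification stage, assuming the 13 morphological
# features have already been extracted (image processing omitted).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder: 5 rice classes described by 13 morphological features per grain.
X, y = make_classification(n_samples=2500, n_features=13, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
print("K-NN CV accuracy:", cross_val_score(knn, X, y, cv=5).mean())
```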
APA, Harvard, Vancouver, ISO, and other styles
50

Ammar Oad, Zulfikar Ahmed Maher, Imtiaz Hussain Koondhar, Karishima Kumari, and Hammad Bacha. "Optimizing Cardiovascular Risk Assessment with a Soft Voting Classifier Ensemble." Sir Syed University Research Journal of Engineering & Technology 14, no. 2 (2024): 101–7. https://doi.org/10.33317/ssurj.649.

Full text
Abstract:
According to the latest data from the World Health Organization (WHO), heart disease has been the leading cause of death worldwide for the past several decades. It includes a variety of conditions that affect the heart; in Pakistan, heart disease claims the lives of at least thirty people every hour. Machine learning (ML) is the best-known application of artificial intelligence. Because heart disease is linked to numerous risk factors, sensitive, accurate, and dependable methods are needed for early diagnosis. The experiments used the UCI repository's heart disease dataset (14 attributes) and a cardiovascular disease dataset (12 attributes). The proposed soft voting classifier employs an ensemble of seven machine learning algorithms for binary classification: Naïve Bayes, K-Nearest Neighbors, SVM Kernel, Decision Tree, Random Forest, Logistic Regression, and Support Vector Classifier. The suggested ensemble method achieves accuracy, precision, recall, and F1-score values of 70.9%, 72.3%, 68.6%, and 70.1%, while Random Forest gives 71.5%, 72.2%, 70.3%, and 71.2%; the remaining classifiers gave average scores. The proposed method therefore provided the best results compared with Decision Tree, Logistic Regression, Support Vector Classifier (SVC), SVM Kernel, K-Nearest Neighbors, and Naïve Bayes; only Random Forest gives higher accuracy than the proposed method on the cardiovascular heart disease dataset.
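A hedged sketch of a soft-voting ensemble in the spirit of the one evaluated above, built from a subset of the named base learners; the synthetic 14-feature table is a placeholder for the UCI heart data, and soft voting requires probability estimates, hence SVC(probability=True).

```python
# Illustrative soft-voting ensemble (not the paper's exact configuration):
# average class probabilities across several base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=14, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",  # average predicted class probabilities
)
print("soft-voting CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```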
APA, Harvard, Vancouver, ISO, and other styles