Journal articles on the topic 'Decision classification trees discriminant random forest'

Author: Grafiati

Published: 4 June 2025

Last updated: 15 July 2025

Consult the top 50 journal articles for your research on the topic 'Decision classification trees discriminant random forest.'

1

Koreň, Milan, Rastislav Jakuš, Martin Zápotocký, et al. "Assessment of Machine Learning Algorithms for Modeling the Spatial Distribution of Bark Beetle Infestation." Forests 12, no. 4 (2021): 395. http://dx.doi.org/10.3390/f12040395.

Abstract:

Machine learning algorithms (MLAs) are used to solve complex non-linear and high-dimensional problems. The objective of this study was to identify the MLA that generates an accurate spatial distribution model of bark beetle (Ips typographus L.) infestation spots. We first evaluated the performance of 2 linear (logistic regression, linear discriminant analysis), 4 non-linear (quadratic discriminant analysis, k-nearest neighbors classifier, Gaussian naive Bayes, support vector classification), and 4 decision trees-based MLAs (decision tree classifier, random forest classifier, extra trees classifier, gradient boosting classifier) for the study area (the Horní Planá region, Czech Republic) for the period 2003–2012. Each MLA was trained and tested on all subsets of the 8 explanatory variables (distance to forest damage spots from previous year, distance to spruce forest edge, potential global solar radiation, normalized difference vegetation index, spruce forest age, percentage of spruce, volume of spruce wood per hectare, stocking). The mean phi coefficient of the model generated by extra trees classifier (ETC) MLA with five explanatory variables for the period was significantly greater than that of most forest damage models generated by the other MLAs. The mean true positive rate of the best ETC-based model was 80.4%, and the mean true negative rate was 80.0%. The spatio-temporal simulations of bark beetle-infested forests based on MLAs and GIS tools will facilitate the development and testing of novel forest management strategies for preventing forest damage in general and bark beetle outbreaks in particular.
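For a two-class problem like this one, the phi coefficient is computed from the same 2x2 confusion matrix as the true positive and true negative rates (for binary data it coincides with the Matthews correlation coefficient). The minimal Python sketch below is illustrative only: the function name `rates_and_phi` is hypothetical, and the counts are invented, not taken from the study.

```python
import math


def rates_and_phi(tp, fn, fp, tn):
    """Return (true positive rate, true negative rate, phi) for a 2x2 confusion matrix.

    Assumes both classes occur in the data, so tp + fn > 0 and tn + fp > 0.
    """
    tpr = tp / (tp + fn)  # sensitivity: positives correctly flagged
    tnr = tn / (tn + fp)  # specificity: negatives correctly left unflagged
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # phi is conventionally taken as 0 when any row or column of the matrix is empty.
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return tpr, tnr, phi


# Hypothetical counts chosen for illustration, not results from the study.
tpr, tnr, phi = rates_and_phi(tp=80, fn=20, fp=20, tn=80)
print(f"TPR={tpr:.2f} TNR={tnr:.2f} phi={phi:.2f}")  # TPR=0.80 TNR=0.80 phi=0.60
```

With both rates at 0.80 on a balanced sample, phi is 0.60: a single number that penalizes both kinds of error at once.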

2

Gómez, Jorge Gómez, Urueta Camilo Parra, Daniel Salas Álvarez, Riaño Velssy Hernández, and Gustavo Ramirez-Gonzalez. "Anemia Classification System Using Machine Learning." Informatics 12, no. 1 (2025): 19. https://doi.org/10.3390/informatics12010019.

Abstract:

In this study, a system was developed to predict anemia using blood count data and supervised learning algorithms. Anemia, a common condition characterized by low levels of red blood cells or hemoglobin, affects oxygenation and often causes symptoms, such as fatigue and shortness of breath. The diagnosis of anemia often requires laboratory tests, which can be challenging in low-resource areas where anemia is common. We built a supervised learning approach and trained three models (Linear Discriminant Analysis, Decision Trees, and Random Forest) using an anemia dataset from a previous study by Sabatini in 2022. The Random Forest model achieved an accuracy of 99.82%, highlighting its capability to subclassify anemia types (microcytic, normocytic, and macrocytic) with high precision, which is a novel advancement compared to prior studies limited to binary classification (presence/absence of anemia) of the same dataset.

3

Asia, Mahdi Naser Alzubaidi, and Salih Al-Shamery Eman. "Projection pursuit Random Forest using discriminant feature analysis model for churners prediction in telecom industry." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (2020): 1406–21. https://doi.org/10.11591/ijece.v10i2.pp1406-1421.

Abstract:

A major and demanding issue in the telecommunications industry is the prediction of customer churn. Churn describes customers who leave their current provider for competitors offering better service. Companies in the telco sector frequently have customer relationship management offices whose main objective is to win back defecting clients, because retaining long-term customers can be much more beneficial than acquiring new ones. Researchers and practitioners are paying great attention to developing robust customer churn prediction models, especially in the telecommunication business, and have proposed numerous machine learning approaches. Many classification approaches have been established, but the most effective in recent times are tree-based methods. The main contribution of this research is to predict churners/non-churners in the telecom sector using projection pursuit Random Forest (PPForest), which uses discriminant feature analysis as a novel extension of the conventional Random Forest for learning oblique projection pursuit trees (PPtree). The proposed methodology leverages two discriminant analysis methods to calculate the projection index used in the construction of PPtree. The first method uses Support Vector Machines (SVM), while the second uses Linear Discriminant Analysis (LDA), to achieve linear splitting of variables during oblique PPtree construction, producing individual classifiers that are more robust and more diverse than those of the classical Random Forest. The proposed methods were found to achieve the best results on performance measures such as accuracy, hit rate, ROC curve, lift, H-measure, and AUC. Moreover, PPForest based on LDA delivers effective evaluators in the prediction model.

4

Sharma, Manisha. "Improving the Accuracy of Epileptic Seizure Detection through EEG Analysis: A Comprehensive Classification Strategy." Journal of Information Systems Engineering and Management 10, no. 28s (2025): 77–85. https://doi.org/10.52783/jisem.v10i28s.4299.

Abstract:

Epilepsy is a neurological disorder that affects millions of people globally and continues to be a major public health challenge. Prompt identification of epileptic seizures is essential for effective treatment. In this study, we present an innovative methodology designed to enhance the accuracy of seizure detection through EEG data analysis. Our strategy involves creating a comprehensive EEG database that includes recordings from both healthy individuals and those experiencing seizures (ictal). We use a diverse range of classification models, including random forests, decision trees, XGBoost, and the k-nearest neighbors (kNN) algorithm. For feature extraction, we selected Linear Discriminant Analysis (LDA). The experimental results indicate that the random forest model is the most effective, achieving a perfect accuracy rate of 100% in detecting epileptic seizures. The decision tree model follows with an accuracy of 90.00%. Although the kNN algorithm has a lower accuracy of 82.50%, it still plays a significant role in differentiating between normal and ictal EEG signals. Our results demonstrate the effectiveness of the proposed method in reliably extracting spatial and temporal information from multi-channel EEG data, enabling accurate classification of epileptic seizures. This research highlights the robustness of our feature extraction approach and its potential to improve early diagnosis and treatment of epilepsy.

5

Akanbi, Olatunde David, Taiwo Mercy Faloni, and Sunday Olaniyi. "Prediction of Wine Quality: Comparing Machine Learning Models in R Programming." International Journal of Latest Technology in Engineering, Management & Applied Science 11, no. 09 (2022): 01–06. http://dx.doi.org/10.51583/ijltemas.2022.11901.

Abstract:

The consideration of wine quality before consumption or use is not a new decision scheme across ages, fields, and people. Gone are the days when the quality of wine depended solely on taste or other physical checks. In this age of data science and machine learning, we can make decisions on wine quality with reference to different features/variables. This work focuses on predicting the dependent variable, using existing models to analyze the independent variables. It uses the R programming language for the prediction and compares machine learning models such as linear regression, neural networks, naive Bayes classification, Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF). The data were divided into training and testing portions, with parts held out for validation. Random Forest gave the best model for this prediction under 10-fold cross-validation, with accuracy used to select the optimal model. Alcohol is the feature that contributes most to wine quality, while volatile acidity and chlorides contribute the least. This would assist breweries in determining the right additions and subtractions when wine quality is in question.
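The abstract selects its model by accuracy under 10-fold cross-validation. The study itself used R; purely to illustrate the procedure, here is a minimal Python sketch of k-fold splitting and mean-accuracy scoring. `k_fold_indices`, `cv_accuracy`, and the `majority_class` baseline are hypothetical helpers, not the authors' code, and any real classifier could be passed in as `fit_predict`.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold.

    Assumes n_samples >= k so that no test fold is empty.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # shuffle once so folds do not follow row order
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        yield idx[:start] + idx[start + size:], idx[start:start + size]
        start += size


def cv_accuracy(fit_predict, X, y, k=10, seed=0):
    """Mean test-fold accuracy of fit_predict(train_X, train_y, test_X) -> predictions."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def majority_class(train_X, train_y, test_X):
    """Baseline model: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)
```

Scoring each candidate model with the same folds and seed keeps the comparison fair, and a majority-class baseline shows how much of the accuracy is explained by class frequencies alone.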

6

Krishnarjun, Bora, Pratim Barman Manash, N. Patowary Arnab, and Bora Toralima. "Classification of Assamese Folk Songs' Melody using Supervised Learning Techniques." Indian Journal of Science and Technology 16, no. 2 (2023): 89–96. https://doi.org/10.17485/IJST/v16i2.1686.

Abstract:

Objectives: A melody is made up of several musical notes or pitches joined together to form a whole. This experiment aims to develop four models based on Mel-frequency cepstral coefficients (MFCC) to classify melodies played on the harmonium corresponding to five different classes of Assamese folk music. Methods: Melodies from five categories of Assamese folk songs were selected for classification. With the help of expert musicians, these melodies were played on the harmonium, and audio samples were recorded in the same acoustic environment. Twenty MFCCs were extracted from each sample, and the melodies were classified using four supervised learning techniques: Decision Tree Classifier, Linear Discriminant Analysis (LDA), Random Forest Classifier, and Support Vector Machine (SVM). Findings: The performance of the fitted models was evaluated using different evaluation techniques. A maximum average accuracy of 94.17% was achieved with the Support Vector Machine. The average accuracy scores of the Decision Tree Classifier, Linear Discriminant Analysis (LDA), and Random Forest Classifier were 73.58%, 85.58%, and 86.11%, respectively. The models were developed from 250 samples (50 of each type); increasing the training sample size may also improve the performance of the other three models. Novelty: The developed approach for identifying the melodies is based on computational techniques. This work will provide a basis for further computational studies of folk music for any community.

7

Njimi, Houssem, Nesrine Chehata, and Frédéric Revers. "Fusion of Dense Airborne LiDAR and Multispectral Sentinel-2 and Pleiades Satellite Imagery for Mapping Riparian Forest Species Biodiversity at Tree Level." Sensors 24, no. 6 (2024): 1753. http://dx.doi.org/10.3390/s24061753.

Abstract:

Multispectral and 3D LiDAR remote sensing data sources are valuable tools for characterizing the 3D vegetation structure and thus understanding the relationship between forest structure, biodiversity, and microclimate. This study focuses on mapping riparian forest species in the canopy strata using a fusion of Airborne LiDAR data and multispectral multi-source and multi-resolution satellite imagery: Sentinel-2 and Pleiades at tree level. The idea is to assess the contribution of each data source in the tree species classification at the considered level. The data fusion was processed at the feature level and the decision level. At the feature level, LiDAR 2D attributes were derived and combined with multispectral imagery vegetation indices. At the decision level, LiDAR data were used for 3D tree crown delimitation, providing unique trees or groups of trees. The segmented tree crowns were used as a support for an object-based species classification at tree level. Data augmentation techniques were used to improve the training process, and classification was carried out with a random forest classifier. The workflow was entirely automated using a Python script, which allowed the assessment of four different fusion configurations. The best results were obtained by the fusion of Sentinel-2 time series and LiDAR data with a kappa of 0.66, thanks to red edge-based indices that better discriminate vegetation species and the temporal resolution of Sentinel-2 images that allows monitoring the phenological stages, helping to discriminate the species.

8

Khan, Haroon, Farzan M. Noori, Anis Yazidi, Md Zia Uddin, M. N. Afzal Khan, and Peyman Mirtaheri. "Classification of Individual Finger Movements from Right Hand Using fNIRS Signals." Sensors 21, no. 23 (2021): 7943. http://dx.doi.org/10.3390/s21237943.

Abstract:

Functional near-infrared spectroscopy (fNIRS) is a comparatively new noninvasive, portable, and easy-to-use brain imaging modality. However, complicated dexterous tasks such as individual finger-tapping, particularly using one hand, have not been investigated using fNIRS technology. Twenty-four healthy volunteers participated in the individual finger-tapping experiment. Data were acquired from the motor cortex using sixteen sources and sixteen detectors. In this preliminary study, we applied a standard fNIRS data processing pipeline, i.e., optical density conversion, signal processing, feature extraction, and classification algorithm implementation. Physiological and non-physiological noise was removed using 4th-order band-pass Butterworth and 3rd-order Savitzky–Golay filters. Eight spatial statistical features were selected: signal mean, peak, minimum, skewness, kurtosis, variance, median, and peak-to-peak, computed from the oxygenated haemoglobin changes. Several machine learning algorithms were applied: support vector machine (SVM), random forests (RF), decision trees (DT), AdaBoost, quadratic discriminant analysis (QDA), artificial neural networks (ANN), k-nearest neighbors (kNN), and extreme gradient boosting (XGBoost). The average classification accuracies achieved were 0.75±0.04, 0.75±0.05, and 0.77±0.06 using kNN, RF, and XGBoost, respectively. The kNN, RF, and XGBoost classifiers performed exceptionally well on a problem with this many classes. The results need to be further investigated. In the future, a more in-depth analysis of the signal in both the temporal and spatial domains will be conducted to investigate the underlying factors. The accuracies achieved are promising and could open a new research direction, enriching the generation of control commands for fNIRS-based brain-computer interface applications.
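The eight per-channel features listed above are simple summary statistics. Below is a minimal Python sketch of computing them for one window of oxygenated-haemoglobin samples. The function name is hypothetical, and the use of population (biased) moments and non-excess kurtosis is an assumption, since the abstract does not give exact definitions; the Butterworth and Savitzky–Golay filtering steps are omitted.

```python
import statistics


def window_features(signal):
    """Eight summary statistics of one channel's HbO window (population moments assumed)."""
    n = len(signal)
    mean = statistics.fmean(signal)
    m2 = sum((x - mean) ** 2 for x in signal) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return {
        "mean": mean,
        "peak": max(signal),
        "minimum": min(signal),
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,  # non-excess; subtract 3 for excess kurtosis
        "variance": m2,
        "median": statistics.median(signal),
        "peak_to_peak": max(signal) - min(signal),
    }


print(window_features([1, 2, 3, 4, 5]))
```

Each channel's dictionary can then be flattened into one feature vector per trial before it is passed to a classifier.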

9

Atish, S. Tangawade, and A. Muley Aniket. "Classification of Parkinson's Disease Data Using Traditional and Advanced Data Mining Techniques." Indian Journal of Science and Technology 17, no. 11 (2024): 1043–50. https://doi.org/10.17485/IJST/v17i11.3059.

Abstract:

Objectives: (1) To apply various traditional classification tools; (2) to check the effectiveness of the classifiers on the Parkinson dataset; (3) to use boosting classification tools; and (4) to compare the performance of all classification tools used and find the classifier with the best accuracy. The main aim of the study is thus to discriminate healthy people from those with Parkinson's disease (PD). Methods: The methodology of this study has three stages: (1) preprocessing and feature selection; (2) application of classifiers; (3) comparative study. We used a secondary dataset of voice recordings originally collected at the University of Oxford by Max Little. In the first step, the voice data of PD patients were collected for analysis. The collected data were then normalized using min-max normalization, followed by feature extraction. Classification data mining techniques, viz. KNN, logistic regression, decision tree, SVM, random forest, and boosting algorithms, were then used to predict whether a person is healthy or has Parkinson's disease. Finally, a comparative analysis was made based on the accuracy of the different data mining models. Findings: The results of our study reveal that the Gradient Boosting (GB) algorithm is more accurate than the other models. Because it gives the highest accuracy, we recommend this algorithm for similar studies in the future. These models are very useful for better and more exact medical diagnosis and decision making. The proposed methods are also fully computerized and produce enhanced performance, so they can be recommended for similar studies. The Gradient Boosting algorithm provides the best accuracy (100% for training, 92.02% for testing, and 98.46% overall). Novelty: We used a boosting classification model for the classification of Parkinson's disease.
Our proposed method gives faster and more accurate results for the classification of Parkinson's disease patients. We also compared the results with other existing approaches, such as linear discriminant analysis, support vector machine, k-nearest neighbour, decision tree, classification and regression trees, random forest, linear regression, logistic regression, and naive Bayes; our proposed technique was superior to these existing studies, with the gradient boosting algorithm yielding an accuracy of 98.46%, so our method can be used as an effective means of computer-aided diagnosis of PD and has important practical value. Keywords: Data Mining, Parkinson's Disease, Classification, Boosting Algorithms, Feature Selection
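Min-max normalization rescales each feature with x' = (x - min) / (max - min). A minimal Python sketch, assuming the minimum and maximum are taken from the training rows and reused for any later data; `fit_min_max` and `transform_min_max` are hypothetical helper names, not part of the study.

```python
def fit_min_max(rows):
    """Record (min, max) for each feature column of the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def transform_min_max(rows, params):
    """Apply x' = (x - min) / (max - min) column-wise; constant columns map to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, params)]
        for row in rows
    ]


train = [[1, 10], [3, 20], [5, 30]]
params = fit_min_max(train)
print(transform_min_max(train, params))     # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(transform_min_max([[7, 0]], params))  # [[1.5, -0.5]]
```

Unseen values can fall outside [0, 1], as the second call shows; fitting the scaler on the training split only keeps test information from leaking into preprocessing.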

10

Elisabeth, Thomas, Saji Arjun, M. S. Aswin, Salas Augustine, and Viju Emil. "A Comprehensive Review of Advancing Cattle Monitoring and Behavior Classification using Deep Learning." International Journal on Emerging Research Areas (IJERA) 04, no. 02 (2025): 7–12. https://doi.org/10.5281/zenodo.14642932.

Abstract:

This paper explores the application of deep learning and image processing techniques for cattle disease detection and pose estimation, drawing insights from various research papers. The use of wearable sensors embedded in collars emerges as a prominent method for monitoring cattle behavior and health. These sensors, particularly accelerometers, effectively capture movement data, facilitating the identification of behaviors like grazing, resting, walking, and ruminating. Several studies utilize supervised machine learning algorithms such as Random Forest, Decision Trees, and Linear Discriminant Analysis to classify these behaviors with high accuracy. Further, deep learning models, especially Convolutional Neural Networks (CNNs), demonstrate remarkable capabilities in detecting specific cattle diseases. YOLOv5, known for its speed and accuracy, proves effective in cattle detection. Image preprocessing techniques, including grayscale conversion, noise removal, and data augmentation, enhance the accuracy and robustness of these models. Additionally, pose estimation techniques like OpenPifPaf, combined with angle calculations between joints, provide valuable insights into cattle posture and aid in the early detection of lameness. The integration of these advanced technologies presents a significant opportunity to advance precision livestock farming practices. Early disease detection and efficient behavior monitoring can contribute to improved animal welfare, optimized farm management, and enhanced productivity in the cattle industry.

11

Hozhyi, O. P., O. O. Zhebko, I. O. Kalinina, and T. A. Hannichenko. "Intelligent classification system based on ensemble methods." System Technologies 3, no. 146 (2023): 61–75. http://dx.doi.org/10.34185/1562-9945-3-146-2023-07.

Abstract:

In this paper, the solution of a classification task using a two-level structure of model ensembles was investigated with machine learning methods. To improve forecasting results, an ensemble approach was used: several base models were trained to solve the same problem, and their results were then aggregated and improved. The architecture of an intelligent classification system is proposed. The system consists of the following components: a data preprocessing and analysis subsystem, a data distribution subsystem, a subsystem for building base models, and a subsystem for building and evaluating model ensembles. A two-level ensemble structure was used to find a compromise between the bias and variance inherent in machine learning models. At the first level, a stacking ensemble is implemented with a logistic regression model as the metamodel. The predictions generated by the underlying models are used as training input for the first layer. The following base models were chosen for the first layer: decision trees (DecisionTree), naive Bayes classifier (NB), quadratic discriminant analysis (QDA), logistic regression (LR), support vector machine (SVM), and random forest (RF). The bagging method based on the Bagged CART algorithm was used in the second layer. The algorithm creates N regression trees using M initial training sets and averages the resulting predictions. The following base models were chosen for the second layer: the first-level model (Stacking LR), an artificial neural network (ANN) model, a linear discriminant analysis (LDA) model, and a k-nearest neighbors (KNN) model. Base classification models and ensemble models based on stacking and bagging were studied, together with metrics for evaluating the effectiveness of the base classifiers and of the first- and second-level models.
The following metrics were determined for all methods in the work: prediction accuracy and error rate, Kappa statistic, sensitivity and specificity, precision and recall, F-measure, and area under the ROC curve. The advantages and effectiveness of the model ensemble compared with each base model are determined.
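The second-level Bagged CART step (N trees fitted on bootstrap resamples of the training set, with their predictions averaged) can be sketched in a few lines of Python. To stay short, this sketch swaps full CART trees for depth-1 regression trees (stumps); `fit_stump` and `bagged_stumps` are hypothetical names, and the code is an illustration of bagging, not the authors' implementation.

```python
import random


def fit_stump(X, y):
    """Fit a depth-1 regression tree: the single split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # Every row has identical features: predict the mean target.
        mean_y = sum(y) / len(y)
        return lambda row: mean_y
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def bagged_stumps(X, y, n_estimators=25, seed=0):
    """Fit n_estimators stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)


X = [[0], [1], [2], [3], [10], [11], [12], [13]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagged_stumps(X, y)
print(round(model([0]), 2), round(model([13]), 2))
```

For 0/1 class labels, the averaged output can be read as a score and thresholded at 0.5; averaging over bootstrap fits is what reduces the variance of the individual trees.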

12

Semakula, Jimmy, Rene A. Corner-Thomas, Stephen T. Morris, Hugh T. Blair, and Paul R. Kenyon. "Application of Machine Learning Algorithms to Predict Body Condition Score from Liveweight Records of Mature Romney Ewes." Agriculture 11, no. 2 (2021): 162. http://dx.doi.org/10.3390/agriculture11020162.

Abstract:

Body condition score (BCS) in sheep (Ovis aries) is a widely used subjective measure of the degree of soft tissue coverage. Body condition score and liveweight are statistically related in ewes; therefore, it was hypothesized that BCS could be accurately predicted from liveweight using machine learning models. Individual ewe liveweight and body condition score data at each stage of the annual cycle (pre-breeding, pregnancy diagnosis, pre-lambing, and weaning) at 43 to 54 months of age were used. Nine machine learning (ML) algorithms (ordinal logistic regression, multinomial regression, linear discriminant analysis, classification and regression tree, random forest, k-nearest neighbors, support vector machine, neural networks, and gradient boosting decision trees) were applied to predict BCS from a ewe's current and previous liveweight records. A three-class BCS scale (1.0–2.0, 2.5–3.5, >3.5) was used because of high class imbalance in the five-point BCS data. The results showed that ewe BCS at 43 to 54 months of age could be predicted from current and previous liveweight with high accuracy (>85%) across all stages of the annual cycle. The gradient boosting decision tree algorithm (XGB) was the most efficient for BCS prediction regardless of season. All models had balanced specificity and sensitivity. The findings suggest that there is potential for predicting ewe BCS from liveweight using classification machine learning algorithms.
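The three-class target described above is a simple binning of the original five-point BCS. A minimal Python sketch, assuming scores are recorded in 0.5 steps on a 1 to 5 scale (so no value falls between 2.0 and 2.5); `bcs_three_class` is a hypothetical helper, and the sample scores are invented for illustration.

```python
from collections import Counter


def bcs_three_class(bcs):
    """Map a 1-5 body condition score (0.5 steps assumed) to the three-class scale."""
    if not 1.0 <= bcs <= 5.0:
        raise ValueError(f"BCS out of range: {bcs}")
    if bcs <= 2.0:
        return "1.0-2.0"
    if bcs <= 3.5:
        return "2.5-3.5"
    return ">3.5"


scores = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0]  # invented example scores
print(Counter(bcs_three_class(s) for s in scores))
```

Counting the classes after binning is a quick way to check how much imbalance remains once the sparse extreme scores are merged.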

13

Carrillo, Jeniffer Katerine, Cristhian Manuel Durán, Juan Martin Cáceres, et al. "Assessment of E-Senses Performance through Machine Learning Models for Colombian Herbal Teas Classification." Chemosensors 11, no. 7 (2023): 354. http://dx.doi.org/10.3390/chemosensors11070354.

Abstract:

This paper describes different E-Senses systems, namely an Electronic Nose, an Electronic Tongue, and Electronic Eyes, which were used to build several machine learning models and assess their performance in classifying a variety of Colombian herbal tea brands such as Albahaca, Frutos Verdes, Jaibel, Toronjil, and Toute. To do this, data from a set of Colombian herbal tea samples were acquired with the instruments and processed through multivariate data analysis techniques (principal component analysis and linear discriminant analysis) to feed the support vector machine, k-nearest neighbors, decision tree, naive Bayes, and random forest algorithms. The results of the E-Senses were validated using HS-SPME-GC-MS analysis. The best machine learning models from the different classification methods reached a 100% success rate in classifying the samples. The aim of this study was to enhance the classification of Colombian herbal teas using three sensory perception systems, which was achieved by consolidating the data obtained from the collected samples.

14

T., Ahila, and A. C. Subha Jini. "A Comparative Study of HARR Feature Extraction and Machine Learning Algorithms for Covid-19 X-Ray Image Classification." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 6s (2023): 475–82. http://dx.doi.org/10.17762/ijritcc.v11i6s.6955.

Abstract:

In this study, we investigated how effectively COVID-19 X-ray images can be categorized using Harr feature extraction and machine learning algorithms. A dataset of 500 X-ray scans, equally split between 250 COVID-19-positive cases and 250 healthy controls, served as the basis for our study. The features of the images were extracted with the Harr feature extraction method. K-nearest neighbors, decision tree, linear regression, support vector machine, regression, classification, naive Bayes, random forest, and linear discriminant analysis were among the seven machine learning approaches used to categorize the images. The effectiveness of the algorithms was evaluated with a variety of metrics, such as F1 score, precision, accuracy, recall, the area under the ROC curve, and the region of interest curve. According to our research, the Support Vector Machine algorithm had the highest accuracy, at 77%, while the Naive Bayes approach had the lowest, at 58%. Based on our research, the Random Forest method yields the best results when machine learning is combined with Harr feature extraction.
These findings may inform the development of future automated COVID-19 diagnostic systems based on X-ray images. Results from the suggested model were comparable to those of cutting-edge models trained using transfer learning techniques, and the proposed model's main advantage is that it has ten times fewer parameters than the most advanced models.

15

Kazangirler, Buse Yaren, and Emrah Özkaynak. "Conventional Machine Learning and Ensemble Learning Techniques in Cardiovascular Disease Prediction and Analysis." Journal of Intelligent Systems: Theory and Applications 7, no. 2 (2024): 81–94. http://dx.doi.org/10.38016/jista.1439504.

Abstract:

Cardiovascular diseases, which significantly affect the heart and blood vessels, are one of the leading causes of death worldwide. Early diagnosis and treatment of these diseases, which cause approximately 19.1 million deaths, are essential. Many problems, such as coronary artery disease, blood vessel disease, irregular heartbeat, heart muscle disease, heart valve problems, and congenital heart defects, are included in this disease definition. Today, researchers in the field of cardiovascular disease are using diagnosis-oriented machine learning approaches. In this study, feature extraction is performed for the detection of cardiovascular disease, and classification is performed with Support Vector Machine, Naive Bayes, Decision Tree, K-Nearest Neighbor, Bagging Classifier, Random Forest, Gradient Boosting, Logistic Regression, AdaBoost, Linear Discriminant Analysis, and Artificial Neural Network methods. A total of 918 observations from Cleveland, Hungarian Institute of Cardiology, University Hospitals of Switzerland, and Zurich, VA Medical Center were included in the study. Principal Component Analysis, a dimensionality reduction method, was used to reduce the number of features in the dataset. In the experiments, the feature set was also expanded with artificial variables, which were used in the classifiers in addition to feature reduction. Support Vector Machines, Decision Trees, Grid Search Cross Validation, and various existing Bagging and Boosting techniques were used to improve algorithm performance in disease classification. Gaussian Naïve Bayes was the highest-performing algorithm among the compared methods, with 91.0% weighted-average accuracy after a 3.0% improvement.

16

Goldstein, Daniel, Chris Aldrich, Quanxi Shao, and Louisa O'Connor. "A Machine Learning Classification Approach to Geotechnical Characterization Using Measure-While-Drilling Data." Geosciences 15, no. 3 (2025): 93. https://doi.org/10.3390/geosciences15030093.

Abstract:

Bench-scale geotechnical characterization often suffers from high uncertainty, reducing confidence in geotechnical analysis on account of expensive resource development drilling and mapping. The Measure-While-Drilling (MWD) system uses sensors to collect drilling data from open-pit blast hole drill rigs. Historically, MWD studies focused on penetration rates to identify rock formations during drilling. This study explores the effectiveness of Artificial Intelligence (AI) classification models using MWD data to predict geotechnical categories, including stratigraphic unit, rock/soil strength, rock type, Geological Strength Index, and weathering properties. Two feature importance algorithms, Minimum Redundancy Maximum Relevance and ReliefF, identified all MWD responses as influential, so all were included in the Machine Learning (ML) models. The ML algorithms tested included Decision Trees, Support Vector Machines (SVMs), Naive Bayes, Random Forests (RFs), K-Nearest Neighbors (KNNs), and Linear Discriminant Analysis. KNN, SVMs, and RFs achieved up to 97% accuracy, outperforming the other models. Prediction performance varied with class distribution: balanced datasets showed wider accuracy ranges, and skewed datasets achieved higher accuracies. The findings demonstrate a robust framework for applying AI to real-time orebody characterization, offering valuable insights for geotechnical engineers and geologists in improving orebody prediction and analysis.
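
The finding that skewed datasets reached higher accuracies needs care. On imbalanced classes, plain accuracy rewards a model for predicting the majority class. The sketch below is our own illustration and is not taken from the study. It compares accuracy with balanced accuracy (the mean of per-class recall) for a degenerate classifier that always predicts the majority label. The class names and the 90/10 split are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so each class counts equally regardless of its size."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical skewed labels: 90 samples of one class, 10 of another.
y_true = ["fresh"] * 90 + ["weathered"] * 10
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)  # a "classifier" that ignores its input

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Reporting a class-aware metric next to accuracy makes it easier to compare results across balanced and skewed datasets.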

17

Khan, Asfandyar, Abdullah Khan, Muhammad Muntazir Khan, Kamran Farid, Muhammad Mansoor Alam, and Mazliham Bin Mohd Su’ud. "Cardiovascular and Diabetes Diseases Classification Using Ensemble Stacking Classifiers with SVM as a Meta Classifier." Diagnostics 12, no. 11 (2022): 2595. http://dx.doi.org/10.3390/diagnostics12112595.

Abstract:

Cardiovascular disease includes coronary artery diseases (CAD), which include angina and myocardial infarction (commonly known as a heart attack), and coronary heart diseases (CHD), which are marked by the buildup of a waxy material called plaque inside the coronary arteries. Heart attacks are still the main cause of death worldwide, and if not treated properly they have the potential to cause major health problems, such as diabetes. If ignored, diabetes can result in a variety of health problems, including heart disease, stroke, blindness, and kidney failure. Machine learning methods can be used to identify and diagnose diabetes and other illnesses. Both diabetes and cardiovascular disease can be diagnosed using several types of classifier. Naive Bayes, K-Nearest Neighbor (KNN), linear regression, decision trees (DT), and support vector machines (SVM) were among the classifiers employed, although all of these models had poor accuracy. Because of this poor accuracy and the limited effort devoted to the problem, new research is required to diagnose diabetes and cardiovascular disease. This study developed an ensemble approach called "Stacking Classifier" to improve the performance of integrated flexible individual classifiers and to decrease the likelihood of misclassifying a single instance. Naive Bayes, KNN, Linear Discriminant Analysis (LDA), and Decision Tree (DT) are among the classifiers used in this study. Random Forest and SVM are used as meta-classifiers. For diagnosing diabetes, the suggested stacking classifier obtains a superior accuracy of 97.35%, compared with current models such as Naive Bayes, KNN, DT, and LDA, which reach 76.46%, 74.60%, 78.57%, and 77.35%, respectively.
Furthermore, for cardiovascular disease, the suggested stacking classifier obtained a higher accuracy of 88.71% than current models such as KNN, NB, DT, LDA, and SVM, which reach 83.77%, 82.56%, 84.26%, 85.23%, and 84.72%, respectively.
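
In stacking, a meta-classifier is trained on the predictions of several base classifiers. Those predictions should be made out-of-fold, on samples the base models did not see. Otherwise the meta-learner learns from overfitted outputs. The sketch below shows only that mechanism, using deliberately simple stand-ins. Two single-feature threshold learners replace the paper's NB/KNN/LDA/DT base models. A lookup table that takes the majority label for each combination of base predictions replaces the SVM/RF meta-classifier. All names and data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_threshold(X, y, feature):
    """Base learner: the single threshold on one feature that best separates labels 0/1."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for sign in (1, -1):  # sign 1: predict 1 when x >= t; sign -1: when x <= t
            preds = [int(sign * (row[feature] - t) >= 0) for row in X]
            hits = sum(p == label for p, label in zip(preds, y))
            if best is None or hits > best[0]:
                best = (hits, t, sign)
    _, t, sign = best
    return lambda row: int(sign * (row[feature] - t) >= 0)

def out_of_fold_meta_features(X, y, base_fitters, k=3, seed=0):
    """Each sample's base-model predictions, made by models trained without that sample."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    meta = [None] * len(X)
    for f in range(k):
        held_out = idx[f::k]
        held_set = set(held_out)
        train = [i for i in idx if i not in held_set]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fitters]
        for i in held_out:
            meta[i] = tuple(m(X[i]) for m in models)
    return meta

def fit_stacker(X, y, base_fitters, k=3):
    """Meta-learner: majority label for each combination of out-of-fold base predictions."""
    votes = defaultdict(Counter)
    for combo, label in zip(out_of_fold_meta_features(X, y, base_fitters, k), y):
        votes[combo][label] += 1
    table = {combo: c.most_common(1)[0][0] for combo, c in votes.items()}
    fallback = Counter(y).most_common(1)[0][0]  # for combinations never seen in training
    models = [fit(X, y) for fit in base_fitters]  # refit base learners on all data
    return lambda row: table.get(tuple(m(row) for m in models), fallback)

# Toy two-feature data (hypothetical); labels 0 and 1.
X = [[1, 1], [2, 2], [1, 3], [3, 1], [2, 1], [1, 2],
     [8, 8], [9, 7], [7, 9], [8, 9], [9, 8], [7, 7]]
y = [0] * 6 + [1] * 6
base_fitters = [lambda X, y: fit_threshold(X, y, 0),
                lambda X, y: fit_threshold(X, y, 1)]
stack = fit_stacker(X, y, base_fitters)
print(stack([1, 2]), stack([9, 9]))
```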

18

Musial, Jan Pawel, and Jedrzej Stanislaw Bojanowski. "Comparison of the Novel Probabilistic Self-Optimizing Vectorized Earth Observation Retrieval Classifier with Common Machine Learning Algorithms." Remote Sensing 14, no. 2 (2022): 378. http://dx.doi.org/10.3390/rs14020378.

Abstract:

The Vectorized Earth Observation Retrieval (VEOR) algorithm is a novel algorithm suited to the efficient supervised classification of large Earth Observation (EO) datasets. VEOR addresses shortcomings in well-established machine learning methods with an emphasis on numerical performance. Its characteristics include (1) derivation of classification probability; (2) objective selection of classification features that maximize Cohen’s kappa coefficient (κ) derived from iterative “leave-one-out” cross-validation; (3) reduced sensitivity of the classification results to imbalanced classes; (4) smoothing of the classification probability field to reduce noise/mislabeling; (5) numerically efficient retrieval based on a pre-computed look-up vector (LUV); and (6) separate parametrization of the algorithm for each discrete feature class (e.g., land cover). Within this study, the performance of the VEOR classifier was compared to other commonly used machine learning algorithms: K-nearest neighbors, support vector machines, Gaussian process, decision trees, random forest, artificial neural networks, AdaBoost, Naive Bayes and Quadratic Discriminant Analysis. Firstly, the comparison was performed using synthetic 2D (two-dimensional) datasets featuring different sample sizes, levels of noise (i.e., mislabeling) and class imbalance. Secondly, the same experiments were repeated for 7D datasets consisting of informative, redundant and insignificant features. Ultimately, the benchmarking of the classifiers involved cloud discrimination using MODIS satellite spectral measurements and a reference cloud mask derived from combined CALIOP lidar and CPR radar data. The results revealed that the proposed VEOR algorithm accurately discriminated cloud cover using MODIS data and accurately classified large synthetic datasets with low or moderate levels of noise and class imbalance. 
By contrast, VEOR did not classify heavily distorted or small datasets well. Nevertheless, the comparisons showed that VEOR ranked among the three to four most accurate classifiers and that it can be applied to large Earth Observation datasets.
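
VEOR selects features by maximising Cohen's kappa under leave-one-out cross-validation. The statistic takes the observed agreement p_o and corrects it for p_e, the agreement expected by chance from the label marginals: κ = (p_o − p_e) / (1 − p_e). A minimal sketch of the statistic alone follows. The cloud/clear labels are made up, and the function is undefined when p_e = 1 (both label sequences constant and identical).

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both sequences pick the same class independently
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical cloud-mask labels for 10 pixels.
y_true = ["cloud"] * 6 + ["clear"] * 4
y_pred = ["cloud"] * 5 + ["clear"] + ["clear"] * 3 + ["cloud"]
print(round(cohens_kappa(y_true, y_pred), 3))
```

Here p_o = 0.8 and p_e = 0.52, so κ ≈ 0.583. This is noticeably lower than the raw 80% agreement, which is why kappa is less flattering than accuracy on imbalanced classes.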

19

Nguyen Duc, Phong, Son Nguyen Manh, Ha Nguyen Manh, et al. "Machine learning and deep learning models applied to identification and classification of mango." Vietnam Journal of Food Control 7, no. 3 (2024): 429–37. http://dx.doi.org/10.47866/2615-9252/vjfc.4370.

Abstract:

This study uses the data published at https://data.mendeley.com/datasets/46htwnp833/2, which include visible-near-infrared (Vis-NIR) spectral data at wavelengths from 309 nm to 1149 nm for 11,691 mangoes in Australia, collected from 10 mango varieties across 2 growing regions. The research developed machine learning models in the open-source programming language Python. These were principal component analysis (PCA) combined with support vector machines (SVM), decision trees (DT), random forests (RF), and artificial neural networks (ANN); a partial least squares discriminant analysis model (PLS-DA); and a deep learning model, a one-dimensional convolutional neural network (1D-CNN). Preprocessing was carried out on the full spectral data: a second-derivative transformation, Savitzky-Golay smoothing, and data balancing with the Synthetic Minority Oversampling Technique (SMOTE). The results demonstrated that applying SMOTE before running the machine learning models significantly enhanced classification accuracy. Furthermore, a 1D-CNN model with a complex structure provided higher classification efficiency than conventional machine learning models. The accuracy of the 1D-CNN model in classifying mango ripeness, mango variety, and growing location was 99.40%, 94.35%, and 96.92%, respectively. The 1D-CNN deep learning model is well suited to classifying samples in large spectral datasets containing tens of thousands of samples.
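
SMOTE balances classes by creating synthetic minority samples rather than duplicating existing ones. Each synthetic point lies on the line segment between a minority sample and one of its nearest minority-class neighbours. A minimal standard-library sketch of that interpolation step follows. The neighbour count `k=3` and the toy points are assumptions (the original SMOTE paper's experiments used five neighbours). Oversampling is normally applied to the training split only, so that synthetic points do not leak into the test set.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """SMOTE-style oversampling: interpolate between minority samples and near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample, excluding itself
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-D minority class (e.g. two spectral features after PCA).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=10)
print(len(new_points))
```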

20

Wongvibulsin, Shannon, Katherine C. Wu, and Scott L. Zeger. "Improving Clinical Translation of Machine Learning Approaches Through Clinician-Tailored Visual Displays of Black Box Algorithms: Development and Validation." JMIR Medical Informatics 8, no. 6 (2020): e15791. http://dx.doi.org/10.2196/15791.

Abstract:

Background: Despite the promise of machine learning (ML) to inform individualized medical care, the clinical utility of ML in medicine has been limited by the minimal interpretability and black box nature of these algorithms. Objective: The study aimed to demonstrate a general and simple framework for generating clinically relevant and interpretable visualizations of black box predictions to aid in the clinical translation of ML. Methods: To obtain improved transparency of ML, simplified models and visual displays can be generated using common methods from clinical practice such as decision trees and effect plots. We illustrated the approach based on postprocessing of ML predictions, in this case random forest predictions, and applied the method to data from the Left Ventricular (LV) Structural Predictors of Sudden Cardiac Death (SCD) Registry for individualized risk prediction of SCD, a leading cause of death. Results: With the LV Structural Predictors of SCD Registry data, SCD risk predictions are obtained from a random forest algorithm that identifies the most important predictors, nonlinearities, and interactions among a large number of variables while naturally accounting for missing data. The black box predictions are postprocessed using classification and regression trees into a clinically relevant and interpretable visualization. The method also quantifies the relative importance of an individual or a combination of predictors. Several risk factors (heart failure hospitalization, cardiac magnetic resonance imaging indices, and serum concentration of systemic inflammation) can be clearly visualized as branch points of a decision tree to discriminate between low-, intermediate-, and high-risk patients. Conclusions: Through a clinically important example, we illustrate a general and simple approach to increase the clinical translation of ML through clinician-tailored visual displays of results from black box algorithms.
We illustrate this general model-agnostic framework by applying it to SCD risk prediction. Although we illustrate the methods using SCD prediction with random forest, the methods presented are applicable more broadly to improving the clinical translation of ML, regardless of the specific ML algorithm or clinical application. As any trained predictive model can be summarized in this manner to a prespecified level of precision, we encourage the use of simplified visual displays as an adjunct to the complex predictive model. Overall, this framework can allow clinicians to peek inside the black box and develop a deeper understanding of the most important features from a model to gain trust in the predictions and confidence in applying them to clinical care.
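
The key idea here is that the interpretable tree is fitted to the random forest's predictions, not to the raw outcomes, so the tree summarises what the black box learned. The smallest version of this is a one-split regression tree. It chooses the feature and threshold that minimise squared error against the black-box risk scores. A minimal sketch follows. The feature names, patients and scores are invented, and a real surrogate would grow deeper trees, as in the paper.

```python
def surrogate_split(rows, scores, feature_names):
    """Depth-1 regression tree fitted to black-box scores: best (feature, threshold) by squared error."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for j, name in enumerate(feature_names):
        for threshold in sorted({row[j] for row in rows}):
            left = [s for row, s in zip(rows, scores) if row[j] <= threshold]
            right = [s for row, s in zip(rows, scores) if row[j] > threshold]
            if not right:  # a threshold at the maximum value does not split anything
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, name, threshold,
                        sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("no feature takes more than one value")
    return best[1:]  # (feature, threshold, mean score if <=, mean score if >)

# Hypothetical patients: [prior heart-failure hospitalization (0/1), age in years]
rows = [[0, 50], [0, 60], [0, 70], [1, 55], [1, 65], [1, 75]]
# Hypothetical black-box (e.g. random forest) risk predictions for those patients
scores = [0.05, 0.07, 0.06, 0.30, 0.35, 0.32]
print(surrogate_split(rows, scores, ["hf_hospitalization", "age"]))
```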

21

Jeong, Dong-Hwa, Se-Eun Kim, Woo-Hyeok Choi, and Seong-Ho Ahn. "A Comparative Study on the Influence of Undersampling and Oversampling Techniques for the Classification of Physical Activities Using an Imbalanced Accelerometer Dataset." Healthcare 10, no. 7 (2022): 1255. http://dx.doi.org/10.3390/healthcare10071255.

Abstract:

Accelerometer data collected from wearable devices have recently been used to monitor physical activities (PAs) in daily life. Although the intensity of PAs can be distinguished with a cut-off approach, estimating energy expenditure also requires discriminating between different behaviors with similar accelerometry patterns. We aim to overcome the data imbalance problem that negatively affects machine learning-based PA classification by extracting well-defined features and applying undersampling and oversampling methods. We extracted various temporal, spectral, and nonlinear features from wrist-, hip-, and ankle-worn accelerometer data. Then, the influences of undersampling and oversampling were compared using various ML and DL approaches. Among the ML and DL models, ensemble methods including random forest (RF) and adaptive boosting (AdaBoost) performed well in differentiating sedentary behavior (driving) from three walking types (walking on level ground, ascending stairs, and descending stairs), even in a cross-subject paradigm. The undersampling approach, which has a low computational cost, produced classification results that were not biased toward the majority class. In addition, we found that RF could automatically select relevant features for PA classification depending on the sensor location, by examining the importance of each node in multiple decision trees (DTs). This study proposes that ensemble learning using well-defined feature sets combined with the undersampling approach is robust for imbalanced datasets in PA classification. This approach will be useful for PA classification in free-living situations, where data imbalance between classes is common.
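
Random undersampling, the low-cost option favoured above, discards samples from the larger classes until every class is the size of the smallest one. A minimal sketch follows. The activity labels and counts are hypothetical, and as with oversampling, resampling is meant to be applied to the training data only.

```python
import random
from collections import defaultdict

def random_undersample(X, y, seed=0):
    """Randomly drop samples from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

# Hypothetical imbalanced activity windows (one feature each).
y = ["walking"] * 50 + ["stairs_up"] * 12 + ["driving"] * 8
X = [[float(i)] for i in range(len(y))]
X_bal, y_bal = random_undersample(X, y)
print(len(X_bal))
```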

22

Naseri, Hamed, E. Owen D. Waygood, Bobin Wang, Zachary Patterson, and Ricardo A. Daziano. "A Novel Feature Selection Technique to Better Predict Climate Change Stage of Change." Sustainability 14, no. 1 (2021): 40. http://dx.doi.org/10.3390/su14010040.

Abstract:

Indications of people's environmental concern are linked to transport decisions and can strongly support policymaking on climate change. This study aims to better predict an individual's climate change stage of change (CC-SoC) from features of transport-related behavior, General Ecological Behavior, the New Environmental Paradigm, and socio-demographic characteristics. Together these sources yield over 100 possible features indicating someone's level of environmental concern. Such a large number of features can cause analytical problems such as overfitting, reduced accuracy, and high computational cost. To address this, a new feature selection technique, the Coyote Optimization Algorithm-Quadratic Discriminant Analysis (COA-QDA), is first proposed to find the optimal features for predicting CC-SoC with the highest accuracy. Conventional feature selection methods (Lasso, Elastic Net, Random Forest Feature Selection, Extra Trees, and Principal Component Analysis Feature Selection) are used for comparison with COA-QDA. Afterward, eight classification techniques are applied to the prediction problem. Finally, a sensitivity analysis is performed to determine the most important features affecting the prediction of CC-SoC. The results indicate that COA-QDA outperforms the conventional feature selection methods, increasing average test accuracy by 0.7% to 5.6%. Logistic Regression achieves the highest prediction accuracy of all the classifiers.
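
COA-QDA is a wrapper method. A search procedure (here, the Coyote Optimization Algorithm) proposes feature subsets, and a classifier (QDA) scores each one. The sketch below keeps that wrapper structure but swaps the metaheuristic for plain greedy forward selection, a simpler and different search strategy. `score` is a hypothetical callback. In practice it would return the cross-validated accuracy of a classifier trained on the given subset. The additive toy score and the feature names are invented.

```python
def forward_select(features, score, max_features=None):
    """Greedy wrapper selection: repeatedly add the feature that most improves score(subset)."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining), key=lambda pair: pair[1])
        if candidate_score <= best_score:
            break  # no remaining feature improves the current subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Hypothetical stand-in for "cross-validated accuracy of a classifier on this subset".
weights = {"car_use": 0.3, "recycling": 0.2, "age": 0.05, "noise": -0.1}
toy_score = lambda subset: 0.5 + sum(weights[f] for f in subset)
print(forward_select(weights, toy_score))
```

Greedy search evaluates far fewer subsets than an exhaustive one, but it can miss combinations that only help jointly. Population-based searches such as COA are one way to explore more broadly.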

23

de Oliveira Matias, Ítalo, Patrícia Carneiro Genovez, Sarah Barrón Torres, et al. "Improved Classification Models to Distinguish Natural from Anthropic Oil Slicks in the Gulf of Mexico: Seasonality and Radarsat-2 Beam Mode Effects under a Machine Learning Approach." Remote Sensing 13, no. 22 (2021): 4568. http://dx.doi.org/10.3390/rs13224568.

Abstract:

Distinguishing between natural and anthropic oil slicks is a challenging task, especially in the Gulf of Mexico, where both can be observed simultaneously and are recognized as seeps or spills. In this study, machine learning (ML) methods were used to develop, test, and implement a classification model (CM) that labels an oil slick source (OSS) as natural or anthropic. A robust database of 4916 validated oil samples, detected using synthetic aperture radar (SAR), was used for this task. Six ML algorithms were evaluated: artificial neural networks (ANN), random forest (RF), decision trees (DT), naive Bayes (NB), linear discriminant analysis (LDA), and logistic regression (LR). Using RF, the global CM achieved a maximum accuracy of 73.15%. An innovative approach evaluated how external factors, such as seasonality, satellite configurations, and the synergy between them, limit or improve OSS predictions. To do this, specific classification models (SCMs) were derived from the global ones (CMs), tuning the best algorithms and parameters for different scenarios. Median accuracies revealed winter and spring to be the best seasons and ScanSAR Narrow B (SCNB) to be the best beam mode. The maximum median accuracy for distinguishing seeps from spills was achieved in winter using SCNB (83.05%). Among the tested algorithms, RF was the most robust, with better performance in 81% of the investigated scenarios. The accuracy gains from the well-fitted models may reduce confusion between seeps and spills. This is a concrete contribution to reducing the economic and geologic risks of exploration activities in offshore areas. Additionally, from an operational standpoint, specific models help specialists select the best SAR products and seasons for new acquisitions and optimize performance according to the available data.

24

Mota, Matheus Jhonnata Santos, Alberto Calson Alves Vieira, Lucas Silva Lima, et al. "Sex determination based on craniometric parameters: a comparative approach between linear and non-linear machine learning algorithms." Journal Archives of Health 5, no. 1 (2024): 634–51. http://dx.doi.org/10.46919/archv5n1-042.

Abstract:

Introduction: Determining sex from cranial characteristics is highly relevant in forensic anthropology. Most studies have used linear methods (such as logistic regression) for this estimation, with accuracies around 70% and rarely above 90%. Several authors have tested non-linear models such as neural networks, support vector machines, and decision trees with good results, surpassing linear models. Objective: To compare linear models (logistic regression, linear regression, and linear discriminant analysis) with non-linear models (neural networks, extreme gradient boosting, support vector machine, naive Bayes, random forest, decision tree, k-nearest neighbors, and adaptive multivariate spline regression). Materials and Methods: The 241 skulls used in this study were obtained from the collection of the Center for Study and Research in Anatomy and Forensic Anthropology at Tiradentes University, Farolândia campus, in Aracaju, Sergipe. Each skull in the collection has secure, detailed records. Eighty-nine skulls were excluded: 58 with signs of craniotomy, 30 with damage, and one unidentified. The 152 eligible skulls underwent cranial measurements. Using the Anaconda platform and the Jupyter editor, the data were divided into a training set (80% of the sample) and a test set (20% of the sample). Eleven machine learning algorithms, including both linear and non-linear models, were applied. Results: The best machine learning algorithm was a neural network, with an average accuracy of 93% over 50 runs. The difference from logistic regression, which had an accuracy of 68%, was significant (p = 0.01016). Conclusion: This study demonstrated the potential of neural networks for solving the sex classification problem. One limitation is that neural networks perform better with large volumes of data, while this study used data from a single center.
Future studies should test neural networks with larger samples and with skulls from other continents.
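
The headline 93% is an average over 50 runs of an 80/20 split. Averaging over repeated random splits reduces dependence on any single favourable partition, which matters with only 152 skulls. A minimal sketch of that evaluation loop follows. `fit` and `predict` are hypothetical callbacks for whichever classifier is being assessed. The majority-class learner is only a placeholder, and the abstract does not say whether the authors stratified their splits.

```python
import random
from collections import Counter

def repeated_holdout(X, y, fit, predict, runs=50, test_fraction=0.2, seed=0):
    """Mean test accuracy over `runs` independent random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(len(X) * test_fraction))
    accuracies = []
    for _ in range(runs):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / n_test)
    return sum(accuracies) / runs

# Placeholder classifier: always predict the most common training label.
fit_majority = lambda X, y: Counter(y).most_common(1)[0][0]
predict_majority = lambda model, row: model

X = [[float(i)] for i in range(152)]
y = ["F"] * 70 + ["M"] * 82  # hypothetical label split, not the study's
print(repeated_holdout(X, y, fit_majority, predict_majority))
```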

25

Iqbal, Md. Sadiq, and Mohammod Abul Kashem. "A Machine Learning Framework for Identifying Sources of AI-Generated Text." Statistics, Optimization & Information Computing 13, no. 5 (2025): 2186–204. https://doi.org/10.19139/soic-2310-5070-2225.

Abstract:

The rise of AI-generated text requires efficient identification methods to ascertain its origin. This research presents a comprehensive dataset derived from responses to various questions posed to AI models, including ChatGPT, Gemini, DeepAI, and Bing, alongside human respondents. We meticulously preprocessed the dataset and used both manual methods, such as Count Vector (CV), Bag of Words (BoW), and Hashing Vectorization (HV), and automated Deep Learning (DL) models, such as Bidirectional Encoder Representations from Transformers (BERT), Extreme Language understanding Network (XLNet), Enhanced Representation through Knowledge Integration (ERNIE), and Generative Pre-Trained Transformers (GPT), to convert text into features. These features are then used to train multiple Machine Learning (ML) classifiers, including Support Vector Machines (SVM), Logistic Regression (LR), Decision Trees (DT), Random Forests (RF), Naive Bayes (NB), and Extreme Gradient Boosting (XGB). This research also uses Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) to maximize the classification accuracy of the ML models. Remarkably, the combination of HV with LDA and XGB achieved the highest accuracy, 99.40%. Further evaluation using precision, recall, F1 score, and specificity, with the Confusion Matrix (CM) and the Receiver Operating Characteristic (ROC) curve, confirmed its superior performance. Explainable Artificial Intelligence (XAI) tools, namely Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), are used to explain the model's outputs, ensuring transparency and interpretability.
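
Hashing vectorization, part of the best-performing combination, avoids storing a vocabulary. Each token is hashed straight into one of a fixed number of buckets, and occasional collisions are accepted in exchange for constant memory. A minimal standard-library sketch follows. It uses CRC32 as the hash (Python's built-in `hash` is salted per process for strings, so it is not stable across runs), and whitespace tokenisation and `n_features=16` are arbitrary choices for readability. scikit-learn's `HashingVectorizer` uses MurmurHash3 and by default applies alternating signs and L2 normalisation, so its outputs will differ.

```python
import zlib

def hash_vectorize(text, n_features=16):
    """Hashing trick: map each token to a bucket with a stable hash and count occurrences."""
    vector = [0] * n_features
    for token in text.lower().split():  # naive whitespace tokenisation (an assumption)
        bucket = zlib.crc32(token.encode("utf-8")) % n_features
        vector[bucket] += 1
    return vector

v = hash_vectorize("The model wrote the answer")
print(v)
```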

26

Ruske, Simon, David O. Topping, Virginia E. Foot, et al. "Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer." Atmospheric Measurement Techniques 10, no. 2 (2017): 695–708. http://dx.doi.org/10.5194/amt-10-695-2017.

Abstract:

Characterisation of bioaerosols has important implications for the environmental and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real-world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative to validate the performance of the different algorithms that can be used for classification. For unsupervised learning, we tested hierarchical agglomerative clustering with various linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations of support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks). The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6% and 91.1% for the two data sets, respectively.
For supervised learning, the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8% and 98.27% of the testing data across the two data sets, respectively. A possible alternative to gradient boosting is neural networks. We do, however, note that this method requires much more user input than the other methods, and we suggest further research with it, especially on parallelised hardware such as GPUs, which would allow larger networks to be trained and could yield better results. We also saw that some methods, such as clustering, failed to use the additional shape information provided by the instrument, whilst for others, such as decision trees, ensemble methods and neural networks, including this information improved performance.
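
Hierarchical agglomerative clustering starts with every particle in its own cluster and repeatedly merges the closest pair of clusters. The "linkage" defines how the distance between two clusters is measured: closest members (single), farthest members (complete), or mean pairwise distance (average). A minimal, deliberately naive sketch follows. The 2-D points stand in for fluorescence and size features and are invented.

```python
import math

def agglomerative(points, n_clusters, linkage="single"):
    """Merge the two closest clusters until n_clusters remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(dists)
        if linkage == "complete":
            return max(dists)
        if linkage == "average":
            return sum(dists) / len(dists)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) under the chosen linkage.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters

# Hypothetical particles described by two features.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(agglomerative(points, 2, linkage="average"))
```

This version recomputes every pairwise distance at each merge, so it only suits small examples. Production libraries use more efficient update rules.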

27

Logishetty, Kartik, Eros Montin, Srikar Namireddy, et al. "Automated Machine Learning-Based MRI Radiomic Analysis to Identify and Diagnose Patients with Femoroacetabular Impingement." Orthopaedic Proceedings 107-B, SUPP_2 (2025): 7. https://doi.org/10.1302/1358-992x.2025.2.007.

Abstract:

Diagnosing femoroacetabular impingement (FAI) using manual radiographic, CT and MRI analysis is user-dependent and has only moderate accuracy. We leverage radiomics and machine learning techniques, proven successful in oncology, to automatically extract quantitative MRI features beyond the human eye: first-order statistics, shape-based metrics, and texture matrices. We aimed to differentiate healthy, symptomatic, and asymptomatic hips, measuring reliability and generalisability through external validation. We investigated FAI using three distinct cohorts: 1) a primary cohort of 31 FAI patients (22F, mean age 36) imaged with and without contrast on 3T MRI using four-phase Dixon sequences; 2) healthy volunteers (20 patients); and 3) a cohort of asymptomatic FAI (31 patients). These were compared with an external validation cohort of water-only Dixon images from 185 patients with unilateral symptomatic FAI. Automated segmentation (TotalSegmentator and the STAPLE algorithm) isolated key hip structures. PyRadiomics extracted 6,752 radiomic features across ROIs after 1 mm isotropic resampling. Ten machine learning models were evaluated using four-fold cross-validation, maintaining patient-level data separation between the training, validation, and testing sets. This yielded 480 unique models (10 classifiers × 48 feature sets), with top performers validated against the test set (three-label classification) and cohort 3 (symptomatic identification). Seven out of ten models achieved accuracy >0.85. Notably, Random Forest Classifier (RFC), Gradient Boosting Classifier (GBC), Extra Trees Classifier (ETC), and Bagging Classifier (BC) attained perfect accuracy at least once using fewer than 20 features.
In the external validation cohort, eight models also surpassed 0.85 accuracy. Five of them (Support Vector Classifier (SVC), Decision Tree Classifier (DTC), Gaussian Naive Bayes (GNB), Quadratic Discriminant Analysis (QDA), and K-Nearest Neighbours (KNN)) reached perfect accuracy, though with increased variability. This study highlights the effectiveness of 20 novel MRI-based radiomic features in reliably differentiating between symptomatic FAI, asymptomatic FAI, and healthy hips. Decision Tree Classifier and Bagging Classifier correctly predicted 100% of the FAI patients, even with varied imaging protocols. Additionally, automated radiomic selection identified novel gluteal muscle features specific to symptomatic FAI, improving the diagnostic yield of MRI beyond bone, cartilage and labrum in FAI.
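
Patient-level separation means that all images from one patient go into the same fold. A model is then never tested on a patient it saw during training. A minimal sketch of grouped k-fold splitting follows. Whole groups are assigned greedily, largest first, to the currently smallest fold. This is similar in spirit to scikit-learn's `GroupKFold`, but it is not that implementation. The patient IDs are invented.

```python
from collections import defaultdict

def group_k_fold(groups, k=4):
    """Yield (train_idx, test_idx) pairs where no group appears on both sides."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    # Assign whole groups to folds, largest group first, always to the smallest fold so far.
    folds = [[] for _ in range(k)]
    for g in sorted(by_group, key=lambda g: -len(by_group[g])):
        min(folds, key=len).extend(by_group[g])
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for other in range(k) if other != f for i in folds[other])
        yield train, test

# Hypothetical scans: several images per patient.
groups = ["p1", "p1", "p2", "p2", "p3", "p4", "p4", "p5"]
for train, test in group_k_fold(groups, k=4):
    print("test patients:", sorted({groups[i] for i in test}))
```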

28

Ayyad, Sarah M., Mohamed A. Badawy, Mohamed Shehata, et al. "A New Framework for Precise Identification of Prostatic Adenocarcinoma." Sensors 22, no. 5 (2022): 1848. http://dx.doi.org/10.3390/s22051848.

Abstract:

Prostate cancer, also known as prostatic adenocarcinoma, is an uncontrolled growth of epithelial cells in the prostate and has become one of the leading causes of cancer-related death worldwide. The survival of patients with prostate cancer relies on detection at an early, treatable stage. In this paper, we introduce a new comprehensive framework to precisely differentiate between malignant and benign prostate lesions. This framework proposes a noninvasive computer-aided diagnosis system that integrates two MR imaging modalities: diffusion-weighted (DW) and T2-weighted (T2W). For the first time, it combines four kinds of input. The first is functional features, represented by apparent diffusion coefficient (ADC) maps estimated from DW-MRI for the whole prostate. The second is first- and second-order texture features extracted from T2W-MRIs of the whole prostate. The third is shape features, represented by spherical harmonics constructed for the lesion inside the prostate. The fourth is PSA screening results. The dataset presented in the paper includes 80 biopsy-confirmed patients with a mean age of 65.7 years (43 benign prostatic hyperplasia, 37 prostatic carcinomas). Experiments were conducted using well-known machine learning classification models, including support vector machines (SVM), random forests (RF), decision trees (DT), and linear discriminant analysis (LDA), to study which feature sets lead to better identification of prostatic adenocarcinoma.
Using leave-one-out cross-validation, the SVM classification model with the combined feature set after feature selection achieved 88.75% accuracy, 81.08% sensitivity, 95.35% specificity, and 0.8821 AUC. This indicates that integrating and reducing the different types of feature sets improved diagnostic performance compared with each individual feature set and with the other machine learning classifiers. In addition, the developed system provided consistent diagnostic performance under 10-fold and 5-fold cross-validation, which confirms its reliability, generalization ability, and robustness.
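
The reported accuracy, sensitivity and specificity are arithmetically consistent with each other. Take carcinoma (37 cases) as the positive class and benign hyperplasia (43 cases) as the negative class. Then 81.08% sensitivity corresponds to 30/37 and 95.35% specificity to 41/43, giving (30 + 41)/80 = 88.75% accuracy. These counts are our back-calculation from the percentages and are not stated in the abstract. A small sketch of the definitions:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Counts back-calculated from the reported percentages (carcinoma = positive class).
metrics = diagnostic_metrics(tp=30, fn=7, tn=41, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 88.75%
# sensitivity: 81.08%
# specificity: 95.35%
```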

29

Ashraf, Imran, Soojung Hur, and Yongwan Park. "MagIO: Magnetic Field Strength Based Indoor-Outdoor Detection with a Commercial Smartphone." Micromachines 9, no. 10 (2018): 534. http://dx.doi.org/10.3390/mi9100534.

Abstract:

A wide range of localization techniques has been proposed recently that leverage smartphone sensors. Context awareness serves as the backbone of these localization techniques, which helps them to shift the localization technologies to improve efficiency and energy utilization. Indoor-outdoor (IO) context sensing plays a vital role for such systems, which serve both indoor and outdoor localization. IO systems work with collaborative technologies including the Global Positioning System (GPS), cellular tower signals, Wi-Fi, Bluetooth and a variety of smartphone sensors. GPS- and Wi-Fi-based systems are power hungry, and their accuracy is severed by limiting factors like multipath, shadowing, etc. On the other hand, various built-in smartphone sensors can be deployed for environmental sensing. Although these sensors can play a crucial role, yet they are very less studied. This research aims at investigating the use of ambient magnetic field data alone from a smartphone for IO detection. The research first investigates the feasibility of utilizing magnetic field data alone for IO detection and then extracts different features suitable for IO detection to be used in machine learning-based classifiers to discriminate between indoor and outdoor environments. The experiments are performed at three different places including a subway station, a shopping mall and Yeungnam University (YU), Korea. The training data are collected from one spot of the campus, and testing is performed with data from various locations of the above-mentioned places. The experiment involves Samsung Galaxy S8, LG G6 and Samsung Galaxy Round smartphones. The results show that the magnetic data from smartphone magnetic sensor embody enough information and can discriminate the indoor environment from the outdoor environment. 
Naive Bayes (NB) performs best, with a classification accuracy of 83.26%, compared with support vector machines (SVM), random induction (RI), gradient boosting machines (GBM), random forest (RF), k-nearest neighbor (kNN), and decision trees (DT), whose accuracies are 67.21%, 73.38%, 73.40%, 78.59%, 69.53%, and 68.60%, respectively. kNN, SVM, and DT do not perform well when noisy data are used for classification. Additionally, other dynamic scenarios affect the attitude of the magnetic data and degrade the performance of SVM, RI, and GBM. NB and RF prove to be more noise tolerant and adaptable to the environment, and they perform very well in dynamic scenarios. In view of these results, an ensemble-based stacking scheme is presented that uses DT and RI as the base learners and naive Bayes as the ensemble classifier. This approach achieves an accuracy of 85.30% using data from the smartphone magnetic sensor. Moreover, with more training data, the accuracy of the stacking scheme can be raised by 0.83%. The performance of the proposed approach is compared with GPS-, Wi-Fi-, and light sensor-based IO detection.
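The stacking scheme in this abstract can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic (`make_classification`), and since scikit-learn offers no learner named "random induction", a single randomized `ExtraTreeClassifier` is used as a hypothetical stand-in for RI. As in the abstract, DT is a base learner and naive Bayes (`GaussianNB`) is the meta-classifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# Synthetic stand-in for windowed magnetic-field features (NOT the MagIO data).
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

base_learners = [
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    # Hypothetical stand-in for the paper's RI learner: one extremely randomized tree.
    ("ri_standin", ExtraTreeClassifier(max_depth=5, random_state=0)),
]

# GaussianNB is the meta-learner; cv=5 means it is trained on out-of-fold
# class probabilities from the base learners, not on their training-set fits.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=GaussianNB(), cv=5)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"stacking accuracy: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold predictions keeps it from simply trusting whichever base learner overfits the training set most.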

30

Hadi, Dhea Agustina, and Dwi Agustin Nuriani Sirodj. "Metode Random Forest untuk Klasifikasi Penyakit Diabetes [Random Forest Method for Diabetes Classification]." Bandung Conference Series: Statistics 3, no. 2 (2023): 428–35. http://dx.doi.org/10.29313/bcss.v3i2.8354.

Abstract:

Random Forest is a supervised learning algorithm developed from decision trees through the application of bootstrap aggregating (bagging). The method grows many decision trees to produce a forest, the final model being called the random forest model. Each tree is grown on data sampled randomly with replacement through the bagging process. Random forest is considered to perform better on diabetes data than other supervised learning methods because it has the lowest error rate among them. Random forest is also an important technique for medical data classification, especially for diagnosing diabetes. In this study, classification was carried out on the Pima Indian Diabetes data; the Pima are a Native American people living in Arizona and Mexico. The analysis measured the accuracy of random forest classification on the Pima Indian diabetes data. The results show an accuracy of 74.78%, which falls in the "fair" classification category. In this random forest classification, the three most important variables are glucose, followed by BMI and age.
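Bagging with replacement and the variable-importance ranking reported above can be illustrated with scikit-learn's `RandomForestClassifier`. This sketch uses synthetic data with hypothetical feature names (`glucose`, `bmi`, `age` plus two noise columns); it does not reproduce the Pima results. Setting `oob_score=True` scores each tree on the rows its bootstrap sample left out, which gives a built-in accuracy estimate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the data are synthetic, not the Pima records.
# With shuffle=False, the first three columns are the informative ones.
feature_names = ["glucose", "bmi", "age", "noise_1", "noise_2"]
X, y = make_classification(n_samples=700, n_features=5, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# bootstrap=True draws each tree's rows with replacement; rows a tree never
# saw (out-of-bag) provide the accuracy estimate requested by oob_score=True.
rf = RandomForestClassifier(n_estimators=200, bootstrap=True, oob_score=True,
                            min_samples_leaf=5, random_state=42)
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)

# Rank variables by impurity-based importance (the importances sum to 1).
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
print(f"OOB accuracy {rf.oob_score_:.3f}, test accuracy {test_accuracy:.3f}")
for name, importance in ranking:
    print(f"{name:>8s}: {importance:.3f}")
```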

31

Kinasih, Agnes Nola Sekar, Anik Nur Handayani, Jevri Tri Ardiansah, and Nor Salwa Damanhuri. "Comparative analysis of decision tree and random forest classifiers for structured data classification in machine learning." Science in Information Technology Letters 5, no. 2 (2024): 13–24. https://doi.org/10.31763/sitech.v5i2.1746.

Abstract:

This study explores the application of machine learning techniques, specifically classification, to improve data analysis outcomes. The primary objective is to evaluate and compare the performance of Decision Tree and Random Forest classifiers on a structured dataset. The study uses K-Means clustering, with the Elbow Method to choose the optimal number of clusters, to segment the data, and Decision Trees and Random Forests for classification, investigating how accurately each method categorizes the data. The dataset, obtained from Kaggle, consists of 13 attributes and 1,048,575 rows, all numeric. The key results show that Random Forest outperforms Decision Trees in classification accuracy, precision, recall, and F1 score, providing a more robust model for data classification. The improvement observed with Random Forest, particularly on complex datasets, demonstrates its superior ability to generalize across varied classes. The findings suggest that for applications requiring high accuracy and reliability, Random Forest is preferable to Decision Trees, especially when the dataset exhibits high variability. This research contributes to a deeper understanding of how different machine learning models can be applied to real-world classification problems and offers guidance on selecting the most appropriate model for specific data characteristics.
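The Elbow Method mentioned in the abstract picks the number of K-Means clusters at the point where adding clusters stops reducing the within-cluster sum of squares (inertia) substantially. A minimal sketch, assuming synthetic 13-feature data with four true clusters and a hypothetical 10% relative-drop threshold to locate the elbow automatically (in practice the elbow is often judged from a plot):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_k(inertias, threshold=0.10):
    """Return the smallest k after which inertia drops by less than `threshold`.

    inertias[i] holds the inertia for k = i + 1.
    """
    for i in range(1, len(inertias)):
        relative_drop = (inertias[i - 1] - inertias[i]) / inertias[i - 1]
        if relative_drop < threshold:
            return i  # going from k = i to k = i + 1 barely helps
    return len(inertias)


# Synthetic numeric data with 4 true clusters (an assumption for illustration).
X, _ = make_blobs(n_samples=1000, centers=4, n_features=13, random_state=7)

# Within-cluster sum of squares for k = 1..8.
ks = range(1, 9)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=7).fit(X).inertia_
            for k in ks]
print("chosen k:", elbow_k(inertias))
```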

32

Tarchoune, Ilhem, Akila Djebbar, and Hayet Farida Merouani. "Improving Random Forest with Pre-pruning technique for Binary classification." All Sciences Abstracts 1, no. 2 (2023): 11. http://dx.doi.org/10.59287/as-abstracts.1202.

Abstract:

Random Forest (RF) is a popular machine learning algorithm. It is based on ensemble learning, the process of combining several classifiers to solve a complex problem and improve model performance. Random forest extends decision trees (DT) to build more stable models. In this work, we propose to further improve the predictions of the trees in the forest with a pre-pruning technique that aims to optimize node performance and minimize tree size. Two experiments are performed to evaluate the proposed method: in the first, we apply the Classical Random Forest algorithm (CRF) with several different numbers of trees; in the second, a pre-pruning technique is applied to the trees to define the optimal size of the forest. Finally, we compare the results. The main objective is to produce accurate decision trees with high precision. The effectiveness of the proposed method is validated on five medical databases, where it achieves prediction precision of 83%, 94%, 95%, 97%, and 81% for the Diabetes, Hepatitis, SaHeart, EEG-Eye-State, and Prostate-cancer databases, respectively. The results confirm that the proposed method performs better than the classical random forest algorithm.
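The abstract does not specify the pre-pruning criterion, so the following only illustrates the general idea with scikit-learn's stopping parameters: `max_depth` and `min_samples_leaf` halt tree growth early, and the average node count per tree shows how much smaller the pruned trees become. The thresholds and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=20, n_informative=6,
                           flip_y=0.05, random_state=1)


def mean_tree_size(forest):
    """Average node count over the fitted trees of a forest."""
    return sum(t.tree_.node_count for t in forest.estimators_) / len(forest.estimators_)


models = {
    # Classical forest: each tree grows until its leaves are pure.
    "classical": RandomForestClassifier(n_estimators=100, random_state=1),
    # Pre-pruned forest: growth stops early (hypothetical depth and leaf-size limits).
    "pre-pruned": RandomForestClassifier(n_estimators=100, max_depth=8,
                                         min_samples_leaf=5, random_state=1),
}
results = {}
for name, model in models.items():
    cv_accuracy = cross_val_score(model, X, y, cv=5).mean()
    model.fit(X, y)
    results[name] = (cv_accuracy, mean_tree_size(model))
    print(f"{name}: CV accuracy {cv_accuracy:.3f}, {results[name][1]:.0f} nodes/tree")
```

Whether pre-pruning also raises accuracy depends on the data; the sketch only guarantees smaller trees.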

33

Santana, Dthenifer Cordeiro, Gustavo de Faria Theodoro, Ricardo Gava, et al. "A New Approach to Identifying Sorghum Hybrids Using UAV Imagery Using Multispectral Signature and Machine Learning." Algorithms 17, no. 1 (2024): 23. http://dx.doi.org/10.3390/a17010023.

Abstract:

Multispectral sensors attached to unmanned aerial vehicles (UAVs) can assist in collecting morphological and physiological information from several crops. This approach, also known as high-throughput phenotyping, combined with data processing by machine learning (ML) algorithms, can provide fast, accurate, and large-scale discrimination of genotypes in the field, which is crucial for improving the efficiency of breeding programs. Despite their importance, studies aimed at accurately classifying sorghum hybrids using spectral variables as input sets in ML models are still scarce in the literature. Against this backdrop, this study aimed: (I) to discriminate sorghum hybrids based on canopy reflectance in different spectral bands (SB) and vegetation indices (VIs); (II) to evaluate the performance of ML algorithms in classifying sorghum hybrids; and (III) to identify the best input dataset for the algorithms. A field experiment was carried out in the 2022 crop season in a randomized block design with three replications and six sorghum hybrids. At 60 days after crop emergence, a flight was carried out over the experimental area using a senseFly eBee RTK (real-time kinematic) UAV. The spectral bands (SB) acquired by the sensor were blue (475 nm, B_475), green (550 nm, G_550), red (660 nm, R_660), red edge (735 nm, RE_735), and NIR (790 nm, NIR_790). Vegetation indices (VIs) were calculated from the acquired SB. The data were submitted to ML classification analysis, in which three input settings (only SB, only VIs, and SB + VIs) and six algorithms were tested: artificial neural networks (ANN), support vector machine (SVM), J48 decision trees (J48), random forest (RF), REPTree (DT), and logistic regression (LR, the conventional technique used as a control). There were differences in the spectral signature of each sorghum hybrid, which made it possible to differentiate them using SBs and VIs.
The ANN algorithm performed best for all three accuracy metrics tested, regardless of the input used. In this case, using SB alone is feasible because of the speed and practicality of the analysis, as it does not require calculating the VIs. RF showed better accuracy when VIs were used as input. Using VIs provided the best performance for all algorithms, as did using SB + VIs, which performed well for all algorithms except RF. ML algorithms identify the hybrids accurately, with ANNs using only SB and RF using VIs as inputs standing out (above 55% for correct classification (CC), above 0.4 for kappa, and around 0.6 for F-score). There were differences in the spectral signature of each sorghum hybrid, which made it possible to differentiate them using wavelengths and vegetation indices. Processing the multispectral data with machine learning techniques made it possible to differentiate the hybrids accurately, with artificial neural networks using spectral bands as inputs and random forest using vegetation indices as inputs standing out.
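Vegetation indices are typically computed per pixel or plot from the band reflectances. The abstract does not list which VIs were used, so NDVI, NDRE, and GNDVI below are common normalized-difference examples, (a - b) / (a + b), chosen as assumptions; the reflectance values are hypothetical. The last step stacks bands and indices into the "SB + VIs" input setting.

```python
import numpy as np


def normalized_difference(a, b):
    """Generic normalized-difference index (a - b) / (a + b), element-wise."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)


# Hypothetical canopy reflectances (fractions) for two plots; not data from the study.
bands = {
    "B_475": np.array([0.04, 0.05]),
    "G_550": np.array([0.08, 0.09]),
    "R_660": np.array([0.05, 0.07]),
    "RE_735": np.array([0.25, 0.22]),
    "NIR_790": np.array([0.45, 0.40]),
}

vis = {
    "NDVI": normalized_difference(bands["NIR_790"], bands["R_660"]),
    "NDRE": normalized_difference(bands["NIR_790"], bands["RE_735"]),
    "GNDVI": normalized_difference(bands["NIR_790"], bands["G_550"]),
}

# One row per plot: 5 spectral-band columns followed by 3 VI columns.
sb_matrix = np.column_stack([bands[name] for name in bands])
sb_vi_matrix = np.column_stack([sb_matrix] + [vis[name] for name in vis])
print(sb_vi_matrix.round(3))
```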

34

Salsabila, Alifia Salwa, Christy Atika Sari, and Eko Hari Rachmawanto. "Classification of Movie Recommendation on Netflix Using Random Forest Algorithm." Advance Sustainable Science Engineering and Technology 6, no. 3 (2024): 02403016. http://dx.doi.org/10.26877/asset.v6i3.676.

Abstract:

Netflix is one of the most popular streaming platforms in the world. Many movies and shows of various genres and production countries are available on the platform. Netflix has its own recommendation system that serves subscribers according to their data and its algorithm. This research compares two data classification methods, Decision Tree and Random Forest, and builds a recommendation system based on a Netflix dataset. The paper uses feature importance to select relevant features and examines how n_estimators affects the classification. Random Forest with 50 tree estimators achieves the best accuracy, 96.84% before feature selection and 96.92% after, outperforming the Decision Tree classifier, which reaches only 95.64% before feature selection and 96.07% after. The number of tree estimators also affects the accuracy of Random Forest classification. After comparing the results, Random Forest with 50 tree estimators and feature selection provides the best accuracy, and it is used to predict recommendations of similar movies and shows.
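Feature-importance selection and an `n_estimators` comparison can be sketched with scikit-learn. This is a minimal sketch on synthetic data, not the Netflix dataset: `SelectFromModel` keeps features whose importance exceeds the mean importance (its default threshold for tree ensembles), and the tree counts other than 50 are hypothetical. Placing selection inside a `Pipeline` keeps the test split out of the selection step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a tabular movie/show dataset (not the Netflix data).
X, y = make_classification(n_samples=1000, n_features=15, n_informative=5,
                           random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Keep features whose forest importance is above the mean, then train a
# 50-tree forest on the kept features only.
pipeline = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=3))),
    ("classify", RandomForestClassifier(n_estimators=50, random_state=3)),
])
pipeline.fit(X_train, y_train)
n_selected = int(pipeline.named_steps["select"].get_support().sum())
selected_accuracy = pipeline.score(X_test, y_test)

# How the number of trees (n_estimators) affects held-out accuracy, without selection.
scores = {}
for n_trees in (10, 50, 100):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=3)
    scores[n_trees] = rf.fit(X_train, y_train).score(X_test, y_test)

print(f"{n_selected} features kept, accuracy with selection: {selected_accuracy:.3f}")
print(scores)
```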

35

Salman, Hasan Ahmed, Ali Kalakech, and Amani Steiti. "Random Forest Algorithm Overview." Babylonian Journal of Machine Learning 2024 (June 8, 2024): 69–79. http://dx.doi.org/10.58496/bjml/2024/007.

Abstract:

A random forest is a machine learning model used for classification and forecasting. Training machine learning algorithms and artificial intelligence models requires a substantial amount of high-quality data, which makes effective data collection crucial. System performance data are essential for refining algorithms, enhancing the efficiency of software and hardware, evaluating user behavior, and enabling pattern identification, decision-making, predictive modeling, and problem-solving, ultimately improving effectiveness and accuracy. Integrating diverse data collection and processing methods enhances precision and innovation in problem-solving. Using diverse methodologies in interdisciplinary research streamlines the research process, fosters innovation, and allows data analysis findings to be applied to pattern recognition, decision-making, predictive modeling, and problem-solving. This approach also encourages innovation in interdisciplinary research. The random forest technique builds on decision trees: it constructs a collection of decision trees and aggregates their outcomes to generate the final prediction. Each decision tree in a random forest is constructed from a random subset of the data, so each individual tree is trained on a portion of the whole dataset. The outcomes of all the trees are then combined to derive the final forecast. One benefit of random forests is their capacity to handle unbalanced data and variables with missing values. Additionally, they mitigate the issue of arbitrary variable selection seen in certain alternative models. Furthermore, random forests reduce overfitting by training several decision trees on random subsets of data, which improves their ability to generalize to new data. Random forests are highly regarded as one of the most efficient and powerful techniques in machine learning.
They are used extensively in applications such as automatic categorization, data forecasting, and supervised learning.
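The construction described in this abstract (trees trained on random subsets of the data, outputs combined into one prediction) can be sketched from scratch on top of scikit-learn's `DecisionTreeClassifier`. Two assumptions shape the sketch: each tree sees a bootstrap sample (rows drawn with replacement) and considers a random subset of about sqrt(p) features per split via `max_features="sqrt"`, and the forest combines trees by hard majority vote (scikit-learn's own `RandomForestClassifier` averages class probabilities instead). The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Grow n_trees trees, each on a bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",  # random features per split
                                      random_state=int(rng.integers(1 << 31)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X):
    """Hard majority vote over the trees' predicted labels (labels 0..k-1)."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_trees, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
forest = fit_forest(X_train, y_train)
accuracy = (predict_forest(forest, X_test) == y_test).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```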

36

Mustamin, Nurul Fathanah, Ariyani Buang, Firman Aziz, and Nur Hamdani Nur. "Ensemble Techniques Based Risk Classification for Maternal Health During Pregnancy." ILKOM Jurnal Ilmiah 16, no. 2 (2024): 190–97. http://dx.doi.org/10.33096/ilkom.v16i2.2005.190-197.

Abstract:

This research focuses on the critical aspect of maternal health during pregnancy, emphasizing the need for early detection and intervention to address potential risks to both mothers and infants. Leveraging various classification methods, including Naïve Bayes, decision trees, and ensemble learning techniques, the study investigates the prediction of childbirth potential and pregnancy risks. The research begins with data collection, followed by preprocessing to clean and prepare the data, including handling missing values and normalization. Next, cross-validation is performed to ensure model robustness. Five ensemble techniques are used for risk classification: Ensemble Boosted Trees, which enhances the performance of decision trees; Ensemble Bagged Trees, which combines predictions from decision trees trained on different subsets of data; Ensemble Subspace Discriminant, which applies discriminant analysis on random subspaces; Ensemble Subspace KNN, which uses K-Nearest Neighbors (KNN) within random subspaces; and Ensemble RUS Boosted Trees. Key variables such as maternal age, height, Hb levels, blood pressure, and previous pregnancy history are considered in these analyses. Additionally, the study introduces Ensemble Learning based on Classification Trees, revealing significant improvements in accuracy compared to cost-sensitive learning approaches. The comparison of methods, including Naïve Bayes and K-Nearest Neighbor, provides insights into their respective performances, with ensemble techniques demonstrating their potential. The proposed ensemble learning techniques, namely Ensemble Boosted Trees, Ensemble Bagging Trees, Ensemble Subspace Discriminant, Ensemble Subspace KNN, and Ensemble RUS Boosted Trees, are systematically evaluated in classifying pregnancy risks based on a comprehensive dataset of 1014 records. 
The results show Ensemble Bagging Trees as a standout performer, with an accuracy of 85.6%, indicating robust generalization and effectiveness in clinical risk assessment compared with traditional methods such as Decision Tree (61.54% accuracy), K-Nearest Neighbor (74.48%), Ensemble Learning based on Cost-Sensitive Learning (73%), Ensemble Learning based on Classification Tree (76%), Gaussian Naïve Bayes (82.6%), Multinomial Naïve Bayes (84.8%), and Bernoulli Naïve Bayes (84.8%). Ensemble Bagging Trees achieved the highest accuracy, proving more effective than the other methods. However, the study emphasizes the need for continuous refinement and adaptation of ensemble methods, considering both accuracy and interpretability, for successful deployment in healthcare decision-making. These findings contribute valuable insights into optimizing pregnancy risk classification models, paving the way for improved maternal and infant healthcare outcomes.
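Two of the ensembles named above can be approximated with scikit-learn's `BaggingClassifier`: drawing random feature subsets while keeping all rows gives a random subspace ensemble, here with linear discriminant members (a "subspace discriminant"), while bootstrapping rows over decision trees gives bagged trees. This mapping, the 50% feature fraction, the member counts, and the synthetic 3-class data are assumptions, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class stand-in for the 1014-record dataset (not the study's data).
X, y = make_classification(n_samples=1014, n_features=8, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=5)

models = {
    # Random subspace: every LDA member sees all rows but a random half of the features.
    "subspace discriminant": BaggingClassifier(
        LinearDiscriminantAnalysis(), n_estimators=30, max_features=0.5,
        bootstrap=False, bootstrap_features=False, random_state=5),
    # Bagging: every tree sees a bootstrap sample of the rows and all features.
    "bagged trees": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=30, random_state=5),
}
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```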

37

Cheng, Zening, Shuangbin Tao, Meiling Ge, Weiwei Xing, and Tianyu Li. "Research on classified prediction algorithm of platform customer value based on 'tree' concept." Applied and Computational Engineering 71, no. 1 (2024): 199–205. http://dx.doi.org/10.54254/2755-2721/71/20241622.

Abstract:

The concept of "trees," covering decision trees, random forests, and XGBoost algorithms, has gained increasing attention in the field of classification prediction in recent years. These tree-based machine learning algorithms have been widely applied in the e-commerce sector. This paper discusses the decision tree, random forest, and XGBoost algorithms individually and extracts useful classification rules from large volumes of online customer data to provide intelligent decision support for computer network customer management. The study finds that algorithms related to the "tree" concept can integrate a customer's current value (such as purchasing behavior) with potential value (such as customer interest inferred from online reviews) through detailed classification criteria, thereby constructing a customer value measurement model. The evolution from decision trees to random forest and XGBoost algorithms has effectively promoted research on customer value hierarchy prediction, improving the efficiency of data mining in computer networks.

38

Unal, Yavuz, Husamettin Kaplan, Yusuf Bektas, and Muhammed Bedirhan Caglar. "Classification of Raisin Grains Variety Using Some Machine Learning Methods." New Trends in Computer Sciences 1, no. 1 (2023): 62–69. http://dx.doi.org/10.3846/ntcs.2023.18015.

Abstract:

Raisins are an agricultural crop of considerable nutritional and financial value. Every year, the world produces and consumes millions of tons of raisins. In this work, machine learning was used to classify two raisin varieties grown in our country. The machine learning techniques Decision Trees and Random Forest were used to classify a ready-made two-class dataset with 7 attributes. In the analyses conducted, Random Forest and Decision Trees achieved classification accuracies of 85.44% and 85.22%, respectively.
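The two reported accuracies differ by only 0.22 percentage points (85.44% vs 85.22%), so a paired comparison on identical folds, with the fold-to-fold spread reported, helps judge whether such a gap is meaningful. A minimal sketch on synthetic two-class data with 7 attributes (not the raisin data), assuming 10-fold stratified cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data with 7 attributes (a stand-in, not the raisin dataset).
X, y = make_classification(n_samples=900, n_features=7, n_informative=4,
                           flip_y=0.1, random_state=11)

# Both models are scored on identical stratified folds, so the comparison is paired.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=11)
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=11),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=11),
}
cv_results = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=cv)
    cv_results[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name}: {fold_scores.mean():.4f} +/- {fold_scores.std():.4f}")
```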

39

Iqbal, Muhammad, Hendri Mahmud Nawawi, Muhammad Rezki, Abdul Hamid, and Sri Rahayu. "Pendekatan Algoritma Klasifikasi Machine Learning untuk Deteksi Penyakit Demensia [A Machine Learning Classification Algorithm Approach for Dementia Detection]." Computer Science (CO-SCIENCE) 3, no. 2 (2023): 94–99. http://dx.doi.org/10.31294/coscience.v3i2.1987.

Abstract:

Early detection of dementia through machine learning classification algorithms is important for providing patients with appropriate interventions. In this context, this study compares the performance of several machine learning classification algorithms in detecting dementia using an attribute selection method. First, patient data, including medical history, cognitive test results, and other attributes, were collected as input, and an attribute selection algorithm was used to select the most informative attribute subset for detecting dementia. This subset was then used to train several classification models: Extra Trees (ET), Linear Discriminant Analysis (LDA), Random Forest (RF), and Ridge. Accuracy is used as the main metric for comparing the algorithms. The evaluation results show that the Random Forest (RF) algorithm performs best, with an accuracy of 91.56%. The Extra Trees (ET) algorithm has an almost comparable accuracy of 91.44%, while Ridge and Linear Discriminant Analysis (LDA) each reach 90.44%. For dementia detection, the Random Forest algorithm with the attribute selection method proved best, with an accuracy of 91.56%. These results indicate that the developed model can recognize complex patterns and relationships between features relevant to dementia status. The attribute selection method also contributes to the accuracy and efficiency of the classification algorithms.
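When attribute selection precedes a model comparison, fitting the selector inside each cross-validation fold keeps held-out data from influencing which attributes are kept. The sketch assumes a univariate ANOVA F-test (`SelectKBest` with `f_classif`, k = 8) because the abstract does not name the selection method; the data are synthetic, and the four classifiers mirror those listed.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for patient attributes (not the study's data).
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           random_state=8)

classifiers = {
    "ET": ExtraTreesClassifier(n_estimators=200, random_state=8),
    "LDA": LinearDiscriminantAnalysis(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=8),
    "Ridge": RidgeClassifier(),
}
results = {}
for name, clf in classifiers.items():
    # Scaling and selection are refit within each training fold only.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=8), clf)
    results[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {results[name]:.3f}")
```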

40

Simarmata, Ivan Luis, and I. Wayan Supriana. "Music Genre Classification Using Random Forest Model." JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) 12, no. 1 (2023): 83. http://dx.doi.org/10.24843/jlk.2023.v12.i01.p10.

Abstract:

Music genre is a grouping of music by style. Sorting music into genres manually is a long and tedious task, because one must listen to each song individually and decide which genre it belongs to. This process can be automated with classification models such as Random Forest. The Random Forest model is a variant of the decision tree model in which multiple decision trees are combined to produce a single result. In this paper, the Random Forest model is tested, with an XGB Classification model for comparison. XGB Classification was chosen for comparison because it is similar to the Random Forest model: it is also a decision-tree-based model and uses CART as its trees. The results show that the Random Forest model achieves an accuracy of 72% when all audio features are included, and XGB Classification achieves 73% with some audio features dropped.

41

Hasan, Md, Md Islam, Sayma Khandaker, Norizam Sulaiman, Ashraful Islam, and Mirza Hossain. "Ensemble-based machine learning models for vehicle drivers’ fatigue state detection utilizing EEG signals." Facta universitatis - series: Electronics and Energetics 37, no. 4 (2024): 671–86. https://doi.org/10.2298/fuee2404671h.

Abstract:

A great deal of current academic research focuses on evaluating driver fatigue, which is increasingly recognized as a major contributor to vehicle crashes. By combining advanced features with machine learning techniques, electroencephalogram (EEG) signals can be analyzed to detect fatigue efficiently and quickly. This study presents an innovative approach for detecting driver fatigue states from EEG signals using ensemble-based machine learning techniques. Two ensemble models (Ensemble-based RUSBoosted Decision Trees and Ensemble-based Random Subspace Discriminant) were applied and compared. The study used an online EEG dataset of 12 individuals, with data collected during normal and fatigued driving, and applied the Fast Fourier Transform for feature extraction. The Ensemble-based RUSBoosted Decision Trees model achieved superior performance, with 98.53% classification accuracy compared with 83.13% for the Random Subspace Discriminant model. Multiple performance metrics were used to evaluate model performance. Overall, the proposed Ensemble-based RUSBoosted Decision Trees model outperformed the Ensemble-based Random Subspace Discriminant model and existing conventional methods for fatigue state detection. This research contributes to the development of more accurate and reliable fatigue detection systems, which could improve road safety by identifying fatigued drivers in real time.
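FFT-based feature extraction can be sketched as relative band power per EEG epoch. The band edges, the 128 Hz sampling rate, and the 4-second test epoch are assumptions for illustration; the abstract does not state which spectral features the study extracted. A 10 Hz test tone should put almost all of the power in the alpha band.

```python
import numpy as np

# Conventional EEG bands in Hz (a common convention, assumed here).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_powers(signal, fs):
    """Summed FFT power of a 1-D signal within each band."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return {name: float(spectrum[(freqs >= lo) & (freqs < hi)].sum())
            for name, (lo, hi) in BANDS.items()}


def relative_band_powers(signal, fs):
    """Band powers normalized so they sum to 1 (a scale-free feature vector)."""
    powers = band_powers(signal, fs)
    total = sum(powers.values())
    return {name: p / total for name, p in powers.items()}


fs = 128                        # hypothetical sampling rate in Hz
t = np.arange(0, 4, 1.0 / fs)   # one 4-second epoch
rng = np.random.default_rng(0)
epoch = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)  # 10 Hz tone + noise
features = relative_band_powers(epoch, fs)
print({name: round(value, 3) for name, value in features.items()})
```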

42

Szűcs, Gábor. "Random Response Forest for Privacy-Preserving Classification." Journal of Computational Engineering 2013 (November 14, 2013): 1–6. http://dx.doi.org/10.1155/2013/397096.

Abstract:

The paper deals with classification in privacy-preserving data mining. It introduces an algorithm, the Random Response Forest, which constructs many binary decision trees as an extension of Random Forest for privacy-preserving problems. Random Response Forest uses the Random Response idea from among the anonymization methods, which, instead of generalizing, keeps the original data but mixes them. An anonymity metric is defined for the indistinguishability of two mixed sets of data. This metric, binary anonymity, is investigated and taken into account for the optimal coding of the binary variables. The accuracy of Random Response Forest is reported at the end of the paper.
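The abstract does not give the exact mixing procedure, so the sketch below shows only the classic Warner randomized-response mechanism for a binary attribute, the general idea behind randomized-response anonymization: each value is kept with probability p and flipped otherwise. Because E[reported mean] = p·π + (1 - p)(1 - π), the true share π of ones can still be estimated as (λ - (1 - p)) / (2p - 1) from the reported mean λ. The value p = 0.75 and the simulated attribute are hypothetical.

```python
import numpy as np


def randomized_response(bits, p_truth, rng):
    """Report each binary value truthfully with probability p_truth, else flip it."""
    bits = np.asarray(bits, dtype=int)
    keep = rng.random(bits.size) < p_truth
    return np.where(keep, bits, 1 - bits)


def estimate_true_proportion(reported, p_truth):
    """Unbiased estimate of the true share of ones from randomized reports.

    E[mean] = p*pi + (1 - p)*(1 - pi)  =>  pi = (mean - (1 - p)) / (2p - 1).
    """
    lam = np.mean(reported)
    return (lam - (1 - p_truth)) / (2 * p_truth - 1)


rng = np.random.default_rng(0)
true_bits = (rng.random(100_000) < 0.3).astype(int)  # hypothetical attribute, ~30% ones
reported = randomized_response(true_bits, p_truth=0.75, rng=rng)
pi_hat = estimate_true_proportion(reported, p_truth=0.75)
print(f"estimated share of ones: {pi_hat:.3f}")
```

No individual report can be trusted, yet aggregate statistics remain recoverable, which is the trade-off such privacy-preserving classifiers exploit.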

43

Ramírez-Fernández, Salomón Einstein, and Iván Alberto Lizarazo-Salcedo. "Digital classification of cloud masses from weather imagery using machine learning algorithms." Revista Facultad de Ingeniería Universidad de Antioquia, no. 73 (August 26, 2014): 43–57. http://dx.doi.org/10.17533/udea.redin.17254.

Abstract:

Accurate identification of precipitating clouds is a challenging task. In the present work, Support Vector Machine, Decision Tree, and Random Forest algorithms were applied to discriminate between precipitating and non-precipitating clouds in a GOES-13 satellite weather image covering Colombian territory. The objective was to evaluate the performance of machine learning (ML) algorithms for the digital classification of cloud masses in terms of thematic classification accuracy, using the conventional Mahalanobis algorithm as a benchmark. Results show that ML algorithms classify cloud masses more accurately than conventional algorithms. The best accuracy was obtained with Random Forests (RF), with an overall thematic accuracy of 97%. Furthermore, the RF classification was compared pixel by pixel with NASA Tropical Rainfall Measuring Mission (TRMM) rainfall estimates, yielding an overall accuracy of 94%. ML algorithms can therefore be used to improve current methods of identifying precipitating clouds.
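The Mahalanobis benchmark can be sketched as a minimum-distance classifier: each pixel goes to the class whose mean is closest in Mahalanobis distance. This sketch assumes one common variant that uses a pooled within-class covariance matrix, and it runs on synthetic two-feature data rather than GOES-13 imagery; the class labels are illustrative.

```python
import numpy as np


class MahalanobisClassifier:
    """Assign each sample to the class mean nearest in Mahalanobis distance,
    using a pooled within-class covariance matrix."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        centered = np.vstack([X[y == c] - m for c, m in zip(self.classes_, self.means_)])
        pooled_cov = centered.T @ centered / (len(X) - len(self.classes_))
        self.inv_cov_ = np.linalg.inv(pooled_cov)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        diffs = X[:, None, :] - self.means_[None, :, :]  # (samples, classes, features)
        d2 = np.einsum("nkf,fg,nkg->nk", diffs, self.inv_cov_, diffs)  # squared distances
        return self.classes_[np.argmin(d2, axis=1)]


# Hypothetical two-feature pixels for two cloud classes (synthetic, not GOES-13 data).
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, size=200),   # class 0: non-precipitating
               rng.multivariate_normal([2.0, -2.0], cov, size=200)])  # class 1: precipitating
y = np.repeat([0, 1], 200)
model = MahalanobisClassifier().fit(X, y)
train_accuracy = (model.predict(X) == y).mean()
print(f"training accuracy: {train_accuracy:.3f}")
```

Unlike plain Euclidean distance, this rule accounts for correlated features, which is why it is often called direction-sensitive.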

44

Zenina, Nadezda, and Arkady Borisov. "Transportation Mode Choice Analysis Based on Classification Methods." Scientific Journal of Riga Technical University. Computer Sciences 45, no. 1 (2011): 49–53. http://dx.doi.org/10.2478/v10143-011-0041-2.

Abstract:

Mode choice analysis has received the most attention among discrete choice problems in the travel behavior literature. Most traditional mode choice models are based on the principle of random utility maximization derived from econometric theory. This paper investigates the performance of mode choice analysis with classification methods: decision trees, discriminant analysis, and multinomial logit. Experimental results demonstrate satisfactory classification quality.
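The multinomial logit model rests on random utility maximization: each alternative i has a systematic utility V_i, and the choice probability is P_i = exp(V_i) / Σ_j exp(V_j). A minimal sketch with hypothetical linear utilities (alternative-specific constants plus travel-time and cost terms); the coefficients and modes are invented for illustration, not estimated from the paper's data.

```python
import numpy as np


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities exp(V_i) / sum_j exp(V_j)."""
    v = np.asarray(utilities, dtype=float)
    expv = np.exp(v - v.max())  # subtracting the max avoids overflow; probabilities are unchanged
    return expv / expv.sum()


# Hypothetical linear utilities V = ASC + beta_time * minutes + beta_cost * cost.
beta_time, beta_cost = -0.05, -0.3
modes = {
    "car": (0.5, 20.0, 3.0),   # (alternative-specific constant, minutes, cost)
    "bus": (0.0, 35.0, 1.5),
    "bike": (-0.5, 30.0, 0.0),
}
V = [asc + beta_time * minutes + beta_cost * cost
     for asc, minutes, cost in modes.values()]
probs = mnl_probabilities(V)
for mode, p in zip(modes, probs):
    print(f"{mode}: {p:.3f}")
```

Only utility differences matter: adding the same constant to every V_i leaves the probabilities unchanged.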

45

Resti, Yulia, Chandra Irsan, Jeremy Firdaus Latif, Irsyadi Yani, and Novi Rustiana Dewi. "A Bootstrap-Aggregating in Random Forest Model for Classification of Corn Plant Diseases and Pests." Science and Technology Indonesia 8, no. 2 (2023): 288–97. http://dx.doi.org/10.26554/sti.2023.8.2.288-297.

Abstract:

Controlling diseases and pests of maize is a significant challenge for ensuring global food security, self-sufficiency, and sustainable agriculture. Classification, or early detection, of diseases and pests of corn plants is intended to support the control process. Random forest is a tree-based statistical learning model for making classification decisions. It is an ensemble method that generates many decision trees and assigns the class selected by the majority of the trees. However, tree-based methods are often unstable when the learning data contain small changes or disturbances. Such instability can produce large variance and affect model performance. This study classifies diseases and pests of the corn plant using a random forest method based on bootstrap aggregating: it fits multiple random forest models, combines the predictions of all models, and determines the final result by majority voting. The results show that bootstrap aggregating can improve the random forest classification of maize diseases and pests when the number of trees is optimal.
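Bagging over whole random forests can be sketched with scikit-learn's `BaggingClassifier` wrapping a `RandomForestClassifier`. Note one difference from the abstract: because random forests expose `predict_proba`, `BaggingClassifier` combines members by averaging class probabilities (soft voting) rather than by a hard majority vote. The member count, tree count, and synthetic 4-class data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 4-class stand-in for corn disease/pest records (not the study's data).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=2)

single_rf = RandomForestClassifier(n_estimators=50, random_state=2)
# Five whole random forests, each fit to its own bootstrap sample of the rows.
bagged_rfs = BaggingClassifier(RandomForestClassifier(n_estimators=50, random_state=2),
                               n_estimators=5, random_state=2)

scores = {
    "single RF": cross_val_score(single_rf, X, y, cv=5).mean(),
    "bagged RFs": cross_val_score(bagged_rfs, X, y, cv=5).mean(),
}
print(scores)
```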

46

Ignatenko, Vera, Anton Surkov, and Sergei Koltcov. "Random forests with parametric entropy-based information gains for classification and regression problems." PeerJ Computer Science 10 (January 3, 2024): e1775. http://dx.doi.org/10.7717/peerj-cs.1775.

Abstract:

The random forest algorithm is one of the most popular and commonly used algorithms for classification and regression tasks. It combines the outputs of multiple decision trees into a single result. Random forest algorithms demonstrate the highest accuracy on tabular data compared with other algorithms in various applications. However, random forests, or more precisely their decision trees, are usually built using classic Shannon entropy. In this article, we consider the potential of deformed entropies, which are used successfully in the field of complex systems, to increase the prediction accuracy of random forest algorithms. We develop and introduce information gains based on Renyi, Tsallis, and Sharma-Mittal entropies for classification and regression random forests. We test the proposed modifications on six benchmark datasets: three for classification and three for regression. For classification problems, Renyi entropy improves random forest prediction accuracy by 19–96% depending on the dataset, Tsallis entropy by 20–98%, and Sharma-Mittal entropy by 22–111%, compared with the classical algorithm. For regression problems, deformed entropies improve prediction by 2–23% in terms of R² depending on the dataset.
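The split criteria can be compared directly on a toy node. For a class distribution p, Renyi entropy is H_α = log(Σ p_k^α) / (1 - α) and Tsallis entropy is S_q = (1 - Σ p_k^q) / (q - 1); both approach Shannon entropy as the parameter tends to 1, and Tsallis with q = 2 equals the Gini impurity. The sketch computes the information gain of one candidate split under each criterion; the Sharma-Mittal variant is omitted, and α = q = 2 are arbitrary choices, not values from the paper.

```python
import numpy as np


def class_probs(labels):
    """Empirical class distribution of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()


def shannon(p):
    return float(-np.sum(p * np.log(p)))


def renyi(p, alpha):
    """Renyi entropy log(sum p**alpha) / (1 - alpha), for alpha > 0, alpha != 1."""
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))


def tsallis(p, q):
    """Tsallis entropy (1 - sum p**q) / (q - 1), for q != 1."""
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


def information_gain(parent, left, right, entropy):
    """Parent entropy minus the size-weighted entropies of the two child nodes."""
    def node_entropy(labels):
        return entropy(class_probs(labels))

    n = len(parent)
    return (node_entropy(parent)
            - len(left) / n * node_entropy(left)
            - len(right) / n * node_entropy(right))


parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:3], parent[3:]  # candidate split: [0, 0, 0] | [0, 1, 1, 1, 1]
criteria = {
    "Shannon": shannon,
    "Renyi (alpha=2)": lambda p: renyi(p, 2.0),
    "Tsallis (q=2)": lambda p: tsallis(p, 2.0),
}
for name, entropy in criteria.items():
    print(f"{name}: gain = {information_gain(parent, left, right, entropy):.4f}")
```

Different criteria can rank the same candidate splits differently, which is how a deformed entropy changes the trees a forest grows.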

47

Goudman, Lisa, Jean-Pierre Van Buyten, Ann De Smedt, et al. "Predicting the Response of High Frequency Spinal Cord Stimulation in Patients with Failed Back Surgery Syndrome: A Retrospective Study with Machine Learning Techniques." Journal of Clinical Medicine 9, no. 12 (2020): 4131. http://dx.doi.org/10.3390/jcm9124131.


Abstract:

Despite the proven clinical value of spinal cord stimulation (SCS) for patients with failed back surgery syndrome (FBSS), the factors related to a successful SCS outcome are not yet clearly understood. This study aimed to predict responders to high-frequency SCS at 10 kHz (HF-10). Data from before implantation and the last available data were extracted for 119 FBSS patients treated with HF-10 SCS. Correlations, logistic regression, linear discriminant analysis, classification and regression trees, random forest, bagging, and boosting were applied. Based on feature selection, trial pain relief, predominant pain location, and the number of previous surgeries were relevant factors for predicting pain relief. For predicting responders with 50% pain relief, an accuracy of 58.33% was obtained with boosting, random forest, and bagging. For predicting responders with 30% pain relief, an accuracy of 70.83% was obtained with logistic regression, linear discriminant analysis, boosting, and classification trees. For predicting a decrease in pain medication, accuracies above 80% were obtained with logistic regression and linear discriminant analysis. Several machine learning techniques were able to predict responders to HF-10 SCS with acceptable accuracy; however, none of them achieved high accuracy. The inconsistent results on predictive factors in the literature, combined with the acceptable accuracy of the current models, suggest that the baseline parameters routinely collected in clinical practice may not be sufficient to consistently predict the long-term SCS response with high accuracy.


48

Maulana, Bima, Dany Febrian, Irgie Rachmat Fachrezi, and Muhammad Ferdi Zeen. "Comparison of Support Vector Machine, Random Forest, and C4.5 Algorithms for Customer Loss Prediction." IJATIS: Indonesian Journal of Applied Technology and Innovation Science 2, no. 1 (2025): 1–6. https://doi.org/10.57152/ijatis.v2i1.1102.


Abstract:

Customer loss has been widely discussed, and many studies have addressed it using Bayesian network, decision tree, random forest, support vector machine, and neural network algorithms. Support Vector Machine (SVM), Random Forest, and Decision Tree (C4.5) algorithms are used for prediction and have several advantages; Random Forest, for example, combines the predictions of many decision trees, which tends to reduce overfitting. This research uses the C4.5, SVM, and Random Forest algorithms. The results show that Random Forest achieves the highest accuracy, 87.02%, compared with the Support Vector Machine and Decision Tree methods, whereas Decision Tree obtains a lower accuracy of 78.52%. Overall, the experiments show that Random Forest achieves an average classification accuracy for customer loss prediction that is 4–9% higher than that of the Support Vector Machine and Decision Tree methods.


49

I., Dwaraka Srihith, Vijaya Lakshmi P., David Donald A., Aditya Sai Srinivas T., and Thippanna G. "A Forest of Possibilities: Decision Trees and Beyond." Journal of Advancement in Parallel Computing 6, no. 3 (2023): 29–37. https://doi.org/10.5281/zenodo.8372196.


Abstract:

Decision trees are fundamental in machine learning due to their interpretability and versatility. They are hierarchical structures used for classification and regression tasks, making decisions by recursively splitting data based on features. This abstract explores decision tree algorithms, tree construction, pruning to prevent overfitting, and ensemble methods like Random Forests. Additionally, it covers handling categorical data, imbalanced datasets, missing values, and hyperparameter tuning. Decision trees are valuable for feature selection and model interpretability. However, they have drawbacks, such as overfitting and sensitivity to data variations. Nevertheless, they find applications in fields like finance, medicine, and natural language processing, making them a critical topic in machine learning.


50

Wu, Hao. "Solder joint defect classification based on ensemble learning." Soldering & Surface Mount Technology 29, no. 3 (2017): 164–70. http://dx.doi.org/10.1108/ssmt-08-2016-0016.


Abstract:

Purpose: This paper aims to inspect solder joint defects on printed circuit boards in a real-time production line, where simple computation and high accuracy are the primary considerations for the feature extraction and classification algorithms.

Design/methodology/approach: In this study, the author presents an ensemble method for classifying solder joint defects. The method extracts color and geometry features after solder image acquisition and uses decision trees to ensure the algorithm's runtime efficiency. To improve accuracy, the author proposes a random forest ensemble that combines several trees to classify solder joints.

Findings: The proposed method was tested on 280 solder joint samples, covering good joints and various defect types. The results show that the proposed method achieves high accuracy.

Originality/value: The author extracted color and geometry features and used decision trees to ensure the algorithm's runtime efficiency. To improve accuracy, the author proposes a random forest ensemble that combines several trees for the classification of solder joints. The results show that the proposed method achieves high accuracy.

