Academic literature on the topic 'Decision classification trees discriminant random forest'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Decision classification trees discriminant random forest.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Decision classification trees discriminant random forest"

1

Koreň, Milan, Rastislav Jakuš, Martin Zápotocký, et al. "Assessment of Machine Learning Algorithms for Modeling the Spatial Distribution of Bark Beetle Infestation." Forests 12, no. 4 (2021): 395. http://dx.doi.org/10.3390/f12040395.

Abstract:
Machine learning algorithms (MLAs) are used to solve complex non-linear and high-dimensional problems. The objective of this study was to identify the MLA that generates an accurate spatial distribution model of bark beetle (Ips typographus L.) infestation spots. We first evaluated the performance of 2 linear (logistic regression, linear discriminant analysis), 4 non-linear (quadratic discriminant analysis, k-nearest neighbors classifier, Gaussian naive Bayes, support vector classification), and 4 decision trees-based MLAs (decision tree classifier, random forest classifier, extra trees classifier, gradient boosting classifier) for the study area (the Horní Planá region, Czech Republic) for the period 2003–2012. Each MLA was trained and tested on all subsets of the 8 explanatory variables (distance to forest damage spots from previous year, distance to spruce forest edge, potential global solar radiation, normalized difference vegetation index, spruce forest age, percentage of spruce, volume of spruce wood per hectare, stocking). The mean phi coefficient of the model generated by extra trees classifier (ETC) MLA with five explanatory variables for the period was significantly greater than that of most forest damage models generated by the other MLAs. The mean true positive rate of the best ETC-based model was 80.4%, and the mean true negative rate was 80.0%. The spatio-temporal simulations of bark beetle-infested forests based on MLAs and GIS tools will facilitate the development and testing of novel forest management strategies for preventing forest damage in general and bark beetle outbreaks in particular.
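The comparison described in this abstract follows a common pattern: fit a set of linear, non-linear, and tree-based classifiers and score each with the phi coefficient. A minimal, hedged sketch of that pattern in scikit-learn is shown below; it is not the authors' code, and the synthetic data stands in for the explanatory variables and infestation labels.

```python
# Illustrative sketch only: cross-validated comparison of the classifier families
# named in the abstract, scored with the phi (Matthews) coefficient.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef

# Stand-in data: 8 features, as in the study, but randomly generated.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_redundant=0, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "lda": LinearDiscriminantAnalysis(),
    "qda": QuadraticDiscriminantAnalysis(),
    "knn": KNeighborsClassifier(),
    "gnb": GaussianNB(),
    "svc": SVC(),
    "tree": DecisionTreeClassifier(random_state=0),
    "rf": RandomForestClassifier(random_state=0),
    "extra_trees": ExtraTreesClassifier(random_state=0),
    "gbc": GradientBoostingClassifier(random_state=0),
}
phi = make_scorer(matthews_corrcoef)   # for binary labels this equals the phi coefficient
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring=phi, cv=5)
    print(f"{name:12s} mean phi = {scores.mean():.3f}")
```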
2

Gómez, Jorge Gómez, Urueta Camilo Parra, Daniel Salas Álvarez, Riaño Velssy Hernández, and Gustavo Ramirez-Gonzalez. "Anemia Classification System Using Machine Learning." Informatics 12, no. 1 (2025): 19. https://doi.org/10.3390/informatics12010019.

Abstract:
In this study, a system was developed to predict anemia using blood count data and supervised learning algorithms. Anemia, a common condition characterized by low levels of red blood cells or hemoglobin, affects oxygenation and often causes symptoms, such as fatigue and shortness of breath. The diagnosis of anemia often requires laboratory tests, which can be challenging in low-resource areas where anemia is common. We built a supervised learning approach and trained three models (Linear Discriminant Analysis, Decision Trees, and Random Forest) using an anemia dataset from a previous study by Sabatini in 2022. The Random Forest model achieved an accuracy of 99.82%, highlighting its capability to subclassify anemia types (microcytic, normocytic, and macrocytic) with high precision, which is a novel advancement compared to prior studies limited to binary classification (presence/absence of anemia) of the same dataset.
3

Asia, Mahdi Naser Alzubaidi, and Salih Al-Shamery Eman. "Projection pursuit Random Forest using discriminant feature analysis model for churners prediction in telecom industry." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (2020): 1406–21. https://doi.org/10.11591/ijece.v10i2.pp1406-1421.

Abstract:
A major and demanding issue in the telecommunications industry is the prediction of churn customers. Churn describes a customer who attrites from the current provider to a competitor in search of better service offers. Companies in the telecom sector frequently have customer relationship management offices whose main objective is to win back defecting clients, because preserving long-term customers can be much more beneficial than gaining newly recruited customers. Researchers and practitioners are paying great attention to developing robust customer churn prediction models, especially in the telecommunication business, and have proposed numerous machine learning approaches. Many classification approaches have been established, but the most effective in recent times are tree-based methods. The main contribution of this research is to predict churners/non-churners in the telecom sector with a projection pursuit Random Forest (PPForest) that uses discriminant feature analysis as a novel extension of the conventional Random Forest for learning oblique projection pursuit trees (PPtree). The proposed methodology leverages two discriminant analysis methods to calculate the projection index used in the construction of PPtree. The first method uses Support Vector Machines (SVM), while the second uses Linear Discriminant Analysis (LDA), to achieve linear splitting of variables during oblique PPtree construction and to produce individual classifiers that are more robust and diverse than those of a classical Random Forest. The proposed methods achieve the best performance measurements, e.g., accuracy, hit rate, ROC curve, lift, H-measure, and AUC. Moreover, PPForest based on LDA delivers effective evaluators in the prediction model.
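The core idea above is an oblique, projection-pursuit split: instead of thresholding one raw variable, a node thresholds an LDA (or SVM) projection of several variables. The sketch below illustrates a single LDA-based oblique split on synthetic data; it is only an illustration of the concept, not the PPForest/PPtree implementation.

```python
# Toy illustration of one LDA-based oblique split (binary case).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_oblique_split(X, y):
    """Project onto the first LDA discriminant direction and threshold at the
    midpoint between the two class means."""
    lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
    z = lda.transform(X).ravel()                      # 1-D projection of every sample
    thresh = (z[y == 0].mean() + z[y == 1].mean()) / 2.0
    return lda, thresh, z <= thresh                   # boolean mask = "left branch"

X, y = make_classification(n_samples=200, n_features=5, random_state=0)  # stand-in data
lda, thresh, go_left = lda_oblique_split(X, y)
print(f"threshold = {thresh:.3f}, left-branch size = {go_left.sum()}")
```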
4

Manisha Sharma. "Improving the Accuracy of Epileptic Seizure Detection through EEG Analysis: A Comprehensive Classification Strategy." Journal of Information Systems Engineering and Management 10, no. 28s (2025): 77–85. https://doi.org/10.52783/jisem.v10i28s.4299.

Abstract:
Epilepsy is a neurological disorder which impacts millions globally and continues to be a major public health challenge. The prompt identification of epileptic seizures is essential for effective treatment. In this study, we present an innovative methodology designed to enhance the accuracy of seizure detection through EEG data analysis. Our strategy involves creating a comprehensive EEG database that includes both healthy individuals and those experiencing seizures (ictal). We utilize a diverse range of classification models, including random forests, decision trees, XGBoost and k-nearest neighbors algorithm. For feature extraction, we have selected Linear Discriminant Analysis (LDA) as our preferred technique. The experimental results indicate that the random forest model is the most effective, achieving a perfect accuracy rate of 100% in detecting epileptic seizures. The decision tree model follows closely with an accuracy of 90.00%. Although the kNN algorithm has a slightly lower accuracy of 82.50%, it still plays a significant role in differentiating between normal and ictal EEG signals. Our results clearly demonstrate the effectiveness of our proposed method in reliably extracting spatial and temporal information from multi-channel EEG data, enabling accurate classification of epileptic seizures. This research highlights the robustness of our feature extraction approach and its potential to improve early diagnosis and treatment of epilepsy.
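The pipeline described above (LDA as the feature-extraction step feeding tree-based classifiers) can be approximated in scikit-learn as follows. The EEG array and labels below are random stand-ins, so the printed accuracy is meaningless; only the structure of the pipeline is of interest, and this is not the authors' code.

```python
# Illustrative sketch: LDA dimensionality reduction followed by a Random Forest.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_eeg = rng.normal(size=(400, 178))       # stand-in for flattened EEG feature vectors
y = rng.integers(0, 2, size=400)          # 0 = normal, 1 = ictal (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X_eeg, y, test_size=0.25, random_state=0)
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=1),
                    RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```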
5

Akanbi, Olatunde David, Taiwo Mercy Faloni, and Sunday Olaniyi. "Prediction of Wine Quality: Comparing Machine Learning Models in R Programming." International Journal of Latest Technology in Engineering, Management & Applied Science 11, no. 09 (2022): 01–06. http://dx.doi.org/10.51583/ijltemas.2022.11901.

Abstract:
The consideration of wine quality before consumption or use is not a new decision scheme across ages, fields, and people. Gone are the days when the quality of wine solely depended on taste or other physical checks. In this age of data science and machine learning, we can make decisions on the best wine quality with reference to different features/variables. This work focused on predicting the dependent variable while using existing models to analyze the independent variables. It utilizes the R programming language for this prediction, comparing different machine learning models such as Linear Regression, Neural Network, Naive Bayes Classification, Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF). The provided data was divided into testing and training portions, with parts reserved for validation. Random Forest was found to be the better model for this prediction when cross-validated in 10 folds. Accuracy was then used to select the optimal model. Alcohol is the feature variable that contributes most to wine quality, while volatile acidity and chloride contribute the least. This would assist breweries in determining the right additions and subtractions when wine quality is in question.
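The study above works in R; a rough Python equivalent of its two main steps (10-fold cross-validated model comparison, then Random Forest feature importances to see which variables drive wine quality) is sketched below. The CSV file name and separator are assumptions about a local copy of the wine-quality data, not the authors' setup.

```python
# Illustrative sketch: compare models with 10-fold CV, then rank feature importances.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

wine = pd.read_csv("winequality-red.csv", sep=";")      # assumed local copy of the data
X, y = wine.drop(columns="quality"), wine["quality"]

for name, model in {"lda": LinearDiscriminantAnalysis(),
                    "svm": SVC(kernel="linear"),
                    "rf": RandomForestClassifier(random_state=0)}.items():
    print(name, cross_val_score(model, X, y, cv=10).mean())

rf = RandomForestClassifier(random_state=0).fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False))
```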
6

Krishnarjun, Bora, Pratim Barman Manash, N. Patowary Arnab, and Bora Toralima. "Classification of Assamese Folk Songs' Melody using Supervised Learning Techniques." Indian Journal of Science and Technology 16, no. 2 (2023): 89–96. https://doi.org/10.17485/IJST/v16i2.1686.

Abstract:
Objectives: A melody is made up of several musical notes or pitches that are joined together to form one whole. This experiment aims to develop four models based on Mel-frequency Cepstral Coefficients (MFCC) to classify the melodies played on harmonium corresponding to five different classes of Assamese folk music. Methods: The melodies of five different categories of Assamese folk songs are selected for classification. With the help of expert musicians, these melodies are played on harmonium and audio samples are recorded in the same acoustic environment. 20 MFCCs are extracted from each of the samples and classification of the melodies is done using four supervised learning techniques: Decision Tree Classifier, Linear Discriminant Analysis (LDA), Random Forest Classifier, and Support Vector Machine (SVM). Findings: The performance of the fitted models is evaluated using different evaluation techniques and presented. A maximum of 94.17% average accuracy score is achieved with the Support Vector Machine. The average accuracy scores of the Decision Tree Classifier, Linear Discriminant Analysis (LDA), and Random Forest Classifier are 73.58%, 85.58%, and 86.11%, respectively. The models are developed based on 250 samples (50 from each type). However, by increasing the training sample size, there is a possibility to improve the performance of the other three models as well. Novelty: The developed approach for identifying the melodies is based on computational techniques. This work will certainly provide a basis for conducting further computational studies in folk music for any community.
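A small sketch of the feature-extraction step described above (20 MFCCs per recording, averaged over time, then a standard classifier) using librosa and scikit-learn. It is not the authors' code, and the audio paths and labels referenced in the usage comment are hypothetical.

```python
# Illustrative sketch: mean-MFCC feature vectors per recording, then classification.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def mfcc_features(path, n_mfcc=20):
    """Load one recording and return an n_mfcc-dimensional mean-MFCC vector."""
    y, sr = librosa.load(path, sr=None)                     # keep the native sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    return mfcc.mean(axis=1)

# Hypothetical usage with lists of audio paths and melody-class labels:
# X = np.array([mfcc_features(p) for p in paths])
# y = np.array(labels)
# for name, clf in {"svm": SVC(), "rf": RandomForestClassifier(random_state=0)}.items():
#     print(name, cross_val_score(clf, X, y, cv=5).mean())
```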
7

Njimi, Houssem, Nesrine Chehata, and Frédéric Revers. "Fusion of Dense Airborne LiDAR and Multispectral Sentinel-2 and Pleiades Satellite Imagery for Mapping Riparian Forest Species Biodiversity at Tree Level." Sensors 24, no. 6 (2024): 1753. http://dx.doi.org/10.3390/s24061753.

Abstract:
Multispectral and 3D LiDAR remote sensing data sources are valuable tools for characterizing the 3D vegetation structure and thus understanding the relationship between forest structure, biodiversity, and microclimate. This study focuses on mapping riparian forest species in the canopy strata using a fusion of Airborne LiDAR data and multispectral multi-source and multi-resolution satellite imagery: Sentinel-2 and Pleiades at tree level. The idea is to assess the contribution of each data source in the tree species classification at the considered level. The data fusion was processed at the feature level and the decision level. At the feature level, LiDAR 2D attributes were derived and combined with multispectral imagery vegetation indices. At the decision level, LiDAR data were used for 3D tree crown delimitation, providing unique trees or groups of trees. The segmented tree crowns were used as a support for an object-based species classification at tree level. Data augmentation techniques were used to improve the training process, and classification was carried out with a random forest classifier. The workflow was entirely automated using a Python script, which allowed the assessment of four different fusion configurations. The best results were obtained by the fusion of Sentinel-2 time series and LiDAR data with a kappa of 0.66, thanks to red edge-based indices that better discriminate vegetation species and the temporal resolution of Sentinel-2 images that allows monitoring the phenological stages, helping to discriminate the species.
8

Khan, Haroon, Farzan M. Noori, Anis Yazidi, Md Zia Uddin, M. N. Afzal Khan, and Peyman Mirtaheri. "Classification of Individual Finger Movements from Right Hand Using fNIRS Signals." Sensors 21, no. 23 (2021): 7943. http://dx.doi.org/10.3390/s21237943.

Abstract:
Functional near-infrared spectroscopy (fNIRS) is a comparatively new noninvasive, portable, and easy-to-use brain imaging modality. However, complicated dexterous tasks such as individual finger-tapping, particularly using one hand, have not been investigated using fNIRS technology. Twenty-four healthy volunteers participated in the individual finger-tapping experiment. Data were acquired from the motor cortex using sixteen sources and sixteen detectors. In this preliminary study, we applied the standard fNIRS data processing pipeline, i.e., optical density conversion, signal processing, feature extraction, and classification algorithm implementation. Physiological and non-physiological noise is removed using 4th-order band-pass Butterworth and 3rd-order Savitzky–Golay filters. Eight spatial statistical features were selected: signal mean, peak, minimum, skewness, kurtosis, variance, median, and peak-to-peak, from data of oxygenated haemoglobin changes. Sophisticated machine learning algorithms were applied, such as support vector machine (SVM), random forests (RF), decision trees (DT), AdaBoost, quadratic discriminant analysis (QDA), artificial neural networks (ANN), k-nearest neighbors (kNN), and extreme gradient boosting (XGBoost). The average classification accuracies achieved were 0.75±0.04, 0.75±0.05, and 0.77±0.06 using k-nearest neighbors (kNN), Random Forest (RF), and XGBoost, respectively. KNN, RF, and XGBoost classifiers performed exceptionally well on such a high-class problem. The results need to be further investigated. In the future, a more in-depth analysis of the signal in both temporal and spatial domains will be conducted to investigate the underlying facts. The accuracies achieved are promising results and could open up a new research direction leading to enrichment of control command generation for fNIRS-based brain-computer interface applications.
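The eight statistical features listed above are straightforward to compute per channel window; a small sketch (with a random stand-in for one oxygenated-haemoglobin window) is given below. It illustrates the feature step only, not the authors' full pipeline, and the array shapes are assumptions.

```python
# Illustrative sketch: the eight descriptive statistics named in the abstract.
import numpy as np
from scipy.stats import skew, kurtosis

def spatial_stats(window):
    """window: 1-D array of HbO samples for one channel and one trial."""
    return np.array([
        window.mean(),                      # signal mean
        window.max(),                       # peak
        window.min(),                       # minimum
        skew(window),                       # skewness
        kurtosis(window),                   # kurtosis
        window.var(),                       # variance
        np.median(window),                  # median
        window.max() - window.min(),        # peak-to-peak
    ])

rng = np.random.default_rng(0)
trial = rng.normal(size=500)                # stand-in for one fNIRS channel window
print(spatial_stats(trial))
```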
9

Atish, S. Tangawade, and A. Muley Aniket. "Classification of Parkinson's Disease Data Using Traditional and Advanced Data Mining Techniques." Indian Journal of Science and Technology 17, no. 11 (2024): 1043–50. https://doi.org/10.17485/IJST/v17i11.3059.

Abstract:
Objectives: (1) To apply various traditional classification tools, (2) to check the effectiveness of the classifiers on the Parkinson dataset, (3) to use boosting classification tools, and (4) to compare the performance of all used classification tools and find the classifier algorithm with the best accuracy. Thus, the main aim of the study is to discriminate healthy people from those with PD. Methods: The methodology of this study is categorised into three stages: (1) preprocessing and feature selection; (2) application of classifiers; (3) comparative study. We have used a secondary dataset of voice recordings originally collected at the University of Oxford by Max Little. In the first step, the voice data of PD patients is collected for analysis. Then the collected data is normalized using min-max normalization, followed by feature extraction. Classification data mining techniques, viz. KNN, Logistic Regression, Decision Tree, SVM, Random Forest, and boosting algorithms, are then used to predict whether the person is healthy or has Parkinson's disease. Finally, a comparative analysis is made based on the accuracy provided by the different data mining models. Findings: The results of our study reveal that the GB algorithm is more accurate than the other models. It gives the highest accuracy, so we recommend this algorithm for similar kinds of studies in the future. These models are very useful for better and more exact medical diagnosis and decision making. It is also found that the proposed methods are fully computerized and produce enhanced performance, and hence can be recommended for similar studies. Here, it is observed that the Gradient Boost algorithm provides the best accuracy (100% for training and 92.02% for testing, 98.46% overall). Novelty: We have used a boosting classification model for the classification of Parkinson's disease. Our proposed method is one such good example, giving faster and more accurate results for the classification of Parkinson's disease patients with excellent accuracy. We have also compared the results with other existing approaches such as linear discriminant analysis, support vector machine, K-nearest neighbour, decision tree, classification and regression trees, random forest, linear regression, logistic regression, and Naive Bayes; our proposed technique was superior to existing studies, with the gradient boost algorithm yielding an accuracy of 98.46%, so our method can be used as an effective means of computer-aided diagnosis of PD and has important practical value. Keywords: Data Mining, Parkinson's Disease, Classification, Boosting Algorithms, Feature Selection
10

Elisabeth, Thomas, Saji Arjun, M. S. Aswin, Salas Augustine, and Viju Emil. "A Comprehensive Review of Advancing Cattle Monitoring and Behavior Classification using Deep Learning." International Journal on Emerging Research Areas (IJERA) 04, no. 02 (2025): 7–12. https://doi.org/10.5281/zenodo.14642932.

Abstract:
This paper explores the application of deep learning and image processing techniques for cattle disease detection and pose estimation, drawing insights from various research papers. The use of wearable sensors embedded in collars emerges as a prominent method for monitoring cattle behavior and health. These sensors, particularly accelerometers, effectively capture movement data, facilitating the identification of behaviors like grazing, resting, walking, and ruminating. Several studies utilize supervised machine learning algorithms such as Random Forest, Decision Trees, and Linear Discriminant Analysis to classify these behaviors with high accuracy. Further, deep learning models, especially Convolutional Neural Networks (CNNs), demonstrate remarkable capabilities in detecting specific cattle diseases. YOLOv5, known for its speed and accuracy, proves effective in cattle detection. Image preprocessing techniques, including grayscale conversion, noise removal, and data augmentation, enhance the accuracy and robustness of these models. Additionally, pose estimation techniques like OpenPifPaf, combined with angle calculations between joints, provide valuable insights into cattle posture and aid in the early detection of lameness. The integration of these advanced technologies presents a significant opportunity to advance precision livestock farming practices. Early disease detection and efficient behavior monitoring can contribute to improved animal welfare, optimized farm management, and enhanced productivity in the cattle industry.

Dissertations / Theses on the topic "Decision classification trees discriminant random forest"

1

Tandan, Isabelle, and Erika Goteman. "Bank Customer Churn Prediction : A comparison between classification and evaluation methods." Thesis, Uppsala universitet, Statistiska institutionen, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-411918.

Abstract:
This study aims to assess which supervised statistical learning method (random forest, logistic regression, or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-fold cross-validation or leave-one-out cross-validation, yields the most reliable results. Predicting customer churn has increased in popularity since new technology, regulation, and changed demand have led to an increase in competition for banks. Thus, with greater reason, banks acknowledge the importance of maintaining their customer base. The findings of this study are that an unrestricted random forest model estimated using k-fold cross-validation is preferable in terms of performance measurements, computational efficiency, and from a theoretical point of view. Although k-fold cross-validation and leave-one-out cross-validation yield similar results, k-fold cross-validation is preferable due to computational advantages. For future research, methods that generate models with both good interpretability and high predictability would be beneficial, in order to combine knowledge of which customers end their engagement with an understanding of why. Moreover, interesting future research would be to analyze at which dataset size leave-one-out cross-validation and k-fold cross-validation yield the same results.
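The thesis' central comparison (k-fold versus leave-one-out cross-validation for a random forest churn model) can be reproduced in outline as follows; the synthetic, imbalanced dataset is a placeholder for the bank data, and this is not the thesis code.

```python
# Illustrative sketch: 10-fold vs. leave-one-out cross-validation for a Random Forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0)
rf = RandomForestClassifier(random_state=0)

kfold_acc = cross_val_score(rf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0)).mean()
loo_acc = cross_val_score(rf, X, y, cv=LeaveOneOut()).mean()   # far more expensive: one fit per sample
print(f"10-fold: {kfold_acc:.3f}   LOOCV: {loo_acc:.3f}")
```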
2

Кичигіна, Анастасія Юріївна. "Прогнозування ІМТ за допомогою методів машинного навчання" [Predicting BMI using machine learning methods]. Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2020. https://ela.kpi.ua/handle/123456789/37413.

Abstract:
Thesis: 100 p., 17 tables, 16 figures, 2 appendices, and 24 references. The object of the study is the human body mass index (BMI). The subject of the research is machine learning methods: regression models, the random forest ensemble model, and a neural network. This work studies the dependence of BMI and the presence of excess body weight on eating and living habits. Machine learning and data analysis methods were used to build the study, opportunities for improving the performance of standard models were identified, and the best model for prediction and classification on the given data was determined. The work focuses on reducing the dimensionality of the feature space, selecting the best observations with valid data for better model performance, and combining different learning methods to obtain more effective ensemble models.
3

Alves, Ana Sofia Tavares Jordão. "Time series classification for device fingerprinting: internship project at a telecommunications and technology company." Master's thesis, 2021. http://hdl.handle.net/10362/112035.

Abstract:
Internship report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Telecommunication service providers seek accurate insight into the devices connected within a home network in order to provide a better in-home experience. Accordingly, the goal of the internship was to develop a machine learning model for fingerprinting of Amazon devices. This can be translated into a time-series binary classification problem and builds on the idea of using the bytes received by the router over time as an indicator of internet usage to detect Amazon devices. A feature-based analysis was conducted to make it possible to apply the most common and simple classifiers, which is relevant within a company context. The available data presented some challenges, namely a high class imbalance and a large number of missing values. For this, several combinations of different techniques were studied to increase the importance of the minority class and to impute the unknown values. In addition, multiple models were trained, and their results were evaluated and compared. The achieved performance of the best model was not considered satisfactory for correctly identifying the Amazon devices, which led to the conclusion that other approaches, algorithms, and/or variables need to be considered in a future iteration. The project contributed to a better understanding of the path to take on the identification of the devices and introduced new approaches and reasoning for dealing with data similar to the time series in analysis.
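The two data problems highlighted above (strong class imbalance and missing values) are often handled inside one pipeline, for example median imputation plus a class-weighted Random Forest. The sketch below shows that generic pattern on synthetic data; it is not the model built during the internship.

```python
# Illustrative sketch: imputation and class weighting inside one scikit-learn pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=12, weights=[0.95, 0.05], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan        # inject ~10% missing values

model = make_pipeline(
    SimpleImputer(strategy="median"),                                  # impute unknown values
    RandomForestClassifier(class_weight="balanced", random_state=0),   # up-weight the minority class
)
print("F1 (minority class):", cross_val_score(model, X, y, cv=5, scoring="f1").mean())
```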

Book chapters on the topic "Decision classification trees discriminant random forest"

1

El-Nasr, Magy Seif, Truong Huy Nguyen Dinh, Alessandro Canossa, and Anders Drachen. "Supervised Learning in Game Data Science." In Game Data Science. Oxford University Press, 2021. http://dx.doi.org/10.1093/oso/9780192897879.003.0007.

Abstract:
This chapter discusses several classification and regression methods that can be used with game data. Specifically, we will discuss regression methods, including Linear Regression, and classification methods, including K-Nearest Neighbor, Naïve Bayes, Logistic Regression, Linear Discriminant Analysis, Support Vector Machines, Decision Trees, and Random Forests. We will discuss how you can set up the data to apply these algorithms, as well as how you can interpret the results and the pros and cons for each of the methods discussed. We will conclude the chapter with some remarks on the process of application of these methods to games and the expected outcomes. The chapter also includes practical labs to walk you through the process of applying these methods to real game data.
2

Wang Qing, Zhang Liang, Chi Mingmin, and Guo Jiankui. "MTForest: Ensemble Decision Trees based on Multi-Task Learning." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2008. https://doi.org/10.3233/978-1-58603-891-5-122.

Abstract:
Many ensemble methods, such as Bagging, Boosting, Random Forest, etc., have been proposed and widely used in real-world applications. Some of them are better than others on noise-free data, while others are better on noisy data. But in reality, ensemble methods that can consistently achieve good performance in situations with or without noise are more desirable. In this paper, we propose a new method, namely MTForest, to ensemble decision tree learning algorithms by enumerating each input attribute as an extra task to introduce different additional inductive bias and generate diverse yet accurate component decision tree learning algorithms in the ensemble. The experimental results show that in situations without classification noise, MTForest is comparable to Boosting and Random Forest and significantly better than Bagging, while in situations with classification noise, MTForest is significantly better than Boosting and Random Forest and is slightly better than Bagging. So MTForest is a good choice for ensemble decision tree learning algorithms in situations with or without noise. We conduct the experiments on the basis of 36 widely used UCI data sets that cover a wide range of domains and data characteristics and run all the algorithms within the Weka platform.
3

Pascual-Fontanilles, Jordi, Lenka Lhotska, Antonio Moreno, and Aida Valls. "Adapting a Fuzzy Random Forest for Ordinal Multi-Class Classification." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2022. http://dx.doi.org/10.3233/faia220336.

Abstract:
Fuzzy Random Forests are well-known Machine Learning ensemble methods. They combine the outputs of multiple Fuzzy Decision Trees to improve the classification performance. Moreover, they can deal with data uncertainty and imprecision thanks to the use of fuzzy logic. Although many classification tasks are binary, in some situations we face the problem of classifying data into a set of ordered categories. This is a particular case of multi-class classification where the order between the classes is relevant, for example in medical diagnosis to detect the severity of a disease. In this paper, we explain how a binary Fuzzy Random Forest may be adapted to deal with ordinal classification. The work is focused on the prediction stage, not on the construction of the fuzzy trees. When a new instance arrives, the rules activation is done with the usual fuzzy operators, but the aggregation of the outputs given by the different rules and trees has been redefined. In particular, we present a procedure for managing the conflicting cases where different classes are predicted with similar support. The support of the classes is calculated using the OWA operator that permits to model the concept of majority agreement.
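The aggregation step described above can be illustrated with a toy OWA (ordered weighted averaging) operator applied to the supports that individual fuzzy trees give to one class; the weights and support values below are invented for illustration and are not the chapter's actual parameters or algorithm.

```python
# Toy OWA aggregation of per-tree class supports.
import numpy as np

def owa(values, weights):
    """Ordered weighted average: weights are applied to values sorted in descending order."""
    return float(np.dot(np.sort(values)[::-1], weights))

# Hypothetical supports given to one severity class by five fuzzy trees.
supports = np.array([0.9, 0.7, 0.6, 0.2, 0.1])
majority_weights = np.array([0.0, 0.1, 0.2, 0.3, 0.4])   # emphasises agreement of the majority
print(owa(supports, majority_weights))                   # aggregated support for the class
```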
4

Pascual-Fontanilles, Jordi, Aida Valls, Antonio Moreno, and Pedro Romero-Aroca. "Iterative Update of a Random Forest Classifier for Diabetic Retinopathy." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2021. http://dx.doi.org/10.3233/faia210136.

Abstract:
Random Forests are well-known Machine Learning classification mechanisms based on a collection of decision trees. In the last years, they have been applied to assess the risk of diabetic patients to develop Diabetic Retinopathy. The results have been good, despite the unbalance of data between classes and the inherent ambiguity of the problem (patients with similar data may belong to different classes). In this work we propose a new iterative method to update the set of trees in the Random Forest by considering trees generated from the data of the new patients that are visited in the medical centre. With this method, it has been possible to improve the results obtained with standard Random Forests.
5

Xu, Ning, Jiangping Wang, Guojun Qi, Thomas S. Huang, and Weiyao Lin. "Ontological Random Forests for Image Classification." In Computer Vision. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5204-8.ch031.

Abstract:
Previous image classification approaches mostly neglect semantics, which has two major limitations. First, categories are simply treated independently while in fact they have semantic overlaps. For example, “sedan” is a specific kind of “car”. Therefore, it's unreasonable to train a classifier to distinguish between “sedan” and “car”. Second, image feature representations used for classifying different categories are the same. However, the human perception system is believed to use different features for different objects. In this paper, we leverage semantic ontologies to solve the aforementioned problems. The authors propose an ontological random forest algorithm where the splitting of decision trees are determined by semantic relations among categories. Then hierarchical features are automatically learned by multiple-instance learning to capture visual dissimilarities at different concept levels. Their approach is tested on two image classification datasets. Experimental results demonstrate that their approach not only outperforms state-of-the-art results but also identifies semantic visual features.
6

Vijaya Lakshmi, Adluri, Sowmya Gudipati Sri, Ponnuru Sowjanya, and K. Vedavathi. "Prediction using Machine Learning." In Handbook of Artificial Intelligence. BENTHAM SCIENCE PUBLISHERS, 2023. http://dx.doi.org/10.2174/9789815124514123010005.

Abstract:
This chapter begins with a concise introduction to machine learning and the classification of machine learning systems (supervised learning, unsupervised learning, and reinforcement learning). ‘Breast Cancer Prediction Using ML Techniques’ is the topic of Chapter 2. This chapter describes various breast cancer prediction algorithms, including convolutional neural networks (CNN), support vector machines, Naïve Bayesian classification, and weighted Naïve Bayesian classification. Prediction of Heart Disease Using Machine Learning Techniques is the topic of Chapter 3. This chapter describes the numerous heart disease prediction algorithms, including Support Vector Machines (SVM), Logistic Regression, KNN, Random Forest Classifier, and Deep Neural Networks. Prediction of IPL Data Using Machine Learning Techniques is the topic of Chapter 4. The following algorithms are covered in this chapter: decision trees, Naïve Bayes, K-Nearest Neighbour, Random Forest, data mining techniques, fuzzy clustering logic, support vector machines, reinforcement learning algorithms, data analytics approaches, and Bayesian prediction techniques. Chapter 5, Software Error Prediction by Means of Machine Learning, covers the AR model and the Known Power Model (POWM), as well as artificial neural networks (ANNs), particle swarm optimisation (PSO), decision trees (DT), Naïve Bayes (NB), and linear classifiers (K-nearest neighbours, Naïve Bayes, C4.5, and decision trees). Chapter 6, Prediction of Rainfall Using Machine Learning Techniques, discusses LASSO (Least Absolute Shrinkage and Selection Operator) Regression, ANN (Artificial Neural Network), Support Vector Machine, Multi-Layer Perceptron, Decision Tree, Adaptive Neuro-Fuzzy Inference System, Wavelet Neural Network, Ensemble Prediction Systems, the ARIMA model, PCA and K-Means algorithms, Recurrent Neural Network (RNN), the statistical KNN classifier, and the neural SOM. Weather Prediction Using Machine Learning Techniques includes Bayesian Networks, Linear Regression, Logistic Regression, KNN, Decision Tree, Random Forest, K-Means, and the Apriori algorithm, as well as Linear Regression, Polynomial Regression, Random Forest Regression, Artificial Neural Networks, and Recurrent Neural Networks.
7

A.C. Minho, Lucas, Bárbara E.A. de Magalhães, and Alexandre G.M. de Freitas. "Potential Use of Tree-based Tools for Chemometric Analysis of Infrared Spectra." In Advances in Computing Communications and Informatics. BENTHAM SCIENCE PUBLISHERS, 2022. http://dx.doi.org/10.2174/9789815040401122030005.

Abstract:
One of the most elegant and versatile techniques of machine learning is the decision tree. The decision tree is a simple tool to predict and explain the relationship between the object and the target value, recursively partitioning the input space. Tree ensembles such as random forest and gradient boosting trees significantly improve the predictive power of supervised models based on tree weak predictors. In a random forest, the generalized error that is included in the model prediction is dependent on the correlation strength between the trees and the individual predictors' quality. The random selection of features in each node split is at the core of random forest, which makes it as effective as other complex machine learning techniques while having a lower computational cost, which is appealing in the analysis of large data matrices such as those generated by infrared spectroscopy because most analysts do not have computers with high processing capacity for implementing those complex models. Also, techniques based on the decision tree are more robust to noise, which is preferable for the analysis of trace level contaminants. In this chapter, we present the techniques based on decision trees and apply them to solve problems related to classification, regression, and feature selection in spectra obtained experimentally and provided by public repositories. Comparisons of the performance obtained with techniques based on the decision tree in relation to other chemometric tools are also performed.
8

Mondal, Anoushka, and Sudhanshu Sudhakar Dubey. "Machine Learning-based Water Potability Prediction: Model Evaluation, and Hyperparameter Optimization." In Advancements in Communication and Systems. Soft Computing Research Society, 2024. http://dx.doi.org/10.56155/978-81-955020-7-3-4.

Abstract:
This research aims to predict water potability, which is of utmost importance for community safety. A comprehensive analysis of a machine learning model is presented here, considering various quality parameters, to achieve this prediction. The model incorporates decision tree, KNN, Random Forest, SGD, SVM, logistic regression, and other algorithms. All steps of the study, including pre-processing, exploratory data analysis, feature scaling, model construction, assessment, and hyperparameter tuning, are thoroughly covered. Performance indicators like accuracy, confusion matrix, and classification report are used to evaluate the effectiveness of each model. Hyperparameter tweaking is implemented in decision trees, random forest algorithms, and K-nearest neighbors through grid search to optimize accuracy. The suggested model demonstrates its capability to forecast water potability accurately. It provides stakeholders with a systematic approach to model construction, evaluation, and optimization, thus ensuring water safety.
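The grid-search step described above corresponds to scikit-learn's GridSearchCV; a compact sketch is shown below. The parameter grid and the synthetic stand-in for the potability data are assumptions, not the chapter's configuration.

```python
# Illustrative sketch: grid-searched Random Forest with a held-out test evaluation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=800, n_features=9, random_state=0)  # stand-in for water-quality features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5, scoring="accuracy",
)
grid.fit(X_tr, y_tr)
print(grid.best_params_)
print(classification_report(y_te, grid.predict(X_te)))
```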
9

Ghose Soumya, Mitra Jhimli, Khanna Sankalp, and Dowling Jason. "An Improved Patient-Specific Mortality Risk Prediction in ICU in a Random Forest Classification Framework." In Studies in Health Technology and Informatics. IOS Press, 2015. https://doi.org/10.3233/978-1-61499-558-6-56.

Abstract:
Dynamic and automatic patient specific prediction of the risk associated with ICU mortality may facilitate timely and appropriate intervention of health professionals in hospitals. In this work, patient information and time series measurements of vital signs and laboratory results from the first 48 hours of ICU stays of 4000 adult patients from a publicly available dataset are used to design and validate a mortality prediction system. An ensemble of decision trees are used to simultaneously predict and associate a risk score against each patient in a k-fold validation framework. Risk assessment prediction accuracy of 87% is achieved with our model and the results show significant improvement over a baseline algorithm of SAPS-I that is commonly used for mortality prediction in ICU. The performance of our model is further compared to other state-of-the-art algorithms evaluated on the same dataset.
10

Qin, Meng. "A Software Code Infringement Detection Scheme Based on Integration Learning." In Advances in Transdisciplinary Engineering. IOS Press, 2024. http://dx.doi.org/10.3233/atde231264.

Abstract:
A software code plagiarism detection scheme based on ensemble learning is designed to address the issue of low accuracy in traditional abstract syntax tree based software code infringement detection methods. We adopt the AST structure of the code to integrate domain partitioning in IR with AST, and use a weighted simplified abstract syntax tree to design feature extraction and similarity calculation methods, to achieve partial detection of semantic plagiarism and calculate the similarity between text and source code. Then, the feature set of the known classification training set is placed into a random forest based ensemble classifier for training, and an association between error rate and the classification effect of the decision tree in the random forest are proposed to acquire feature node matching with the feature in the code base. The experimental results show that our scheme has higher accuracy than traditional detection methods based on abstract syntax trees. It can not only detect code similarity, but also provide the types of plagiarism, which has better comprehensive identification performance.

Conference papers on the topic "Decision classification trees discriminant random forest"

1

Pirić, David, and Romana Masnikosa. "PERFORMANCE OF RANDOM FORESTS, EXTREME GRADIENT BOOSTING AND SUPPORT VECTOR MACHINES EMPLOYED IN LIPIDOMICS." In 17th International Conference on Fundamental and Applied Aspects of Physical Chemistry. Society of Physical Chemists of Serbia, 2024. https://doi.org/10.46793/phys.chem24i.223p.

Abstract:
Herein we present the performance of three supervised machine learning (ML) algorithms: random forests (RF), extreme gradient boosting (XGB) and support vector machines (SVM) in classification of human serum samples into pancreatic cancer or control group, using a lipidomic dataset retrieved from the research article „Lipidomic profiling of human serum enables detection of pancreatic cancer“ by Wolrab et al. [1]. Our main objective was to assess and compare, for the three ML techniques, the performance metrics, that is accuracy, precision, sensitivity, F1 score and ROC- AUC, with those computed by decision trees (DT), which is the basis of RF and XGB algorithms, and commonly used partial least squares – discriminant analysis (PLS-DA) and orthogonal projections to latent structures – discriminant analysis (OPLS-DA). We suggest that RF, XGB and SVM represent excellent binary classifiers, making these three promising candidates for future use in the discovery of potential lipid biomarkers.
2

Sakai, Hajar. "Machine Learning Approaches for Stroke Classification." In 2023 IISE Annual Conference & Expo. Curran Associates, Inc., 2023. http://dx.doi.org/10.21872/2023iise_1127.

Abstract:
Stroke is known for being one of the severe silent diseases that cause sudden death. It may as well cause permanent disabilities and also shorten the patient's life expectancy. Its related number of occurrences keeps increasing in the United States which makes it a threatening medical emergency for which healthcare providers need a decision support tool. All these reasons aggregated are the motive behind this research. The prognosis of stroke disease can be improved promptly such that its early detection will increase the probability of getting cured easily or being identified at risk for urgent care. Therefore, an analytical approach is suggested for support. Multiple machine learning algorithms are applied and compared using a dataset recording 5110 patients, 5% of them with stroke history. They include Logistic Regression, two Tree-based Algorithms, Linear Discriminant Analysis, and Multilayer Perceptron. The dataset comprises variables describing the physiological features of a patient such as the Body Mass Index (BMI), and the average glucose level as well as information related to the patient’s clinical history. As a result, Random Forest outperforms the others in terms of both Accuracy (94%) and F1-score (92%). While Logistic Regression and Linear Discriminant Analysis equally show the best results in terms of AUC score (84%).
3

Al-Khudafi, Abbas M., Hamzah A. Al-Sharifi, Ghareb M. Hamada, Mohamed A. Bamaga, Abdulrahman A. Kadi, and A. A. Al-Gathe. "Evaluation of Different Tree-Based Machine Learning Approaches for Formation Lithology Classification." In International Geomechanics Symposium. ARMA, 2023. http://dx.doi.org/10.56952/igs-2023-0026.

Abstract:
This study aims to assess the effectiveness of several decision tree techniques for identifying formation lithology. 20,966 data points from 4 wells were used to create the study's data. Lithology is determined using seven log parameters: the density log, neutron log, sonic log, gamma ray log, deep laterolog, shallow laterolog, and resistivity log. Different decision tree-based classification algorithms were applied. Six typical machine learning models, namely Random Forest, Random Trees, J48, reduced-error pruning decision trees, logistic model trees, and HoeffdingTree, were evaluated for formation lithology identification using well logging data. The obtained results show that the random forest model, out of the proposed decision tree models, performed best at lithology identification, with precision, recall, and F-score values of 0.913, 0.914, and 0.913, respectively. Random Trees came next. With average precision, recall, and F1-score of 0.837, 0.84, and 0.837, respectively, the J48 model came in third place. The HoeffdingTree classification model, however, showed the worst performance. We conclude that boosting strategies enhance the performance of tree-based models. Evaluation of the prediction capability of the models is also carried out using different datasets.
4

Moghadam, Armin, and Fatemeh Davoudi Kakhki. "Comparative Study of Decision Tree Models for Bearing Fault Detection and Classification." In Intelligent Human Systems Integration (IHSI 2022) Integrating People and Intelligent Systems. AHFE International, 2022. http://dx.doi.org/10.54941/ahfe100968.

Abstract:
Fault diagnosis of bearings is essential in reducing failures and improving the functionality and reliability of rotating machines. As vibration signals are non-linear and non-stationary, extracting features for dimension reduction and efficient fault detection is challenging. This study aims at evaluating the performance of decision tree-based machine learning models in the detection and classification of bearing fault data. A machine learning approach combining tree-based classifiers with derived statistical features is proposed for localized fault classification. Statistical features are extracted from normal and faulty vibration signals through time-domain analysis to develop tree-based models of AdaBoost (AD), classification and regression trees (CART), LogitBoost trees (LBT), and Random Forest trees (RF). The results confirm that machine learning classifiers have satisfactory performance and strong generalization ability in fault detection, and provide practical models for classifying the running state of the bearing.
5

Amagada, P. U. "An Inferable Machine Learning Approach for Reservoir Lithology Characterization Using Drilling Data." In SPE Annual Technical Conference and Exhibition. SPE, 2023. http://dx.doi.org/10.2118/217485-stu.

Abstract:
Abstract Reservoir lithology is a key factor in petroleum exploration and petrophysical calculations. It is of utmost importance as it serves as a foundation for reservoir characterization and formation evaluation. Accurate estimation of the reservoir permeability, porosity, and water saturation, is greatly dependent on accurate identification of the reservoir lithology. Ideally, the reservoir lithology is determined by obtaining physical samples of the reservoir. This process is however very expensive and time-consuming, hence the wide adoption of well log responses for identifying the reservoir lithology. Most Machine learning approaches are imminently built to render good classification, and some have been adapted to probability estimation. The purpose of this study is to demonstrate how machine learning can be used to estimate the probability of reservoir lithology with the use of drilling data. The drilling data used in this research is from the Volve oil field in Stavanger, Norway. The preprocessed data consisted of pump pressure, surface torque average, rotation per minute of drill bit, mudflow rate, total gas content, effective circulation density, pump stroke rate, lithology type, and weight on bit. The data was split into 80% for training and 20% for the test set. Feature selection was done using expert domain knowledge. The three lithology characteristics captured by the data include sandstone, claystone, and marl. Intelligent models are algorithms designed to learn from large volumes of data and draw valuable insights from them. Examples are neural networks, logistic regression, and Random Forest. In this study, we are primarily interested in probabilistic prediction rather than label classification or a deterministic prediction. The problem was treated as a probability estimation problem using logistic regression, Decision trees, and Random Forest models. Decision Trees are a type of supervised machine learning where the data is continuously split according to a certain parameter. Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. Random Forest is an ensemble learning method for classification and regression that operates by constructing multiple decision trees at training time. The probabilistic classifier predicts a probability distribution over a set of lithology classes using drilling data. The stratified k-fold cross validation technique was used for model comparison on the training data. The performance of models was evaluated using the metrics- accuracy score, the area under the receiver operating characteristic curve (AUC), precision, recall and f1 score. The AUC score was considered to be the best evaluation metric for the task. We relied on the receiver operating characteristic curve (ROC) and the area under the curve (AUC) to evaluate the performance of the models. The higher the AUC, the better the ability to distinguish between the lithology classes. The logistic regression, Decision trees, and Random Forest models achieved ROC AUC scores of 0.7547, 0.8747, and 0.9932 respectively. The results revealed that the Random Forest model outperformed the other models. The Random Forest model achieved a ROC AUC score of 98.59% on the test dataset indicating its capability to estimate the probability of having a reservoir lithology with a high confidence level. 
This study resulted in the application of machine learning techniques to develop models capable of estimating the probability of a reservoir lithology in the absence of a reservoir sample. The models were developed by fitting logistic regression, Decision trees, and Random Forest machine-learning algorithms to a drilling dataset. The results revealed that the models performed satisfactorily in estimating the probability of a reservoir lithology. The Random Forest model outperformed the other models. Therefore, in the absence of a reservoir sample, the probability of a reservoir lithology can be estimated using the model. These predictions can be used for compatibility tests between formation and bit, improved bit selection programs, and drilling rate optimization. The accurate predictions from the model will be very useful for drilling planning and bit optimization thereby reducing drilling costs. Lithology characterization based on drilling data is also important for real-time geosteering in the oil and gas industry.
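The probabilistic treatment described above (class-probability estimates from a Random Forest, scored with a one-vs-rest ROC AUC) looks roughly like the following sketch; the synthetic features and the class coding are placeholders rather than the Volve drilling data, and this is not the authors' model.

```python
# Illustrative sketch: probabilistic multi-class lithology prediction and OvR ROC AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=9, n_informative=6,
                           n_classes=3, random_state=0)   # 0=sandstone, 1=claystone, 2=marl (assumed coding)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
proba = rf.predict_proba(X_te)                              # class-probability estimates per sample
print("ROC AUC (OvR):", roc_auc_score(y_te, proba, multi_class="ovr"))
```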
6

Oliveira, Gustavo Henrique de, and Franklin César Flores. "Classification of heart arrhythmia by digital image processing and machine learning." In Seminário Integrado de Software e Hardware. Sociedade Brasileira de Computação - SBC, 2023. http://dx.doi.org/10.5753/semish.2023.230225.

Abstract:
The electrocardiogram (ECG) exam can be used reliably as a measure to monitor the functionality of the cardiovascular system. Although there are many similarities between different ECG conditions, the focus of most studies has been to classify a set of database signals known as the PhysioNet MIT-BIH and PTB Diagnostics data sets, rather than classifying problems in real images. In this article, we propose methods to extract features from the exam image; algorithms such as CNN, decision tree, extra trees, and random forest are then used for the classification of exams and are able to classify accurately according to the AAMI EC57 standard. According to the results, the suggested method is capable of making predictions with an average accuracy of 97.4%.
7

Al-Sharifi, H. A., A. M. Alkhudafi, A. A. Al-Gathe, S. O. Baarimah, Wahbi Al-Ameri, and A. T. Alyazidi. "Prediction of Two-Phase Flow Regimes in Vertical Pipes Using Tree-Based Ensemble Models." In International Petroleum Technology Conference. IPTC, 2024. http://dx.doi.org/10.2523/iptc-24084-ms.

Abstract:
The multi-phase fluid transfer pattern in vertical flow through pipelines is a significant parameter to be predetermined for predicting the pressure gradient, liquid holdup, and other flow properties. In the present study, the prediction of two-phase flow patterns in vertical pipes using ensemble machine-learning classification models is presented. For this purpose, ensemble machine learning techniques including boosting, bagging, and random forest have been applied. Decision tree-based classifiers, such as Random Trees (RT), J48, reduced-error pruning decision trees (REPT), logistic model trees (LMT), and decision trees with naive Bayes (NBT), are proposed to predict flow regimes. Datasets consisting of more than 2250 data points were used to develop the ensemble models. The importance of attributes for different models was investigated based on a dataset consisting of 1088 data points. Feature selection was performed by applying six different optimization methods. For this task, training and cross-validation were used. To check the performance of the classifier, a learning curve is used to determine the optimal number of training data points to use. The performance of the algorithm is evaluated based on the metrics of classification accuracy, confusion matrix, precision, recall, F1-score, and the PRC area. The boosting approach and random forest classifiers have higher prediction accuracy compared with the other ensemble methods. AdaBoost, LogitBoost, and MultiBoosting algorithms were applied as boosting approaches. MultiBoosting has better performance compared with the other two techniques. The random forests provided a high level of performance. Their average precision, recall, and F1 scores are 0.957, 0.958, and 0.949, respectively. It is concluded that, compared with single classifiers, the ensemble algorithms performed better than the single models. As such, the accuracy rate of the prediction of flow regimes can be increased to 96%. This study presents a robust and improved technique as an alternative method for the prediction of two-phase flow regimes in vertical flow with high accuracy, low effort, and lower costs. The developed models provide satisfactory and adequate results under different conditions.
8

Marsh, Kennedy, Clifton Wallace, Jeffrey Hernandez, Rodney Dejournett, Xiaohong Yuan, and Kaushik Roy. "Authentication Based on Periocular Biometrics and Skin Tone." In 2022 KSU CONFERENCE ON CYBERSECURITY EDUCATION, RESEARCH AND PRACTICE. Kennesaw State University, 2022. http://dx.doi.org/10.32727/28.2023.6.

Full text
Abstract:
Face images with masks pose a major challenge for the identification and authentication of people, since masks cover key facial features such as the nose and mouth. In this paper, we propose to use the periocular region and skin tone to authenticate users with masked faces. We first extract the periocular region of faces with masks, then detect the skin tone for each face. We then train models using the machine learning algorithms Random Forest, XGBoost, and Decision Trees on the skin tone information and perform classification on two datasets. Experimental results show that these models performed well.
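The sketch below illustrates the classification step under the assumption that periocular descriptors and a skin-tone descriptor have already been extracted into numeric columns; GradientBoostingClassifier stands in for XGBoost to avoid an extra dependency, and the data, labels, and dimensions are hypothetical.

```python
# Sketch: tree-based authentication on concatenated periocular + skin-tone features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(600, 64)),    # placeholder periocular descriptors
               rng.uniform(size=(600, 3))])   # placeholder skin-tone values (e.g. mean RGB)
y = rng.integers(0, 20, size=600)             # placeholder subject identities

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
for name, clf in {
    "decision_tree": DecisionTreeClassifier(random_state=1),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=1),
    "gradient_boosting": GradientBoostingClassifier(random_state=1),
}.items():
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```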
APA, Harvard, Vancouver, ISO, and other styles
9

Idogun, Akpevwe Kelvin, Ruth Oyanu Ujah, and Lesley Anne James. "Surrogate-Based Analysis of Chemical Enhanced Oil Recovery – A Comparative Analysis of Machine Learning Model Performance." In SPE Nigeria Annual International Conference and Exhibition. SPE, 2021. http://dx.doi.org/10.2118/208452-ms.

Full text
Abstract:
Abstract Optimizing decision and design variables for chemical EOR is imperative for sensitivity and uncertainty analysis. However, these processes involve multiple reservoir simulation runs, which increase computational cost and time. Surrogate models are capable of overcoming this impediment, as they can mimic full-field three-dimensional reservoir simulation models in detail and complexity. Artificial Neural Networks (ANN) and regression-based Design of Experiments (DoE) are common methods for surrogate modelling. In this study, a comparative analysis of data-driven surrogate model performance on Recovery Factor (RF) for Surfactant-Polymer flooding is investigated with seven input variables: Kv/Kh ratio, polymer concentration in the polymer drive, surfactant slug size, surfactant concentration in the surfactant slug, polymer concentration in the surfactant slug, polymer drive size, and salinity of the polymer drive. Eleven machine learning models, including Multiple Linear Regression (MLR), Ridge and Lasso regression, Support Vector Regression (SVR), ANN, and Classification and Regression Tree (CART) based algorithms including Decision Trees, Random Forest, eXtreme Gradient Boosting (XGBoost), Gradient Boosting, and Extremely Randomized Trees (ERT), are applied to a dataset consisting of 202 data points. The results obtained indicate high model performance and accuracy for SVR, ANN, and CART-based ensemble techniques such as Extremely Randomized Trees, Gradient Boosting, and XGBoost regression, with high R2 values and the lowest Mean Squared Error (MSE) values for the training and test datasets. Unlike other studies on chemical EOR surrogate modelling, where sensitivity was analyzed with statistical DoE, we rank the input features using decision tree-based algorithms, while model interpretability is achieved with Shapley values. Results from feature ranking indicate that surfactant concentration and slug size are the most influential parameters on the RF. Other important factors, though with less influence, are the polymer concentration in the surfactant slug, polymer concentration in the polymer drive, and polymer drive size. The salinity of the polymer drive and the Kv/Kh ratio both have a negative effect on the RF, with a corresponding least level of significance.
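The minimal sketch below mirrors this surrogate-modelling workflow: several regressors are scored against a recovery-factor target and a tree-based model ranks feature importance. The column names echo the abstract's variables, but the data, target relationship, and hyperparameters are synthetic assumptions; Shapley-value interpretation (e.g. via the shap package) is omitted.

```python
# Sketch: surrogate regressor comparison + tree-based feature ranking (synthetic data).
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
cols = ["kv_kh", "poly_conc_drive", "surf_slug_size", "surf_conc",
        "poly_conc_slug", "poly_drive_size", "salinity"]
X = pd.DataFrame(rng.uniform(size=(202, len(cols))), columns=cols)
# Toy recovery factor dominated by surfactant concentration and slug size.
rf_target = 0.5 * X["surf_conc"] + 0.3 * X["surf_slug_size"] + rng.normal(0, 0.05, 202)

for name, reg in {
    "svr": SVR(),
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=2),
    "extra_trees": ExtraTreesRegressor(n_estimators=300, random_state=2),
    "gradient_boosting": GradientBoostingRegressor(random_state=2),
}.items():
    r2 = cross_val_score(reg, X, rf_target, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")

# Feature ranking analogous to the abstract's importance analysis.
ranker = ExtraTreesRegressor(n_estimators=300, random_state=2).fit(X, rf_target)
print(sorted(zip(cols, ranker.feature_importances_), key=lambda t: -t[1]))
```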
APA, Harvard, Vancouver, ISO, and other styles
10

Leal Jauregui, Jairo Alonso, Alfredo Jose Arevalo Lopez, Mohammed Atwi, and Daniel Alejandro Leal Leal. "A New Approach to Choke Flow Models Using Machine Learning Algorithms." In International Petroleum Technology Conference. IPTC, 2022. http://dx.doi.org/10.2523/iptc-22168-ms.

Full text
Abstract:
Abstract Computer science technology has been widely used for the simulation of gas and petroleum networks. Wellhead chokes, or pressure control valves, are specialized equipment used extensively in the hydrocarbon industry for two purposes: to maintain stable downstream pressure from the wells, and to provide the backpressure needed to balance gas well productivity while controlling downhole drawdown. Multiphase choke flow models and empirical choke flow equations have been developed over the past half-century to improve gas estimation under different fluid, flow regime, flow type, and pressure drop scenarios. All of these carry over certain measurement errors, which makes it difficult to predict well performance parameters with the aforementioned methods. Traditional models use the sonic flow equation and Gilbert-type formulae for critical flow through multiphase chokes as a baseline. Newer models capture further regression refinements, constraint values, and multiple regression studies at different pressure drops, PVT properties, gas-liquid ratios, and choke sizes. The new algorithm has been developed using Random Forest Regression (RFR), which constructs a multitude of decision trees over stored measurements of multiple gas production variables. A decade ago, a second generation of choke equation models was developed, consolidating multiple databases from production operations. This choke equation has been used extensively and shows single-digit errors in most gas estimations when compared against conventional well-testing equipment readings. The use of this 2010 choke gas equation (Ref. 9) has been valuable in reducing the use of conventional testing equipment without jeopardizing data quality. However, the prediction error of these models starts to increase under deviating conditions such as low gas rates or increased water and condensate ratios. New data collection has been taking place that considers multiple scenarios, different time spans, and additional variables. These new and enhanced databases support the evaluation of new models and better data-driven analytics. The application of this algorithm improves prediction accuracy compared to traditional regression methods, as it captures more of the variance in the data; the implementation of RFR thus enables more accurate prediction of the separator rate for the gas wells in the field. This paper explains and applies the machine learning algorithm known as RFR (Random Forest Regression) and compares it with GPR (Gaussian Process Regression) for this gas production engineering metering task. The algorithm allows the computer to capture underlying patterns in the data and make better predictions based on ensembles of regression trees used for nonlinear multiple regression. This paper explains the application of the RFR and GPR methods to separator gas rate estimation and shows improved prediction results. It also applies these two machine learning algorithms to predict gas volume using choke size, upstream and downstream flowing pressures, condensate-to-gas ratio (CGR), and upstream temperature. These approaches are benchmarked against the first (2005) and second (Ref. 9) models and demonstrate a drastic reduction in prediction error and a more robust ability to manage high variability in the data in comparison with previous models based on single-variable statistical tools.
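A hedged sketch of the RFR-versus-GPR comparison is given below; the inputs (choke size, upstream/downstream pressures, CGR, upstream temperature) follow the abstract, but the synthetic rate relationship, units, and model settings are placeholders, not the field data or calibrated models discussed in the paper.

```python
# Sketch: Random Forest Regression vs. Gaussian Process Regression for gas rate estimation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(3)
n = 500
choke = rng.uniform(16, 64, n)            # placeholder choke size (1/64 in)
p_up = rng.uniform(2000, 6000, n)         # placeholder upstream pressure
p_down = rng.uniform(500, 2000, n)        # placeholder downstream pressure
cgr = rng.uniform(5, 100, n)              # placeholder condensate-to-gas ratio
t_up = rng.uniform(100, 250, n)           # placeholder upstream temperature
X = np.column_stack([choke, p_up, p_down, cgr, t_up])
q_gas = 0.05 * choke * np.sqrt(np.maximum(p_up - p_down, 1.0)) + rng.normal(0, 5, n)  # toy rate

X_tr, X_te, y_tr, y_te = train_test_split(X, q_gas, test_size=0.3, random_state=3)
rfr = RandomForestRegressor(n_estimators=400, random_state=3).fit(X_tr, y_tr)
gpr = make_pipeline(StandardScaler(),
                    GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                             normalize_y=True)).fit(X_tr, y_tr)
for name, model in [("RFR", rfr), ("GPR", gpr)]:
    print(name, mean_absolute_percentage_error(y_te, model.predict(X_te)))
```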
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Decision classification trees discriminant random forest"

1

Alwan, Iktimal, Dennis D. Spencer, and Rafeed Alkawadri. Comparison of Machine Learning Algorithms in Sensorimotor Functional Mapping. Progress in Neurobiology, 2023. http://dx.doi.org/10.60124/j.pneuro.2023.30.03.

Full text
Abstract:
Objective: To compare the performance of popular machine learning (ML) algorithms in mapping the sensorimotor (SM) cortex and identifying the anterior lip of the central sulcus (CS). Methods: We evaluated support vector machines (SVMs), random forest (RF), decision trees (DT), a single-layer perceptron (SLP), and a multilayer perceptron (MLP) against standard logistic regression (LR) for identifying the SM cortex, employing validated features from six minutes of NREM sleep icEEG data and applying standard common hyperparameters and 10-fold cross-validation. Each algorithm was tested using vetted features based on the statistical significance of classical univariate analysis (p<0.05) and an extended set of 17 features representing power/coherence of different frequency bands, entropy, and interelectrode distance. The analysis was performed before and after weight adjustment for imbalanced data (w). Results: 7 subjects and 376 contacts were included. Before optimization, the ML algorithms performed comparably when employing conventional features (median CS accuracy: 0.89, IQR [0.88-0.9]). After optimization, neural networks outperformed the others in terms of accuracy (MLP: 0.86), area under the curve (AUC) (SLPw, MLPw, MLP: 0.91), recall (SLPw: 0.82, MLPw: 0.81), precision (SLPw: 0.84), and F1-scores (SLPw: 0.82). SVM achieved the best specificity. Extending the number of features and adjusting the weights improved recall, precision, and F1-scores by 48.27%, 27.15%, and 39.15%, respectively, with gains or no significant losses in specificity and AUC across CS and Function (correlation r=0.71 between the two clinical scenarios across all performance metrics, p<0.001). Interpretation: Computational passive sensorimotor mapping is feasible and reliable. Feature extension and weight adjustments improve performance and counterbalance the accuracy paradox. Optimized neural networks outperform other ML algorithms even in binary classification tasks. The best-performing models and the MATLAB® routine employed in signal processing are available to the public at (Link 1).
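The minimal sketch below illustrates this kind of algorithm comparison: several classifiers scored with 10-fold cross-validation and class weighting for imbalanced data. The 376-contact, 17-feature shapes echo the abstract, but the features, labels, and hyperparameters are synthetic assumptions (the original work used MATLAB routines, not scikit-learn).

```python
# Sketch: comparing weighted classifiers with 10-fold cross-validation on imbalanced labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(376, 17))                    # placeholder power/coherence/entropy/distance features
y = (rng.uniform(size=376) < 0.25).astype(int)    # placeholder imbalanced sensorimotor labels

models = {
    "logistic_regression": LogisticRegression(max_iter=2000, class_weight="balanced"),
    "svm": SVC(class_weight="balanced"),
    "decision_tree": DecisionTreeClassifier(class_weight="balanced", random_state=4),
    "random_forest": RandomForestClassifier(class_weight="balanced", random_state=4),
    "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=4),
}
for name, clf in models.items():
    f1 = cross_val_score(clf, X, y, cv=10, scoring="f1").mean()
    print(f"{name}: mean 10-fold F1 = {f1:.3f}")
```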
APA, Harvard, Vancouver, ISO, and other styles