Academic literature on the topic 'Decision Tree and Random Forest Classifier'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Decision Tree and Random Forest Classifier.'


Journal articles on the topic "Decision Tree and Random Forest Classifier"

1

O. A., Oluwabunmi, Zainab I. A., and Adeolu L. "Comparative Analysis of Weather Prediction Using Classification Algorithm: Random Forest Classifier, Decision Tree Classifier and Extra Tree Classifier." African Journal of Mathematics and Statistics Studies 7, no. 2 (2024): 162–71. http://dx.doi.org/10.52589/ajmss-f6h03bne.

Abstract:
Machine learning models are compared in order to determine which is best to deploy as a system. In this research, we carried out a comparative analysis of the Random Forest, Decision Tree and Extra Tree classifiers for weather prediction systems, seeking the classifier with the highest performance metrics. The best model for the system was determined by accuracy score. We trained, tested and validated the three models on the same dataset from Kaggle. We implemented the Random Forest Classifier, Decision Tree Classifier and Extra Tree Classifier from Scikit-Learn to make weather predictions and used matplotlib to visualize the accuracy scores of the implemented models. The Random Forest Classifier was chosen as the best, achieving the highest accuracy at 66%.
2

Sahu, H., D. Haldar, A. Danodia, and S. Kumar. "CLASSIFICATION OF ORCHARD CROP USING SENTINEL-1A SYNTHETIC APERTURE RADAR DATA." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-5 (November 19, 2018): 335–38. http://dx.doi.org/10.5194/isprs-archives-xlii-5-335-2018.

Abstract:
A study was conducted in Saharanpur District of Uttar Pradesh to assess the potential of Sentinel-1A SAR data in orchard crop classification. The objective of the study was to evaluate three classifiers on Sentinel-1A SAR data: the maximum likelihood classifier, the decision tree algorithm and the random forest algorithm. An attempt is made to study Sentinel-1A SAR data to classify orchard crops using this approach. Here the rule-based classifiers, the decision tree and random forest algorithms, are compared with the conventional maximum likelihood classifier. Statistical analysis of the classification shows that the distribution of crop, forest, orchard, settlement and waterbody was 17.47%, 0.47%, 28.3%, 28.3% and 25.5% respectively in all the classification algorithms, but the root mean square error for the maximum likelihood classifier (1.278) is higher than for the decision tree algorithm (1.196) and the random forest algorithm (1.193). Of the three, the percentage of correct predictions in December 2017 is highest for the decision tree algorithm (73.4), followed by the random forest algorithm (72.5), and lowest for the maximum likelihood classifier (66.8). The accuracy for the orchard class is 0.81 for the maximum likelihood classifier, 0.80 for the decision tree algorithm and 0.78 for the random forest algorithm. Thus Sentinel-1A SAR data was effectively utilized for the classification of orchard crops.
3

Kulkarni, Vrushali Y., Pradeep K. Sinha, and Manisha C. Petare. "Weighted Hybrid Decision Tree Model for Random Forest Classifier." Journal of The Institution of Engineers (India): Series B 97, no. 2 (2015): 209–17. http://dx.doi.org/10.1007/s40031-014-0176-y.

4

Prasanna, S. T. P., and T. Veeramani. "Supervised study of Novel Random Forest Algorithm for prediction of heart disease in Comparison With The Decision Tree Algorithm." CARDIOMETRY, no. 25 (February 14, 2023): 1483–90. http://dx.doi.org/10.18137/cardiometry.2022.25.14831490.

Abstract:
Aim: The aim of this work is to evaluate the accuracy and precision in predicting heart disease using Decision Tree (DT) and Novel Random forest (RF) Classification algorithms. Materials and Methods: Novel Random forest is applied to a heart dataset which consists of 150 records. A framework for predicting heart disease in the medical field comparing the proposed and developed RF and DT classifiers. Sample Size Calculated as 55 in every group by using 80% G power. Sample Size Calculated using clinical analysis, with Alpha and Beta values of 0.05 and 0.5, the confidence level. confidence is 95%, nicest strength is 80% and registration rate is 1. Results: The Decision Tree classifier produces 96.42% accuracy in predicting the heart disease on the data set, whereas the Random forest classifier predicts the same at the rate of 78.45% of the time with a statistically significant difference between the two groups (p=0.004; p<0.05) with confidence interval 95%. Hence Novel Random forest is better than the Decision Tree. Conclusion: The results show that the performance of Random forest is better compared with Decision Tree in terms of both precision and accuracy.
5

Ram, P. Sathya Sai, C. Sravan Kumar, Mukund Pandey, D. Rakesh, V. Naveen, and K. Prem Kumar. "End to End Car Selling Portal By Loan Prediction Using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (2022): 841–51. http://dx.doi.org/10.22214/ijraset.2022.43870.

Abstract:
Cars have become an asset, even though they are a liability in everyday use, because of the comforts they provide. Users do not want to miss out on the different brands and luxuries on offer, but jotting down the types, prices, and financing details becomes hectic, so a consumer usually buys offline. But what if the process could be done faster digitally? Buying online can give users and consumers many more options. We are creating a solution by integrating a web application built with Express, React, Node.js, and Google Firebase with machine learning, using the Random Forest Classifier after analysis of different models such as KNN, SVC, Logistic Regression, Decision Tree Classifier, and Extra Trees Classifier. Keywords: Web app, React, Node.js, Express, Firebase, Machine Learning, KNN, SVC, Logistic Regression, Decision Tree Classifier, Extra Trees Classifier, Random Forest Classifier
6

Syed, Muzibuddin, and Rani Dr.N.Usha. "Analysis of Diabetes Prediction using Decision Tree Classifier." International Journal of Innovative Research in Advanced Engineering 11, no. 03 (2024): 164–71. http://dx.doi.org/10.26562/ijirae.2024.v1103.04.

Abstract:
Diabetes is a serious disease that affects the majority of the population. Nowadays it plays a major role in human life. It involves an imbalance in the body's insulin processing, which leads to a variety of disorders. The main aim of this work is to make an early prediction of diabetes more precisely by using Auto Machine Learning tools. Auto Machine Learning tools provide better results in diabetes detection by constructing models from patient datasets. This approach automates the training, tuning, and deployment of machine learning models. Recent developments in machine learning show that automatic diabetes detection using Random Forest Algorithm models can be very beneficial in such problems. The proposed Random Forest model predicts diabetes at an early stage. We use a Decision Tree classifier to predict whether a patient has diabetes based on diagnostic measurements. The performance and accuracy of the applied algorithms are discussed and compared.
7

Homjandee, Suvaporn, and Krung Sinapiromsaran. "A Random Forest with Minority Condensation and Decision Trees for Class Imbalanced Problems." WSEAS TRANSACTIONS ON SYSTEMS AND CONTROL 16 (September 16, 2021): 502–7. http://dx.doi.org/10.37394/23203.2021.16.46.

Abstract:
Building an effective classifier that could classify a target or class of instances in a dataset from historical data has played an important role in machine learning for a decade. The standard classification algorithm has difficulty generating an appropriate classifier when faced with an imbalanced dataset. In 2019, an efficient splitting measure, minority condensation entropy (MCE) [1], was proposed that could build a decision tree to classify minority instances. The aim of this research is to extend the concept of a random forest to use both decision trees and minority condensation trees. The algorithm builds a minority condensation tree from a bootstrapped dataset that maintains all minority instances, while it builds a decision tree from a bootstrapped sample of a balanced dataset. The experimental results on synthetic datasets confirm that the proposed algorithm, compared with the standard random forest, is suitable for dealing with the binary-class imbalanced problem. Furthermore, the experiment on real-world datasets from the UCI repository shows that the proposed algorithm constructs a random forest that outperforms other existing random forest algorithms based on the recall, the precision, the F-measure, and the geometric mean.
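The two bootstrap schemes mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to resample only the majority class in the first scheme, and the equal per-class draw count in the second are assumptions made for illustration, and no tree is actually trained here.

```python
import random


def split_by_class(samples, labels, minority_label):
    """Return (minority, majority) lists of (sample, label) pairs."""
    pairs = list(zip(samples, labels))
    minority = [p for p in pairs if p[1] == minority_label]
    majority = [p for p in pairs if p[1] != minority_label]
    return minority, majority


def bootstrap_keep_minority(samples, labels, minority_label, rng):
    """Assumed scheme 1: keep every minority instance and draw the
    majority instances with replacement (same count as the original)."""
    minority, majority = split_by_class(samples, labels, minority_label)
    drawn = [rng.choice(majority) for _ in range(len(majority))]
    return minority + drawn


def bootstrap_balanced(samples, labels, minority_label, rng):
    """Assumed scheme 2: draw with replacement the same number of
    instances from each class, so the resample is balanced."""
    minority, majority = split_by_class(samples, labels, minority_label)
    n = len(minority)
    drawn_minority = [rng.choice(minority) for _ in range(n)]
    drawn_majority = [rng.choice(majority) for _ in range(n)]
    return drawn_minority + drawn_majority


# Hypothetical toy data: 4 minority (label 1) and 16 majority (label 0).
features = list(range(20))
targets = [1] * 4 + [0] * 16
rng = random.Random(0)

for_minority_tree = bootstrap_keep_minority(features, targets, 1, rng)
for_decision_tree = bootstrap_balanced(features, targets, 1, rng)
```

In a forest built this way, each tree would be fitted on one such resample; how the paper mixes the two tree types is not stated in the abstract and is left out of the sketch.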
8

Le, Ngoc-Bich, Thi-Thu-Hien Pham, Sy-Hoang Nguyen, Nhat-Minh Nguyen, and Tan-Nhu Nguyen. "AI-powered Predictive Model for Stroke and Diabetes Diagnostic." International Journal of Intelligent Systems and Applications 16, no. 1 (2024): 24–40. http://dx.doi.org/10.5815/ijisa.2024.01.03.

Abstract:
Research efforts in the prediction of stroke and diabetes prioritize early detection in order to enhance patient outcomes. To achieve this, a variety of methodologies are integrated. Existing studies, on the other hand, are marred by imbalanced datasets, lack of diversity in their datasets, potential bias, and inadequate model comparisons; these flaws underscore the necessity for more comprehensive and inclusive research methodologies. This paper provides a thorough assessment of machine learning algorithms in the context of early detection and diagnosis of stroke and diabetes. The research employed widely used algorithms, including Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and XGBoost Classifier, to examine medical data and derive significant findings. The XGBoost Classifier demonstrated superior performance, with an outstanding accuracy, precision, recall, and F1-score of 87.5%. The comparative examination of the algorithms indicated that the Decision Tree, Random Forest, and XGBoost classifiers consistently exhibited strong performance across all measures. The models demonstrated impressive discrimination capabilities, with the XGBoost Classifier and Random Forest reaching accuracy rates of roughly 87.5% and 86.5% respectively. The Decision Tree Classifier exhibited notable performance, with an accuracy rate of 83%. The overall accuracy of the models was evident in the F1-score, a metric that incorporates recall and precision, where the XGBoost model exhibited a marginal improvement of 2% over the Random Forest and Decision Tree models, and 4.25 percent over the last two. The aforementioned results underscore the effectiveness of the XGBoost Classifier, which will be employed as a predictive model in this study, alongside the Random Forest and Decision Tree models, for the accurate identification of stroke and diabetes. Furthermore, combining datasets improves model performance by utilizing relative features. 
This integrated dataset improves the model's efficiency and creates a resilient and comprehensive prediction model, improving healthcare outcomes. The findings of this research make a valuable contribution to the advancement of AI-driven diagnostic systems, hence enhancing the quality of healthcare decision-making.
9

Azeez, N. A., S. S. Oladele, and O. Ologe. "Identification of pharming in communication networks using ensemble learning." Nigerian Journal of Technological Development 19, no. 2 (2022): 172–80. http://dx.doi.org/10.4314/njtd.v19i2.10.

Abstract:
Pharming scams are carried out by exploiting the DNS as the main weapon, while phishing attacks employ spoofed websites that appear to be legitimate to internet users. Phishing makes use of baits such as fake links, but pharming leverages and negotiates on the DNS server to move and redirect internet users to a fake and simulated website. Given the several challenges that pharming poses, resulting in vulnerable websites, personal emails and social media accounts, the use of and reliance on the internet calls for caution. Against this backdrop, this work aims at enhancing pharming detection strategies by adopting machine learning classification algorithms. To further obtain the best classification results, an ensemble learning approach was adopted. The algorithms used include K-Nearest Neighbors (KNN), Decision Tree, Random Forest, Gaussian Naive Bayes, Logistic Regression, Support Vector Machine, Adaptive Boosting, Gradient Boosting, and Extra Trees Classifier. During the testing process, the classifiers were tested against five popular metrics: accuracy, recall, precision, F1 score, and log loss. The results demonstrate the performance of all algorithms used, as well as their relationships. The ensemble model that included Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, Gradient Boosting Classifier, AdaBoost Classifier, Extra Trees Classifier, and Random Forest produced the best results after evaluating them on the two datasets. Among the individual classifiers, Random Forest showed the better performance, with mean accuracies of 0.932 and 0.939 on the two datasets respectively, compared to 0.476 and 0.519 obtained for Naive Bayes.
10

Bagui, Sikha S., Dustin Mink, Subhash C. Bagui, et al. "Introducing the UWF-ZeekDataFall22 Dataset to Classify Attack Tactics from Zeek Conn Logs Using Spark’s Machine Learning in a Big Data Framework." Electronics 12, no. 24 (2023): 5039. http://dx.doi.org/10.3390/electronics12245039.

Abstract:
This study introduces UWF-ZeekDataFall22, a newly created dataset labeled using the MITRE ATT&CK framework. Although the focus of this research is on classifying the never-before classified resource development tactic, the reconnaissance and discovery tactics were also classified. The results were also compared to a similarly created dataset, UWF-ZeekData22, created in 2022. Both of these datasets, UWF-ZeekDataFall22 and UWF-ZeekData22, created using Zeek Conn logs, were stored in a Big Data Framework, Hadoop. For machine learning classification, Apache Spark was used in the Big Data Framework. To summarize, the uniqueness of this work is its focus on classifying attack tactics. For UWF-ZeekdataFall22, the binary as well as the multinomial classifier results were compared, and overall, the results of the binary classifier were better than the multinomial classifier. In the binary classification, the tree-based classifiers performed better than the other classifiers, although the decision tree and random forest algorithms performed almost equally well in the multinomial classification too. Taking training time into consideration, decision trees can be considered the most efficient classifier.

Dissertations / Theses on the topic "Decision Tree and Random Forest Classifier"

1

Федоров, Д. П. "Comparison of classifiers based on the decision tree." Thesis, ХНУРЕ, 2021. https://openarchive.nure.ua/handle/document/16430.

Abstract:
The main purpose of this work is to compare classifiers. Random Forest and XGBoost are two popular machine learning algorithms. In this paper, we looked at how they work, compared their features, and obtained accurate results from their operation.
2

Holloway, Jacinta. "Extending decision tree methods for the analysis of remotely sensed images." Thesis, Queensland University of Technology, 2021. https://eprints.qut.edu.au/207763/1/Jacinta_Holloway_Thesis.pdf.

Abstract:
One UN Sustainable Development Goal focuses on monitoring the presence, growth, and loss of forests. The cost of tracking progress towards this goal is often prohibitive. Satellite images provide an opportunity to use free data for environmental monitoring. However, these images have missing data due to cloud cover, particularly in the tropics. In this thesis I introduce fast and accurate new statistical methods to fill these data gaps. I create spatial and stochastic extensions of decision tree machine learning methods for interpolating missing data. I illustrate these methods with case studies monitoring forest cover in Australia and South America.
3

Булах, В. А., Л. О. Кіріченко, and Т. А. Радівілова. "Classification of Multifractal Time Series by Decision Tree Methods." Thesis, КНУ, 2018. http://openarchive.nure.ua/handle/document/5840.

Abstract:
The article considers the task of classifying model fractal time series by machine learning methods. To classify the series, it is proposed to use meta-algorithms based on decision trees. Binomial stochastic cascade processes are used to model the fractal time series. The time series are classified by ensembles of decision tree models. The analysis indicates that the best results are obtained by the bagging and random forest methods, which use regression trees.
4

Assareh, Amin. "OPTIMIZING DECISION TREE ENSEMBLES FOR GENE-GENE INTERACTION DETECTION." Kent State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=kent1353971575.

5

Doubleday, Kevin. "Generation of Individualized Treatment Decision Tree Algorithm with Application to Randomized Control Trials and Electronic Medical Record Data." Thesis, The University of Arizona, 2016. http://hdl.handle.net/10150/613559.

Abstract:
With new treatments and novel technology available, personalized medicine has become a key topic in the new era of healthcare. Traditional statistical methods for personalized medicine and subgroup identification primarily focus on single treatment or two arm randomized control trials (RCTs). With restricted inclusion and exclusion criteria, data from RCTs may not reflect real world treatment effectiveness. However, electronic medical records (EMR) offer an alternative venue. In this paper, we propose a general framework to identify individualized treatment rule (ITR), which connects the subgroup identification methods and ITR. It is applicable to both RCT and EMR data. Given the large scale of EMR datasets, we develop a recursive partitioning algorithm to solve the problem (ITR-Tree). A variable importance measure is also developed for personalized medicine using random forest. We demonstrate our method through simulations, and apply ITR-Tree to datasets from diabetes studies using both RCT and EMR data. A software package is available at https://github.com/jinjinzhou/ITR.Tree.
6

Wright, Lindsey. "Classifying textual fast food restaurant reviews quantitatively using text mining and supervised machine learning algorithms." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/honors/451.

Abstract:
Companies continually seek to improve their business model through feedback and customer satisfaction surveys. Social media provides additional opportunities for this advanced exploration into the mind of the customer. By extracting customer feedback from social media platforms, companies may increase the sample size of their feedback and remove bias often found in questionnaires, resulting in better informed decision making. However, simply using personnel to analyze thousands of relevant social media posts is financially expensive and time consuming. Thus, our study aims to establish a method to extract business intelligence from social media content by structuralizing opinionated textual data using text mining and classifying these reviews by the degree of customer satisfaction. By quantifying textual reviews, companies may perform statistical analysis to extract insight from the data as well as effectively address concerns. Specifically, we analyzed a subset of 56,000 Yelp reviews on fast food restaurants and attempted to predict a quantitative value reflecting the overall opinion of each review. We compare the use of two different predictive modeling techniques, bagged Decision Trees and Random Forest Classifiers. In order to simplify the problem, we train our model to accurately classify strongly negative and strongly positive reviews (1 and 5 stars). In addition, we identify drivers behind strongly positive or negative reviews, allowing businesses to understand their strengths and weaknesses. This method provides companies an efficient and cost-effective way to process and understand customer satisfaction as it is discussed on social media.
7

Lundström, Love, and Oscar Öhman. "Machine Learning in credit risk : Evaluation of supervised machine learning models predicting credit risk in the financial sector." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-164101.

Abstract:
When banks lend money to another party, they face a risk that the borrower will not fulfill its obligation towards the bank. This risk is called credit risk, and it is the largest risk banks face. According to the Basel accord, banks need to hold a certain amount of capital to protect themselves against future financial crises. This amount is calculated for each loan with an attached risk-weighted asset, RWA. The main parameters in RWA are probability of default and loss given default. Banks are today allowed to use their own internal models to calculate these parameters. Since holding capital that earns no interest is a great cost, banks seek tools to better predict probability of default in order to lower the capital requirement. Banks have therefore begun to look at the possibility of using machine learning algorithms to estimate these parameters. Machine learning and supervised algorithms such as Logistic regression, Neural network, Decision tree and Random Forest can be used to decide credit risk. By training algorithms on historical data with known results, the parameter probability of default (PD) can be determined with a higher degree of certainty compared to traditional models, leading to a lower capital requirement. On the given data set in this article, Logistic regression seems to be the algorithm with the highest accuracy in classifying customers into the right category. However, it produces many false positives, meaning the model thinks a customer will honour its obligation but in fact the customer defaults. Doing this comes with a great cost for the banks. By implementing a cost function to minimize this error, we found that the Neural network has the lowest false positive rate and will therefore be the model best suited for this specific classification task.
8

Rosales, Martínez Octavio. "Caracterización de especies en plasma frío mediante análisis de espectroscopia de emisión óptica por técnicas de Machine Learning." Tesis de maestría, Universidad Autónoma del Estado de México, 2020. http://hdl.handle.net/20.500.11799/109734.

Abstract:
Optical emission spectroscopy is a technique that allows chemical elements to be identified from the electromagnetic spectrum emitted by a plasma. According to the literature, it has diverse applications, for example: identifying stellar bodies, determining the end point of plasma processes in semiconductor manufacturing or, specifically in this work, processing spectra to determine the elements present in the degradation of recalcitrant compounds. This document automatically identifies spectra of elements such as He, Ar, N, O, and Hg, in their energy levels one and two, using Machine Learning (ML) techniques. First, the element lines reported by NIST (National Institute of Standards and Technology) are downloaded; they are then preprocessed and unified for the following processes: a) creating a generator of 84 synthetic spectra implemented in Python and the ipywidgets module of Jupyter Notebook, with the options of choosing an element and energy level, varying the temperature and the full width at half maximum, and normalizing the spectrum; and b) extracting the lines for the elements He, Ar, N, O, and Hg in the range from 200 nm to 890 nm, then applying oversampling to them in order to run the hyperparameter search for the algorithms Decision Tree, Bagging, Random Forest, and Extremely Randomized Trees, based on the design-of-experiments principles of randomization, replication, blocking, and stratification.
9

Yan, Ping. "Anomaly Detection in Categorical Data with Interpretable Machine Learning : A random forest approach to classify imbalanced data." Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158185.

Abstract:
Metadata refers to "data about data", which contains information needed to understand the process of data collection. In this thesis, we investigate if metadata features can be used to detect broken data and how a tree-based interpretable machine learning algorithm can be used for an effective classification. The goal of this thesis is two-fold. Firstly, we apply a classification schema using metadata features for detecting broken data. Secondly, we generate the feature importance rate to understand the model's logic and reveal the key factors that lead to broken data. The given task from the Swedish automotive company Veoneer is a typical problem of learning from an extremely imbalanced data set, with 97 percent of the data belonging to healthy data and only 3 percent belonging to broken data. Furthermore, the whole data set contains only categorical variables on nominal scales, which brings challenges to the learning algorithm. The handling of imbalanced problems for continuous data is relatively well-studied, but for categorical data, the solution is not straightforward. In this thesis, we propose a combination of tree-based supervised learning and hyperparameter tuning to identify the broken data from a large data set. Our method is composed of three phases: data cleaning, which eliminates ambiguous and redundant instances, followed by supervised learning with random forest, and lastly a random search for hyper-parameter optimization on the random forest model. Our results show empirically that the tree-based ensemble method together with a random search for hyper-parameter optimization improved random forest performance in terms of the area under the ROC. The model achieved an acceptable classification result and showed that metadata features are capable of detecting broken data and providing an interpretable result by identifying the key features for the classification model.
10

Stříteský, Radek. "Sémantické rozpoznávání komentářů na webu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-317212.

Abstract:
The main goal of this paper is the identification of comments on internet websites. The theoretical part is focused on artificial intelligence, mainly classifiers are described there. The practical part deals with creation of training database, which is formed by using generators of features. A generated feature might be for example a title of the HTML element where the comment is. The training database is created by input of classifiers. The result of this paper is testing classifiers in the RapidMiner program.

Books on the topic "Decision Tree and Random Forest Classifier"

1

Machine learning Beginners Guide Algorithms: Supervised & Unsupervised learning, Decision Tree & Random Forest Introduction. CreateSpace Independent Publishing Platform, 2017.


Book chapters on the topic "Decision Tree and Random Forest Classifier"

1

Nandhini, P., and R. Mahaveerakannan. "An efficient treatment for cherry tree disease using the random forest classifier comparison with the decision tree algorithm." In Applications of Mathematics in Science and Technology. CRC Press, 2025. https://doi.org/10.1201/9781003606659-122.

2

Bhagat, Meenu, and Brijesh Bakariya. "Prediction of Heart Disease Through KNN, Random Forest, and Decision Tree Classifier Using K-Fold Cross-Validation." In Artificial Intelligence and Sustainable Computing. Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-1653-3_6.

3

Xin, Cun, Dangfeng Yang, Xiaodong Liu, Yong Huang, and Xueming Qian. "Research on Dam Crack Identification Method Based on Multi-source Information Fusion." In Lecture Notes in Civil Engineering. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-9184-2_1.

Abstract:
Because cracks are the main safety concern of dams, high-precision identification of dam cracks is of great application value and scientific significance for ensuring dam safety. The paper proposes a dam crack identification method based on multi-source information fusion. Specifically, image gray-scale and geometric features are extracted from the image information. Single crack identification models based on Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), XGBoost, and BP Neural Network are then established from these features, respectively. Finally, a multi-classifier fusion algorithm based on D-S evidence theory is established to identify the presence of cracks by fusing the single identification models. Experiments are carried out to compare the proposed method with existing identification methods on evaluation metrics such as accuracy, precision, F1-score, and recall. The results show that the crack identification accuracy of the proposed method reaches 98.9%, and that its crack identification results are better than those of the existing methods.
4

Arslantas, Mustafa Kemal, Tunc Asuroglu, Reyhan Arslantas, et al. "Using Machine Learning Methods to Predict the Lactate Trend of Sepsis Patients in the ICU." In Communications in Computer and Information Science. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-59091-7_1.

Abstract:
Serum lactate levels are considered a biomarker of tissue hypoxia. In sepsis or septic shock patients, as suggested by The Surviving Sepsis Campaign, early lactate clearance-directed therapy is associated with decreased mortality; thus, serum lactate levels should be assessed. Monitoring a patient’s vital parameters and repetitive blood analysis may have deleterious effects on the patient and also bring an economic burden. Machine learning and trend analysis are gaining importance to overcome these issues. In this context, we aimed to investigate whether a machine learning approach can predict lactate trends from non-invasive parameters of patients with sepsis. This retrospective study analyzed adult sepsis patients in the Medical Information Mart for Intensive Care IV (MIMIC-IV) dataset. Inclusion criteria were two or more lactate tests within 6 h of diagnosis, an ICU stay of at least 24 h, and a change of ≥1 mmol/liter in lactate level. Naïve Bayes, J48 Decision Tree, Logistic Regression, Random Forest, and Logistic Model Tree (LMT) classifiers were evaluated for lactate trend prediction. The LMT algorithm outperformed the other classifiers (AUC = 0.803; AUPRC = 0.921). The J48 decision tree performed worse than the other methods when predicting a constant trend. The LMT algorithm with four features (heart rate, oxygen saturation, initial lactate, and time interval variables) achieved 0.80 in terms of AUC (AUPRC = 0.921). Machine learning models that employ logistic regression architectures, i.e., the LMT algorithm, achieved good results in lactate trend prediction tasks and can be used effectively to assess whether the state of the patient is stable or improving.
5

D. M., Basavarajaiah, and Bhamidipati Narasimha Murthy. "Random Forest and Concept of Decision Tree Model." In Design of Experiments and Advanced Statistical Techniques in Clinical Research. Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-8210-3_3.

6

Wu, Datong, Taotao Wu, and Xiaotong Wu. "A Differentially Private Random Decision Tree Classifier with High Utility." In Machine Learning for Cyber Security. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-62223-7_32.

7

Hai, Tao, Jincheng Zhou, Oluwabukola A. Adetiloye, Shirin Abolfath Zadeh, Yanli Yin, and Celestine Iwendi. "DDoS Attack Prediction Using Decision Tree and Random Forest Algorithms." In Lecture Notes in Networks and Systems. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37164-6_4.

8

Dutt, Rohit, Harish Dureja, and A. K. Madan. "Classification Models Using Decision Tree, Random Forest, and Moving Average Analysis." In New Frontiers in Nanochemistry. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429022951-6.

9

Devi, Usha, and Neera Batra. "Comparison of Decision Tree and Random Forest for Default Risk Prediction." In International Conference on Innovative Computing and Communications. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-3315-0_12.

10

Wang, Dan, Tao Hai, Doyinsola Ayandiran, et al. "Accuracy Prediction of Rainfall Using Decision Tree Algorithm and Random Forest." In Lecture Notes in Networks and Systems. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37164-6_25.


Conference papers on the topic "Decision Tree and Random Forest Classifier"

1

Mareby, Yeremia Steven Putra, and Ririn Ikana Desanti. "Exploring WeTV Application with Naïve Bayes, Decision Tree, and Random Forest Classifiers for Sentiment Analysis." In 2024 International Visualization, Informatics and Technology Conference (IVIT). IEEE, 2024. http://dx.doi.org/10.1109/ivit62102.2024.10692731.

2

Dodda, Ratnam, C. Raghavendra, Marpu Aashritha, Hima Varshini Macherla, and Anusha Reddy Kuntla. "A Comparative Study of Machine Learning Algorithms for Predicting Customer Churn: Analyzing Sequential, Random Forest, and Decision Tree Classifier Models." In 2024 5th International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE, 2024. http://dx.doi.org/10.1109/icesc60852.2024.10690131.

3

Jamal, Haggouni, Khalid Azzimani, Hayat Bihri, Azzouzi Salma, and Moulay El Hassan Charaf. "A Comparative Analysis of Random Forest and Decision Tree Classifiers for Predicting Type 2 Diabetes using K-Fold Cross-Validation." In 2024 6th International Symposium on Advanced Electrical and Communication Technologies (ISAECT). IEEE, 2024. https://doi.org/10.1109/isaect64333.2024.10799515.

4

BP, Keerthana, Aparna Siva, T. Senthilkumar, Palanisamy T, N. Prabhu, and Kartik Srinivasan. "Web Phishing Detection Using Decision Tree Random Forest and XGBoost." In 2025 International Conference on Inventive Computation Technologies (ICICT). IEEE, 2025. https://doi.org/10.1109/icict64420.2025.11005064.

5

Pant, Jatin, Khushi, Daksh Rawat, Kunal Verma, Satvik Vats, and Vikrant Sharma. "Prediction of Breast Cancer utilizing Logistic Regression, Decision Tree, Random Forest." In 2024 1st International Conference on Advanced Computing and Emerging Technologies (ACET). IEEE, 2024. http://dx.doi.org/10.1109/acet61898.2024.10730726.

6

Singh, Reetika, and Goonjan Jain. "Forestrank: Automatic Keyphrase Extraction Leveraging Random Forest Classifier and Multi-Criteria Decision-Making." In 2025 5th Asia Conference on Information Engineering (ACIE). IEEE, 2025. https://doi.org/10.1109/acie64499.2025.00017.

7

Winarto, Rudy, Mauridhi Hery Purnomo, and Wiwik Anggraeni. "Evaluating GRNN, Decision Tree, and Random Forest: A Gas Turbine Emission Prediction Comparative Study." In 2024 5th International Conference on Big Data Analytics and Practices (IBDAP). IEEE, 2024. http://dx.doi.org/10.1109/ibdap62940.2024.10689706.

8

Gu, Yixiao. "A Comparative Analysis Study of Stock Prediction Based on Random Forest and Decision Tree." In 2024 International Conference on Electronics and Devices, Computational Science (ICEDCS). IEEE, 2024. https://doi.org/10.1109/icedcs64328.2024.00022.

9

Jiacong, Q. "Component Analysis and Classification Study of Glassware Based on Random Forest and Decision Tree Classification." In 2024 4th Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS). IEEE, 2024. http://dx.doi.org/10.1109/acctcs61748.2024.00115.

10

Kaur, Arpanpreet, Kanwarpartap Singh Gill, Sonal Malhotra, and Swati Devliyal. "Enhancing Chronic Kidney Disease Prediction through Comparative Analysis of Decision Tree and Random Forest Algorithms." In 2024 3rd International Conference for Advancement in Technology (ICONAT). IEEE, 2024. https://doi.org/10.1109/iconat61936.2024.10774782.


Reports on the topic "Decision Tree and Random Forest Classifier"

1

Liu, Hongrui, and Rahul Ramachandra Shetty. Analytical Models for Traffic Congestion and Accident Analysis. Mineta Transportation Institute, 2021. http://dx.doi.org/10.31979/mti.2021.2102.

Abstract:
In the US, over 38,000 people die in road crashes each year, and 2.35 million are injured or disabled, according to the statistics report from the Association for Safe International Road Travel (ASIRT) in 2020. In addition, traffic congestion keeping Americans stuck on the road wastes millions of hours and billions of dollars each year. Using statistical techniques and machine learning algorithms, this research developed accurate predictive models for traffic congestion and road accidents to increase understanding of the complex causes of these challenging issues. The research used US Accidents data consisting of 49 variables describing 4.2 million accident records from February 2016 to December 2020, as well as logistic regression, tree-based techniques such as Decision Tree Classifier and Random Forest Classifier (RF), and Extreme Gradient boosting (XG-boost) to process and train the models. These models will assist people in making smart real-time transportation decisions to improve mobility and reduce accidents.
2

Lasko, Kristofer, Francis O’Neill, and Elena Sava. Automated mapping of land cover type within international heterogenous landscapes using Sentinel-2 imagery with ancillary geospatial data. Engineer Research and Development Center (U.S.), 2024. http://dx.doi.org/10.21079/11681/49367.

Abstract:
A near-global framework for automated training data generation and land cover classification using shallow machine learning with low-density time series imagery does not exist. This study presents a methodology to map nine-class, six-class, and five-class land cover using two dates of a Sentinel-2 granule across seven international sites. The approach uses a series of spectral, textural, and distance decision functions combined with modified ancillary layers to create binary masks from which to generate a balanced set of training data applied to a random forest classifier. For the land cover masks, stepwise threshold adjustments were applied to reflectance, spectral index values, and Euclidean distance layers, with 62 combinations evaluated. Global and regional adaptive thresholds were computed. An annual 95th and 5th percentile NDVI composite was used to provide temporal corrections to the decision functions, and these corrections were compared against the original model. The accuracy assessment found that the regional adaptive thresholds for both the two-date land cover and the temporally corrected land cover could accurately map land cover type within nine-class, six-class, and five-class schemes. Lastly, the five-class and six-class models were compared with a manually labeled deep learning model (Esri), where they performed with similar accuracies. The results highlight performance in line with an intensive deep learning approach, and reasonably accurate models created without a full annual time series of imagery.