Academic literature on the topic 'Random forest classification'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Random forest classification.'

Journal articles on the topic "Random forest classification"

1

Zhao, Zi Ming, Cui Hua Li, Hua Shi, and Quan Zou. "Material Classification Using Random Forest." Advanced Materials Research 301-303 (July 2011): 73–79. http://dx.doi.org/10.4028/www.scientific.net/amr.301-303.73.

Abstract:
Random forest has demonstrated excellent performance on many computer vision problems, such as image classification and keypoint recognition. This paper proposes an approach to classifying materials that combines random forest with the MR8 filter bank. First, we employ the MR8 filter bank to filter the texture image; the filter responses are taken as texture features. Second, a random forest is grown on sub-window patches randomly extracted from these filter responses, and the trained forest is then used to classify a given image (under unknown viewpoint and illumination) into texture classes. We carry out experiments on the Columbia-Utrecht database. The experimental results show that our method successfully solves the plain texture classification problem with high computational efficiency.
2

Hatwell, Julian, Mohamed Medhat Gaber, and R. Muhammad Atif Azad. "CHIRPS: Explaining random forest classification." Artificial Intelligence Review 53, no. 8 (June 4, 2020): 5747–88. http://dx.doi.org/10.1007/s10462-020-09833-6.

Abstract:
Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet demand for them has been increasing in Human-in-the-Loop processes, that is, those processes that require a human agent to verify, approve or reason about the automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. Then a simple, conjunctive form rule is constructed where the antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of the rule’s precision and coverage on the training data along with counter-factual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer a much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per instance) explanation setting.
3

Paul, Angshuman, Dipti Prasad Mukherjee, Prasun Das, Abhinandan Gangopadhyay, Appa Rao Chintha, and Saurabh Kundu. "Improved Random Forest for Classification." IEEE Transactions on Image Processing 27, no. 8 (August 2018): 4012–24. http://dx.doi.org/10.1109/tip.2018.2834830.

4

Razooq, Mohammed M., and Md Jan Nordin. "Texture Classification Using Random Forest." Advanced Science Letters 20, no. 10 (October 1, 2014): 1918–21. http://dx.doi.org/10.1166/asl.2014.5649.

5

K., Vengatesan. "A Random Forest-based Classification Method for Prediction of Car Price." International Journal of Psychosocial Rehabilitation 24, no. 3 (March 30, 2020): 2639–48. http://dx.doi.org/10.37200/ijpr/v24i3/pr2020298.

6

Chow, Una Y. "Random forest classification of Gitksan stops." Journal of the Acoustical Society of America 148, no. 4 (October 2020): 2473. http://dx.doi.org/10.1121/1.5146850.

7

Rukmawan, S. H., F. R. Aszhari, Z. Rustam, and J. Pandelaki. "Classification of Infarction using Random Forest." Journal of Physics: Conference Series 1752, no. 1 (February 1, 2021): 012044. http://dx.doi.org/10.1088/1742-6596/1752/1/012044.

8

Li, Teng, Bingbing Ni, Xinyu Wu, Qingwei Gao, Qianmu Li, and Dong Sun. "On random hyper-class random forest for visual classification." Neurocomputing 172 (January 2016): 281–89. http://dx.doi.org/10.1016/j.neucom.2014.10.101.

9

Szűcs, Gábor. "Random Response Forest for Privacy-Preserving Classification." Journal of Computational Engineering 2013 (November 14, 2013): 1–6. http://dx.doi.org/10.1155/2013/397096.

Abstract:
The paper deals with classification in privacy-preserving data mining. An algorithm, the Random Response Forest, is introduced constructing many binary decision trees, as an extension of Random Forest for privacy-preserving problems. Random Response Forest uses the Random Response idea among the anonymization methods, which instead of generalization keeps the original data, but mixes them. An anonymity metric is defined for undistinguishability of two mixed sets of data. This metric, the binary anonymity, is investigated and taken into consideration for optimal coding of the binary variables. The accuracy of Random Response Forest is presented at the end of the paper.
10

Vimal, C., and B. Sathish. "Random Forest Classifier Based ECG Arrhythmia Classification." International Journal of Healthcare Information Systems and Informatics 5, no. 2 (April 2010): 1–10. http://dx.doi.org/10.4018/jhisi.2010040101.

Abstract:
Heart Rate Variability (HRV) analysis is a non-invasive tool for assessing the autonomic nervous system and for arrhythmia detection and classification. This paper presents a Random Forest classifier based diagnostic system for detecting cardiac arrhythmias using ECG data. The authors use features extracted from ECG signals using HRV analysis and DWT for classification. The experimental results indicate that a prediction accuracy of more than 98% can be obtained using the proposed method. This system can be further improved and fine-tuned for practical applications.

Dissertations / Theses on the topic "Random forest classification"

1

Linusson, Henrik, Robin Rudenwall, and Andreas Olausson. "Random forest och glesa datarespresentationer." Thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-16672.

Abstract:
In silico experimentation is the process of using computational and statistical models to predict medicinal properties in chemicals; as a means of reducing lab work and increasing success rate this process has become an important part of modern drug development. There are various ways of representing molecules; the problem that motivated this paper derives from collecting substructures of the chemical into what is known as fractional representations. Assembling large sets of molecules represented in this way will result in sparse data, where a large portion of the set is null values. This consumes an excessive amount of computer memory which inhibits the size of data sets that can be used when constructing predictive models. In this study, we suggest a set of criteria for evaluation of random forest implementations to be used for in silico predictive modeling on sparse data sets, with regard to computer memory usage, model construction time and predictive accuracy. A novel random forest system was implemented to meet the suggested criteria, and experiments were made to compare our implementation to existing machine learning algorithms to establish our implementation's correctness. Experimental results show that our random forest implementation can create accurate prediction models on sparse datasets, with lower memory usage overhead than implementations using a common matrix representation, and in less time than existing random forest implementations evaluated against. We highlight design choices made to accommodate for sparse data structures and data sets in the random forest ensemble technique, and therein present potential improvements to feature selection in sparse data sets.
Program: Systemarkitekturutbildningen
2

Nelson, Marc. "Evaluating Multitemporal Sentinel-2 data for Forest Mapping using Random Forest." Thesis, Stockholms universitet, Institutionen för naturgeografi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-146657.

Abstract:
The mapping of land cover using remotely sensed data is most effective when a robust classification method is employed. Random forest is a modern machine learning algorithm that has recently gained interest in the field of remote sensing due to its non-parametric nature, which may be better suited to handle complex, high-dimensional data than conventional techniques. In this study, the random forest method is applied to remote sensing data from the European Space Agency’s new Sentinel-2 satellite program, which was launched in 2015 yet remains relatively untested in scientific literature using non-simulated data. In a study site of boreo-nemoral forest in Ekerö municipality, Sweden, a classification is performed for six forest classes based on CadasterENV Sweden, a multi-purpose land cover mapping and change monitoring program. The performance of Sentinel-2’s Multi-Spectral Imager is investigated in the context of time series to capture phenological conditions, optimal band combinations, as well as the influence of sample size and ancillary inputs. Using two images from spring and summer of 2016, an overall map accuracy of 86.0% was achieved. The red edge, short wave infrared, and visible red bands were confirmed to be of high value. Important factors contributing to the result include the timing of image acquisition, use of a feature reduction approach to decrease the correlation between spectral channels, and the addition of ancillary data that combines topographic and edaphic information. The results suggest that random forest is an effective classification technique that is particularly well suited to high-dimensional remote sensing data.
3

Kindbom, Hannes. "LSTM vs Random Forest for Binary Classification of Insurance Related Text." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252748.

Abstract:
The field of natural language processing has received increased attention lately, but less focus is put on comparing models that differ in complexity. This thesis compares Random Forest to LSTM for the task of classifying a message as question or non-question. The comparison was done by training and optimizing the models on historic chat data from the Swedish insurance company Hedvig. Different types of word embedding were also tested, such as Word2vec and Bag of Words. The results demonstrated that LSTM achieved slightly higher scores than Random Forest in terms of F1 and accuracy. The models’ performance was not significantly improved after optimization, and it also depended on which corpus the models were trained on. An investigation of how a chatbot would affect Hedvig’s adoption rate was also conducted, mainly by reviewing previous studies about chatbots’ effects on user experience. The potential effects on the innovation’s five attributes (relative advantage, compatibility, complexity, trialability and observability) were analyzed to answer the problem statement. The results showed that the adoption rate of Hedvig could be positively affected by improving the first two attributes. The effects a chatbot would have on complexity, trialability and observability were, however, suggested to be negligible, if not negative.
4

Alkazaz, Ayham, and Kharouki Marwa Saado. "Evaluation of Adaptive random forest algorithm for classification of evolving data stream." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283114.

Abstract:
In the era of big data, online machine learning algorithms have gained more and more traction from both academia and industry. In multiple scenarios, decisions and predictions have to be made in near real-time as data is observed from continuously evolving data streams. Offline learning algorithms fall short in different ways when it comes to handling such problems. Apart from the costs and difficulties of storing these data streams in storage clusters and the computational difficulties associated with retraining the models each time new data is observed in order to keep the model up to date, these methods also don’t have built-in mechanisms to handle seasonality and non-stationary data streams. In such streams, the data distribution might change over time in what is called concept drift. Adaptive random forests are well studied and effective for online learning and non-stationary data streams. By using bagging and drift detection mechanisms, adaptive random forests aim to improve the accuracy and performance of traditional random forests for online learning. In this study, we analyze the predictive classification accuracy of adaptive random forests when used in conjunction with different data streams and concept drifts. The data streams used to evaluate the accuracy are SEA and Agrawal. Each data stream is tested in 3 different concept drift configurations: gradual, sudden, and recurring. The results obtained from the performed benchmarks show that adaptive random forests have better accuracy handling SEA than Agrawal, which could be explained by the dimensions and structure of the input attributes. Adaptive random forests showed no clear difference in accuracy between gradual and sudden concept drifts. However, recurring concept drifts had lower accuracy in the benchmarks than both the sudden and the gradual counterparts. This could be a result of the higher frequency of concept drifts within the same time period (number of observed samples).
5

Linusson, Henrik. "Multi-Output Random Forests." Thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-17167.

Abstract:
The Random Forests ensemble predictor has proven to be well-suited for solving a multitude of different prediction problems. In this thesis, we propose an extension to the Random Forest framework that allows Random Forests to be constructed for multi-output decision problems with arbitrary combinations of classification and regression responses, with the goal of increasing predictive performance for such multi-output problems. We show that our method for combining decision tasks within the same decision tree reduces prediction error for most tasks compared to single-output decision trees based on the same node impurity metrics, and provide a comparison of different methods for combining such metrics.
Program: Magisterutbildning i informatik
6

Röhss, Josefine. "A Statistical Framework for Classification of Tumor Type from microRNA Data." Thesis, KTH, Matematisk statistik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-191990.

Abstract:
Hepatocellular carcinoma (HCC) is a type of liver cancer with low survival rate, not least due to the difficulty of diagnosing it in an early stage. The objective of this thesis is to build a random forest classification method based on microRNA (and messenger RNA) expression profiles from patients with HCC. The main purpose is to be able to distinguish between tumor samples and normal samples by measuring the miRNA expression. If successful, this method can be used to detect HCC at an earlier stage and to design new therapeutics. The microRNAs and messenger RNAs which have a significant difference in expression between tumor samples and normal samples are selected for building random forest classification models. These models are then tested on paired samples of tumor and surrounding normal tissue from patients with HCC. The results show that the classification models built for classifying tumor and normal samples have high prediction accuracy and hence show high potential for using microRNA and messenger RNA expression levels for diagnosis of HCC.
7

Ringqvist, Sanna. "Classification of terrain using superpixel segmentation and supervised learning." Thesis, Linköpings universitet, Datorseende, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-112511.

Abstract:
The usage of 3D-modeling is expanding rapidly. Modeling from aerial imagery has become very popular due to its increasing number of both civilian and military applications like urban planning, navigation and target acquisition. This master thesis project was carried out at Vricon Systems at SAAB. The Vricon system produces high resolution geospatial 3D data based on aerial imagery from manned aircraft, unmanned aerial vehicles (UAV) and satellites. The aim of this work was to investigate to what degree superpixel segmentation and supervised learning can be applied to a terrain classification problem using imagery and digital surface models (DSM). The aim was also to investigate how the height information from the digital surface model may contribute compared to the information from the grayscale values. The goal was to identify buildings, trees and ground. Another task was to evaluate existing methods and compare results. The approach for solving the stated goal was divided into several parts. The first part was to segment the image using superpixel segmentation, after which features were extracted. Then the classifiers were created and trained, and finally the classifiers were evaluated. The classification method that obtained the best results in this thesis had approximately 90 % correctly labeled superpixels. The result was equal, if not better, compared to other solutions available on the market.
8

Wålinder, Andreas. "Evaluation of logistic regression and random forest classification based on prediction accuracy and metadata analysis." Thesis, Linnéuniversitetet, Institutionen för matematik (MA), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-35126.

Abstract:
Model selection is an important part of classification. In this thesis we study the two classification models logistic regression and random forest. They are compared and evaluated based on prediction accuracy and metadata analysis. The models were trained on 25 diverse datasets. We calculated the prediction accuracy of both models using RapidMiner. We also collected metadata for the datasets concerning number of observations, number of predictor variables and number of classes in the response variable. The performance of logistic regression and that of random forest are significantly correlated, with a correlation of 0.60 and confidence interval [0.29, 0.79]. The models appear to perform similarly across the datasets, with performance more influenced by choice of dataset than by model selection. Random forest, with an average prediction accuracy of 81.66%, performed better on these datasets than logistic regression, with an average prediction accuracy of 73.07%. The difference is, however, not statistically significant, with a p-value of 0.088 for Student's t-test. Multiple linear regression analysis reveals that none of the analysed metadata have a significant linear relationship with logistic regression performance. The regression of logistic regression performance on metadata has a p-value of 0.66. We get similar results with random forest performance. The regression of random forest performance on metadata has a p-value of 0.89. None of the analysed metadata have a significant linear relationship with random forest performance. We conclude that the prediction accuracies of logistic regression and random forest are correlated. Random forest performed slightly better on the studied datasets but the difference is not statistically significant. The studied metadata do not appear to have a significant effect on the prediction accuracy of either model.
9

Pettersson, Anders. "High-Dimensional Classification Models with Applications to Email Targeting." Thesis, KTH, Matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-168203.

Abstract:
Email communication is valuable for any modern company, since it offers an easy means of spreading important information or advertising new products, features or offers and much more. Being able to identify which customers would be interested in certain information would make it possible to significantly improve a company's email communication and thereby avoid customers starting to ignore messages and the creation of unnecessary badwill. This thesis focuses on trying to target customers by applying statistical learning methods to historical data provided by the music streaming company Spotify. An important aspect was the high dimensionality of the data, which places certain demands on the applied methods. A binary classification model was created, where the target was whether a customer will open the email or not. Two approaches were used for trying to target the customers: logistic regression, both with and without regularization, and a random forest classifier, chosen for their ability to handle the high dimensionality of the data. The performance of the suggested models was then evaluated on both a training set and a test set using statistical validation methods, such as cross-validation, ROC curves and lift charts. The models were studied under both large-sample and high-dimensional scenarios. The high-dimensional scenario represents when the number of observations, N, is of the same order as the number of features, p, and the large-sample scenario represents when N ≫ p. Lasso-based variable selection was performed for both these scenarios, to study the informative value of the features. This study demonstrates that it is possible to greatly improve the opening rate of emails by targeting users, even in the high-dimensional scenario. The results show that increasing the amount of training data over a thousandfold will only improve the performance marginally. Rather efficient customer targeting can be achieved by using a few highly informative variables selected by the Lasso regularization.
10

Halmann, Marju. "Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14710.

Abstract:
Filtering out and replying automatically to emails is of interest to many but is hard due to the complexity of the language and to dependencies on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier can be used for the more general email classification task and how it compares to other existing email classifiers. The comparison is based on a literature study and on empirical experimentation using two real-life datasets. Firstly, a literature study is performed to gain insight into the accuracy of other available email classifiers. Secondly, the proposed model’s accuracy is explored with experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly on different user sets. The proposed model's accuracy is within the reported accuracy range, although in the lower part. This indicates that the proposed model performs poorly compared to other classifiers. On average, the classifier performance improves by 15 percentage points with additional information. This indicates that Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier is promising; however, future studies are needed to explore the model and ways to further increase the accuracy.

Book chapters on the topic "Random forest classification"

1

Suthaharan, Shan. "Random Forest Learning." In Machine Learning Models and Algorithms for Big Data Classification, 273–88. Boston, MA: Springer US, 2016. http://dx.doi.org/10.1007/978-1-4899-7641-3_11.

2

Gondane, Rajhans, and V. Susheela Devi. "Classification Using Rough Random Forest." In Mining Intelligence and Knowledge Exploration, 70–80. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-26832-3_8.

3

Falkner, Andreas, Gottfried Schenner, and Alexander Schörghuber. "Tailoring Random Forest for Requirements Classification." In Lecture Notes in Computer Science, 405–12. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-59491-6_38.

4

Kumar, Arvind, and Nishant Sinha. "Classification of Forest Cover Type Using Random Forests Algorithm." In Advances in Data and Information Sciences, 395–402. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-0694-9_37.

5

Bouaziz, Ameni, Christel Dartigues-Pallez, Célia da Costa Pereira, Frédéric Precioso, and Patrick Lloret. "Short Text Classification Using Semantic Random Forest." In Data Warehousing and Knowledge Discovery, 288–99. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10160-6_26.

6

Mourya, Diwaker, and Ashutosh Bhatt. "Classification of Hyperspectral Imagery Using Random Forest." In Communications in Computer and Information Science, 66–74. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8657-1_5.

7

Shahhosseini, Mohsen, and Guiping Hu. "Improved Weighted Random Forest for Classification Problems." In Advances in Intelligent Systems and Computing, 42–56. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-66501-2_4.

8

Adhikary, Sunit Kumar, and Sourish Gunesh Dhekane. "Hyperspectral Image Classification Using Semi-supervised Random Forest." In Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB), 1067–75. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-00665-5_102.

9

Li, Hongmin, Guoqi Li, and Luping Shi. "Classification of Spatiotemporal Events Based on Random Forest." In Advances in Brain Inspired Cognitive Systems, 138–48. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-49685-6_13.

10

Upadhyay, Anand, Umesh Palival, and Sumit Jaiswal. "Early Brain Tumor Detection Using Random Forest Classification." In Advances in Intelligent Systems and Computing, 258–64. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-49339-4_26.


Conference papers on the topic "Random forest classification"

1

Gondane, Rajhans, and V. Susheela Devi. "Classification Using Probabilistic Random Forest." In 2015 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2015. http://dx.doi.org/10.1109/ssci.2015.35.

2

Wang, Honghai. "Pattern Classification with Random Decision Forest." In 2012 International Conference on Industrial Control and Electronics Engineering (ICICEE). IEEE, 2012. http://dx.doi.org/10.1109/icicee.2012.42.

3

Weedon, Martyn, Dimitris Tsaptsinos, and James Denholm-Price. "Random forest explorations for URL classification." In 2017 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA). IEEE, 2017. http://dx.doi.org/10.1109/cybersa.2017.8073403.

4

Kouzani, A. Z., S. Nahavandi, and K. Khoshmanesh. "Face classification by a random forest." In TENCON 2007 - 2007 IEEE Region 10 Conference. IEEE, 2007. http://dx.doi.org/10.1109/tencon.2007.4428937.

5

Monno, Laura, Roberto Bellotti, Piero Calvini, Roberta Monge, Giovanni B. Frisoni, and Michela Pievani. "Hippocampal segmentation by Random Forest classification." In 2011 IEEE International Symposium on Medical Measurements and Applications (MeMeA). IEEE, 2011. http://dx.doi.org/10.1109/memea.2011.5966763.

6

Feng, Wenxian, Chenkai Ma, Guozhang Zhao, and Rui Zhang. "FSRF: An Improved Random Forest for Classification." In 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA). IEEE, 2020. http://dx.doi.org/10.1109/aeeca49918.2020.9213456.

7

Liu, Bozhi, and Guoping Qiu. "Illuminant classification based on random forest." In 2015 14th IAPR International Conference on Machine Vision Applications (MVA). IEEE, 2015. http://dx.doi.org/10.1109/mva.2015.7153144.

8

Vijayakumari, B., and M. Manikumaran. "Pathological lung classification using random forest classifier." In 2017 International Conference on Intelligent Computing and Control (I2C2). IEEE, 2017. http://dx.doi.org/10.1109/i2c2.2017.8321922.

9

Zawbaa, Hossam M., Maryam Hazman, Mona Abbass, and Aboul Ella Hassanien. "Automatic fruit classification using random forest algorithm." In 2014 14th International Conference on Hybrid Intelligent Systems (HIS). IEEE, 2014. http://dx.doi.org/10.1109/his.2014.7086191.

10

Alam, Mohammed S., and Son T. Vuong. "Random Forest Classification for Detecting Android Malware." In 2013 IEEE International Conference on Green Computing and Communications (GreenCom) and IEEE Internet of Things(iThings) and IEEE Cyber, Physical and Social Computing(CPSCom). IEEE, 2013. http://dx.doi.org/10.1109/greencom-ithings-cpscom.2013.122.


Reports on the topic "Random forest classification"

1

Idakwo, Gabriel, Sundar Thangapandian, Joseph Luttrell, Zhaoxian Zhou, Chaoyang Zhang, and Ping Gong. Deep learning-based structure-activity relationship modeling for multi-category toxicity classification: a case study of 10K Tox21 chemicals with high-throughput cell-based androgen receptor bioassay data. Engineer Research and Development Center (U.S.), July 2021. http://dx.doi.org/10.21079/11681/41302.

Abstract:
Deep learning (DL) has attracted the attention of computational toxicologists as it offers a potentially greater power for in silico predictive toxicology than existing shallow learning algorithms. However, contradicting reports have been documented. To further explore the advantages of DL over shallow learning, we conducted this case study using two cell-based androgen receptor (AR) activity datasets with 10K chemicals generated from the Tox21 program. A nested double-loop cross-validation approach was adopted along with a stratified sampling strategy for partitioning chemicals of multiple AR activity classes (i.e., agonist, antagonist, inactive, and inconclusive) at the same distribution rates amongst the training, validation and test subsets. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22–27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Further in-depth analyses of chemical scaffolding shed insights on structural alerts for AR agonists/antagonists and inactive/inconclusive compounds, which may aid in future drug discovery and improvement of toxicity prediction modeling.
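The abstract describes a nested double-loop cross-validation with stratified sampling, so that each activity class appears at the same rate in the training, validation and test subsets. The sketch below shows one way such index splits could be generated using only the Python standard library. It is not the report's implementation: the function names, the fold counts (5 outer, 4 inner), the seeding scheme, and the class sizes in the usage example are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def nested_splits(labels, outer_k=5, inner_k=4, seed=0):
    """Yield (train, validation, test) index lists for a double-loop CV.

    The outer loop holds out one stratified fold as the test set; the inner
    loop re-splits the remaining samples into training and validation folds.
    """
    outer = stratified_folds(labels, outer_k, seed)
    for o, test in enumerate(outer):
        rest = [i for f, fold in enumerate(outer) if f != o for i in fold]
        rest_labels = [labels[i] for i in rest]
        inner = stratified_folds(rest_labels, inner_k, seed + o + 1)
        for v, val_pos in enumerate(inner):
            # Inner folds hold positions within `rest`; map back to indices.
            val = [rest[p] for p in val_pos]
            train = [rest[p]
                     for f, fold in enumerate(inner) if f != v
                     for p in fold]
            yield train, val, test


# Hypothetical class sizes; the real Tox21 class counts are not given here.
labels = (["agonist"] * 20 + ["antagonist"] * 40
          + ["inactive"] * 100 + ["inconclusive"] * 40)
for train, val, test in nested_splits(labels):
    pass  # fit on `train`, tune on `val`, report on `test`
```

Keeping the test fold outside the inner loop means hyperparameters are tuned only on the validation folds, so the outer test scores are not used for model selection.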