Dissertations / Theses on the topic 'Random Forests'
Gómez, Silvio Normey. "Random forests estocástico." Pontifícia Universidade Católica do Rio Grande do Sul, 2012. http://hdl.handle.net/10923/1598.
In the Data Mining area, experiments have been carried out using Ensemble Classifiers. We experimented with Random Forests to evaluate their performance when randomness is applied. The results showed that the impact of randomness is much greater in Random Forests than in other algorithms, e.g., Bagging and Boosting. The main purpose of this work is to decrease the effect of randomness in Random Forests. To achieve this, we implemented an extension of the method named Stochastic Random Forests and specified a strategy to increase performance and stability by combining the results. The improvements achieved are presented at the end of this work.
In the Data Mining area, experiments have been carried out using Ensemble Classifiers. These experiments are based on empirical comparisons that suffer from a lack of care regarding the randomness of these methods. We ran Random Forests to evaluate the algorithm's performance when subjected to these issues. Analysis of the results shows that the sensitivity of Random Forests is significantly higher than that of other methods found in the literature, such as Bagging and Boosting. The purpose of this dissertation is to reduce the sensitivity of Random Forests to randomness. To achieve this, we implemented an extension of the method, which we call Stochastic Random Forests. We then specify how the problem found in the algorithm can be mitigated by combining its results. Finally, a study is presented showing the improvements achieved on the sensitivity problem.
Abdulsalam, Hanady. "Streaming Random Forests." Thesis, Kingston, Ont. : [s.n.], 2008. http://hdl.handle.net/1974/1321.
Linusson, Henrik. "Multi-Output Random Forests." Thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-17167.
Program: Master's programme in Informatics
Gómez, Silvio Normey. "Random forests estocástico." Pontifícia Universidade Católica do Rio Grande do Sul, 2012. http://tede2.pucrs.br/tede2/handle/tede/5226.
Lapajne, Mikael Hellborg, and Daniel Slat. "Random Forests for CUDA GPUs." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2953.
Diyar, Jamal. "Post-Pruning of Random Forests." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-15904.
Abstract. Context. Ensemble methods continue to receive increasing attention in machine learning. Because machine learning techniques that generate a single classifier or predictor have shown signs of limited capacity in some settings, ensemble methods have emerged as alternatives for achieving better predictive performance. One of the most interesting and effective ensemble algorithms introduced in recent years is Random Forests. To ensure that Random Forests achieve high predictive accuracy, a large number of trees usually needs to be used. The result of using a larger number of trees to increase predictive accuracy is a complex model that can be difficult to interpret or analyse. The large number of trees also places higher demands on both storage space and computing power. Objectives. This thesis explores the possibility of automatically simplifying models generated by Random Forests in order to reduce the size of the model, increase its interpretability, and preserve or improve predictive accuracy. The aim of the thesis is twofold. We first compare and empirically evaluate different pruning techniques. The second part of the thesis investigates the relationship between predictive accuracy and model interpretability. Method. The primary research method used to conduct the study is experimentation. All pruning techniques are implemented in Python. Five different datasets were used to train, evaluate, and validate the models. Results. There is no significant difference in predictive performance between the compared techniques, and none of the pruning techniques examined is superior in every respect. The experimental results also showed that interpretability and accuracy are inversely related, at least for the configurations studied.
That is, a positive change in the model's interpretability is accompanied by a negative change in its accuracy. Conclusions. It is possible to reduce the size of a complex Random Forests model while maintaining or improving predictive accuracy. Moreover, the choice of pruning technique depends on the application area and the amount of training data available. Finally, models that are significantly simplified may be less accurate but, on the other hand, tend to be perceived as more understandable.
Xiong, Kuangnan. "Roughened Random Forests for Binary Classification." Thesis, State University of New York at Albany, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3624962.
Binary classification plays an important role in many decision-making processes. Random forests can build a strong ensemble classifier by combining weaker classification trees that are de-correlated. The strength and correlation among individual classification trees are the key factors that contribute to the ensemble performance of random forests. We propose roughened random forests, a new set of tools which show further improvement over random forests in binary classification. Roughened random forests modify the original dataset for each classification tree and further reduce the correlation among individual classification trees. This data modification process is composed of artificially imposing missing data that are missing completely at random and subsequent missing data imputation.
Through this dissertation we aim to answer a few important questions in building roughened random forests: (1) What is the ideal rate of missing data to impose on the original dataset? (2) Should we impose missing data on both the training and testing datasets, or only on the training dataset? (3) What are the best missing data imputation methods to use in roughened random forests? (4) Do roughened random forests share the same ideal number of covariates selected at each tree node as the original random forests? (5) Can roughened random forests be used in medium- to high-dimensional datasets?
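The data modification step described in the Xiong abstract above (impose missing-completely-at-random gaps, then impute) can be sketched in a few lines. This is only an illustration under stated assumptions, not the dissertation's implementation: the function names `roughen` and `roughened_datasets` are hypothetical, mean imputation is an assumed stand-in for whichever imputation method the author found best, and the missing-data rate is left as a free parameter because the abstract poses it as an open question.

```python
import random
from statistics import mean


def roughen(rows, rate, rng):
    """Return a copy of `rows` with MCAR gaps imposed and then mean-imputed.

    `rows` is a list of equal-length lists of numbers. Mean imputation is an
    assumption made for this sketch; the abstract does not name the method.
    """
    n_cols = len(rows[0])
    # Step 1: knock out each cell independently with probability `rate` (MCAR).
    masked = [[None if rng.random() < rate else v for v in row] for row in rows]
    # Step 2: fill each gap with the mean of the values still observed in that column.
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        # If every cell in the column was knocked out, fall back to the original mean.
        fill = mean(observed) if observed else mean(row[j] for row in rows)
        for row in masked:
            if row[j] is None:
                row[j] = fill
    return masked


def roughened_datasets(rows, n_trees, rate, seed=0):
    """Build one independently roughened copy of the data per tree."""
    rng = random.Random(seed)
    return [roughen(rows, rate, rng) for _ in range(n_trees)]
```

Each copy would then be handed to a separate classification tree, so that the trees see slightly different versions of the same data; the original `rows` list is never modified.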
Strobl, Carolin, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis. "Conditional Variable Importance for Random Forests." BioMed Central Ltd, 2008. http://dx.doi.org/10.1186/1471-2105-9-307.
Sorice, Domenico <1995>. "Random forests in time series analysis." Master's Degree Thesis, Università Ca' Foscari Venezia, 2020. http://hdl.handle.net/10579/17482.
Hapfelmeier, Alexander. "Analysis of missing data with random forests." Diss., lmu, 2012. http://nbn-resolving.de/urn:nbn:de:bvb:19-150588.
Wonkye, Yaa Tawiah. "Innovations of random forests for longitudinal data." Bowling Green State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1563054152739397.
Auret, Lidia. "Process monitoring and fault diagnosis using random forests." Thesis, Stellenbosch : University of Stellenbosch, 2010. http://hdl.handle.net/10019.1/5360.
Dissertation presented for the Degree of DOCTOR OF PHILOSOPHY (Extractive Metallurgical Engineering) in the Department of Process Engineering at the University of Stellenbosch
ENGLISH ABSTRACT: Fault diagnosis is an important component of process monitoring, relevant in the greater context of developing safer, cleaner and more cost efficient processes. Data-driven unsupervised (or feature extractive) approaches to fault diagnosis exploit the many measurements available on modern plants. Certain current unsupervised approaches are hampered by their linearity assumptions, motivating the investigation of nonlinear methods. The diversity of data structures also motivates the investigation of novel feature extraction methodologies in process monitoring. Random forests are recently proposed statistical inference tools, deriving their predictive accuracy from the nonlinear nature of their constituent decision tree members and the power of ensembles. Random forest committees provide more than just predictions; model information on data proximities can be exploited to provide random forest features. Variable importance measures show which variables are closely associated with a chosen response variable, while partial dependencies indicate the relation of important variables to said response variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method based on random forests as a potentially viable contender in the process monitoring statistical tool family. The hypothesis investigated was that unsupervised process monitoring and fault diagnosis can be improved by using features extracted from data with random forests, with further interpretation of fault conditions aided by random forest tools. The experimental results presented in this work support this hypothesis. An initial study was performed to assess the quality of random forest features. Random forest features were shown to be generally difficult to interpret in terms of geometry present in the original variable space. 
Random forest mapping and demapping models were shown to be very accurate on training data, and to extrapolate weakly to unseen data that do not fall within regions populated by training data. Random forest feature extraction was applied to unsupervised fault diagnosis for process data, and compared to linear and nonlinear methods. Random forest results were comparable to existing techniques, with the majority of random forest detections due to variable reconstruction errors. Further investigation revealed that the residual detection success of random forests originates from the constrained responses and poor generalization artifacts of decision trees. Random forest variable importance measures and partial dependencies were incorporated in a visualization tool to allow for the interpretation of fault conditions. A dynamic change point detection application with random forests proved more successful than an existing principal component analysis-based approach, with the success of the random forest method again residing in reconstruction errors. The addition of random forest fault diagnosis and change point detection algorithms to a suite of abnormal event detection techniques is recommended. The distance-to-model diagnostic based on random forest mapping and demapping proved successful in this work, and the theoretical understanding gained supports the application of this method to further data sets.
AFRIKAANS SUMMARY: Fault diagnosis is an important component of process monitoring, and is relevant within the greater context of developing safer, cleaner and more cost-effective processes. Data-driven unsupervised or feature extraction approaches to fault diagnosis exploit the many measurements available on modern process plants. Some current unsupervised approaches are hampered by assumptions of linearity, which motivates the investigation of nonlinear methods. The diversity of data structures is further motivation for investigating new feature extraction methods in process monitoring. Random forests are a new statistical inference technique whose accuracy can be attributed to the nonlinear nature of their decision tree members and the power of ensembles. Random forest committees provide more than just predictions; model information on data point proximity can be exploited to provide random forest features. Variable importance measures show which variables are closely related to a chosen response variable, while partial dependencies indicate the relation of an important variable to the chosen response variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method for process monitoring based on random forests. The hypothesis investigated was that unsupervised process monitoring and fault diagnosis can be improved by using features extracted with random forests, with further interpretation of fault conditions aided by additional random forest techniques. The experimental results presented in this work support this hypothesis. An initial study was performed to assess the quality of random forest features. Random forest features were found to be difficult to interpret in terms of the geometry of the original variable space.
Furthermore, random forest mapping and demapping were found to be very accurate on training data, but to extrapolate poorly to unseen data falling in regions outside those of the training data. Random forest feature extraction was applied to unsupervised fault diagnosis for steady-state processes, and compared with linear and nonlinear methods. Results with random forests are comparable to those of existing methods, and the majority of random forest detections are attributable to variable reconstruction errors. Further investigation showed that the success of residual detection rests on the constrained output values and poor generalization properties of decision trees. Random forest variable importance measures and partial dependencies were incorporated into a visualization technique that allows for the interpretation of fault conditions. A dynamic application of change point detection with random forests proved more successful than an existing method based on principal component analysis. The success of the random forest method is again attributable to reconstruction residuals. A recommendation arising from this study is that the random forest change point and fault detection methods be added to a similar suite of methods. This work found that the distance-to-model diagnostic based on random forest mapping and demapping is successful for fault detection. The theoretical understanding gained supports the application of these methods to further data sets.
Fawagreh, Khaled. "On pruning and feature engineering in Random Forests." Thesis, Robert Gordon University, 2016. http://hdl.handle.net/10059/2113.
Merrill, Andrew C. "Investigations of Variable Importance Measures Within Random Forests." DigitalCommons@USU, 2009. https://digitalcommons.usu.edu/etd/7078.
Quach, Anna. "Extensions and Improvements to Random Forests for Classification." DigitalCommons@USU, 2017. https://digitalcommons.usu.edu/etd/6755.
Parfionovas, Andrejus. "Enhancement of Random Forests Using Trees with Oblique Splits." DigitalCommons@USU, 2013. http://digitalcommons.usu.edu/etd/1508.
Tang, Ying. "Real-time automatic face tracking using adaptive random forests." Thesis, McGill University, 2010. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=95172.
In this thesis, tracking is treated as a pixel-based binary classification problem. A strong ensemble classifier, obtained as a weighted combination of several random forests (weak classifiers), is trained on pixel feature vectors. The strong classifier is then used to classify pixels as belonging to the face or to the background in the next frame. The classification margins are used to create a confidence map whose peak indicates the new location of the face. The peak is located by Camshift, which adjusts the size of the tracked face. The random forests in the ensemble are updated with AdaBoost by training new random forests to replace some of the older ones, in order to adapt to changes between two frames. Tracking accuracy is monitored by a variable called the classification score. If the score detects an anomaly, the system stops tracking and restarts by re-initializing with a Viola-Jones face detector. The tracker was tested on several sequences and showed robust performance across various scenarios and illumination conditions. It copes well with complex changes in the face, short periods of occlusion, and loss of tracking.
Michaelson, Jacob. "Applications and extensions of Random Forests in genetic and environmental studies." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-64099.
Sandsveden, Daniel. "Evaluation of Random Forests for Detection and Localization of Cattle Eyes." Thesis, Linköpings universitet, Datorseende, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-121540.
Reiter, Richard M. "Prediction of recurrence in thin melanoma using trees and random forests /." Electronic version (PDF), 2005. http://dl.uncw.edu/etd/2005/reiterr/richardreiter.html.
Hansson, Kim, and Erik Hörlin. "Active learning via Transduction in Regression Forests." Thesis, Blekinge Tekniska Högskola, Institutionen för kreativa teknologier, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-10935.
Hapfelmeier, Alexander [author], and Kurt [academic supervisor] Ulm. "Analysis of missing data with random forests / Alexander Hapfelmeier. Supervisor: Kurt Ulm." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2012. http://d-nb.info/102904032X/34.
Matheson, David. "An empirical study of practical, theoretical and online variants of random forests." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/46586.
Adriansson, Nils, and Ingrid Mattsson. "Forecasting GDP Growth, or How Can Random Forests Improve Predictions in Economics?" Thesis, Uppsala universitet, Statistiska institutionen, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-243028.
Mohammed, D. Y. "Overlapped speech and music segmentation using singular spectrum analysis and random forests." Thesis, University of Salford, 2017. http://usir.salford.ac.uk/43773/.
Samarakoon, Prasad. "Random Regression Forests for Fully Automatic Multi-Organ Localization in CT Images." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM039/document.
Locating an organ in a medical image by bounding that particular organ with respect to an entity such as a bounding box or sphere is termed organ localization. Multi-organ localization takes place when multiple organs are localized simultaneously. Organ localization is one of the most crucial steps involved in all phases of patient treatment, from the diagnosis phase to the final follow-up phase. The use of the supervised machine learning technique called random forests has shown very encouraging results in many sub-disciplines of medical image analysis. Similarly, Random Regression Forests (RRF), a specialization of random forests for regression, have produced state-of-the-art results for fully automatic multi-organ localization. Although RRF have produced state-of-the-art results in multi-organ segmentation, the relative novelty of the method in this field still raises numerous questions about how to optimize its parameters for consistent and efficient usage. The first objective of this thesis is to acquire a thorough knowledge of the inner workings of RRF. After achieving this goal, we proposed a consistent and automatic parametrization of RRF. Then, we empirically proved the spatial independence hypothesis used by RRF. Finally, we proposed a novel RRF specialization called Light Random Regression Forests for multi-organ localization.
Stum, Alexander Knell. "Random Forests Applied as a Soil Spatial Predictive Model in Arid Utah." DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/736.
rdlyons@indiana.edu. "Markov Chain Intersections and the Loop-Erased Walk." ESI preprints, 2001. ftp://ftp.esi.ac.at/pub/Preprints/esi1058.ps.
Hudec, Vladimír. "Klasifikační metody pro data z mikročipů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-236982.
Li, Ke. "Customer Relationship Management: from Conversion to Churn to Winback." Diss., Temple University Libraries, 2013. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/221333.
Ph.D.
With the grant of a large CRM dataset from a large media company, this dissertation examines four categories of factors that could affect three stages of customer relationship management: customer acquisition, retention, and winback of lost customers. Specifically, using the machine learning method of random forests and text mining techniques, the study identifies which factors play bigger roles than others during each of the three stages of CRM. The factors considered are customer heterogeneity (e.g., usage of self-care service channels, duration of service, responsiveness to marketing actions), the firm's marketing initiatives (e.g., the volume of marketing communications, the depth of the promotion, the communication channels used, and the marketing penetration in different geographical areas), customer self-reported deactivation reasons, and call center notes in text form. The study also examines how these factors evolve across the three stages of CRM in terms of their effects on customers' decisions to convert to a paid customer, to churn, or to reactivate their service with the company. The findings help managers better allocate their resources in the processes of acquiring, retaining and winning back customers.
Temple University--Theses
Hjerpe, Adam. "Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185496.
Random Forest (RF) is a popular predictive model that has shown good results in a wide range of application studies. The model gives high prediction accuracy, is able to model complex high-dimensional data, and has also shown good results with intercorrelated predictor variables. This project investigates a measure obtained from the RF model, the variable importance measure (VIM), for quantifying the degree of association between predictor variables and the target variable. The project examines the sensitivity of VIM to qualitative predictor noise and examines the ability of VIM to differentiate predictive variables from variables that, with respect to the target variable, only describe noise. In supervised learning, differentiating predictive variables can be used to increase the robustness of classifiers, increase prediction accuracy, and reduce data dimensionality, and VIM can be used as a tool for exploring relationships between predictor variables and the target variable.
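One common way to obtain a variable importance measure of the kind discussed in the Hjerpe abstract above is permutation importance: shuffle one predictor column and record how much accuracy drops. The sketch below is a generic, model-agnostic illustration and may well differ from the VIM variants the thesis actually studies; `permutation_importance` and the `predict` callable are hypothetical names, and a fitted random forest's prediction function is assumed to be passed in by the caller.

```python
import random


def permutation_importance(predict, X, y, column, n_repeats=10, seed=0):
    """Mean drop in accuracy when `column` of X is randomly permuted.

    `predict` is any callable mapping one row (a list) to a label; here it
    stands in for a fitted random forest. X is a list of lists, y the labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        # Shuffle only the chosen column, breaking its link to the target.
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        permuted = [row[:column] + [v] + row[column + 1:]
                    for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(permuted))
    return sum(drops) / n_repeats
```

A predictor that carries only noise with respect to the target should score close to zero, which is the separation between predictive and noise variables that the abstract describes.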
Persson, Karl. "Predicting movie ratings : A comparative study on random forests and support vector machines." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11119.
Pauly, Olivier [author], Nassir [academic supervisor] Navab, and Nicholas [academic supervisor] Ayache. "Random Forests for Medical Applications / Olivier Pauly. Reviewer: Nicholas Ayache. Supervisor: Nassir Navab." München : Universitätsbibliothek der TU München, 2012. http://d-nb.info/1030099510/34.
Kimes, Ryan Vincent. "Quantifying the Effects of Correlated Covariates on Variable Importance Estimates from Random Forests." VCU Scholars Compass, 2006. http://scholarscompass.vcu.edu/etd/1433.
Varatharajah, Thujeepan, and Eriksson Victor. "A comparative study on artificial neural networks and random forests for stock market prediction." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186452.
This study investigates how well two different machine learning (ML) models can predict the stock market and then compares their results. The chosen models are based on artificial neural networks (ANN) and random forests (RF). The models are trained on two separate datasets, and the forecasts are made for the next day's closing price. The input to the models consists of six different financial indicators based on the closing price over the past 5, 10 and 20 days. Performance is evaluated by analysing and comparing values such as the root mean squared error (RMSE) and the mean absolute percentage error (MAPE) over the test period. Specific trends in subsets of the test period are also examined to evaluate the consistency of the models. The results showed that the ANN model performed better than the RF model: over the whole test period it showed smaller errors relative to the actual values and thus made more accurate forecasts.
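The two error measures named in the abstract above, RMSE and MAPE, are simple to compute. The following is a minimal sketch using their standard textbook definitions; the function names are illustrative, no data from the thesis is reproduced, and MAPE as written assumes that no actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

RMSE is expressed in the units of the closing price and penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare forecasts across stocks with different price levels.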
Petersson, Andreas. "Data mining file sharing metadata : A comparison between Random Forests Classificiation and Bayesian Networks." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11180.
Petersson, Andreas. "Data mining file sharing metadata : A comparison between Random Forests Classification and Bayesian Networks." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11285.
Kanavati, Fahdi. "Efficient extraction of semantic information from medical images in large datasets using random forests." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/58017.
Pasquale, Daniel L. (Daniel Louis). "Characterizing drag and velocity within model mangrove forests of ordered and random tree arrangement." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/111525.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 50).
Changes in velocity and drag force on model mangrove trees within 13 different simulated mangrove forest segments in a flume were investigated. The simulated forests were composed of 1/12 scale model Rhizophora mangrove trees placed at three densities: low (3.42 trees/m²), medium (6.34 trees/m²), and high (9.27 trees/m²). For the low tree density cases, one forest with ordered tree placement and six forests with random tree placement were studied. For the medium and high tree density cases, one ordered tree arrangement and two random tree arrangements were studied. Spatial arrangements of the forests were described using the mean distance to nearest neighbor (NN) for all trees in a particular forest. The forest arrangements were also described using the spatial aggregation index developed by Clark and Evans. [9] For forests of ordered tree arrangement, depth-averaged velocity was found to decrease from the leading edge to the trailing edge of the forest segment at each density, and the reduction in velocity moving through the forest was greater for denser forests. Vertical profiles of velocity show that a region of high velocity developed above the root zone when moving from the leading edge to the trailing edge of the forest. This effect was more pronounced in the forests with random tree arrangement and low mean NN distance. For all spatial arrangements, the drag force acting on an individual tree decreased from the leading edge to the trailing edge of the forest. Larger decreases in drag force occurred within denser forests. Mangrove tree drag coefficient values were found to be similar or slightly higher for trees within forests of random arrangement compared to trees within forests of ordered arrangement, but further study examining a greater amount of random tree arrangements is needed. This study describes changes in the vulnerability of a mangrove forest that could occur if mangrove trees were removed from the forest by natural or human causes.
by Daniel L. Pasquale.
M. Eng.
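The Pasquale abstract above describes each forest layout by its mean nearest-neighbor (NN) distance and by the Clark and Evans spatial aggregation index. A minimal sketch of both quantities follows, assuming the usual form of the index: the observed mean NN distance divided by the value expected for a completely random pattern of the same density, 1 / (2 * sqrt(density)). The function names are hypothetical, no edge correction is applied, and the thesis may compute the index differently.

```python
import math


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor (needs >= 2 points)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)


def clark_evans_index(points, area):
    """Observed mean NN distance over the value expected under spatial randomness.

    Values near 1 suggest a random layout, below 1 clustering, above 1 regularity.
    No edge correction is applied in this sketch.
    """
    density = len(points) / area  # e.g. trees per square metre
    expected = 1.0 / (2.0 * math.sqrt(density))
    return mean_nn_distance(points) / expected
```

For example, sixteen points on a unit-spaced 4 x 4 square grid with `area=16.0` give an index of 2.0, reflecting a strongly ordered arrangement.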
Julock, Gregory Alan. "The Effectiveness of a Random Forests Model in Detecting Network-Based Buffer Overflow Attacks." NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/190.
Full textHerlitz, Mattias. "Analyzing the Tobii Real-world-mapping tool and improving its workflow using Random Forests." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-228474.
Tobii Pro Glasses 2 are used to record gaze data in market research and scientific experiments. The gaze points are mapped from the recorded video onto an image with areas of interest (AOIs). Most of the important metrics concern fixations, which occur when a person looks at the same place for a short period. The method mainly used today is to map gaze points manually, but a faster way is automatic mapping with the Real World Mapping (RWM) tool. The reliability of RWM was investigated by analysing fixations from several recordings with the help of decision trees. A method for classifying gaze points as correctly or incorrectly mapped was created using Random Forests (RF). The results show that RWM is not particularly good at mapping fixations, neither at finding them nor at mapping them to the correct AOI. RWM turned out to work better with limited movement and when the AOIs are properly designed, which can serve as guidelines for those running an experiment. RWM should nevertheless be improved. The RF classification gave good results on several test sets in which the gaze points were mapped onto an image by RWM, and on gaze points that were not mapped by RWM but were close in time to mapped gaze points. Gaze points far from mapped gaze points had poor test results. The conclusion was that relevant gaze points should be classified with RF in order to remap incorrectly mapped gaze points. If RWM fails to map large segments of gaze points, visual classification should be used.
Brokamp, Richard C. "Land Use Random Forests for Estimation of Exposure to Elemental Components of Particulate Matter." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1463130851.
Williams, Paige T. "Mapping Smallholder Forest Plantations in Andhra Pradesh, India using Multitemporal Harmonized Landsat Sentinel-2 S10 Data." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/104234.
The objective of this study was to develop a method by which smallholder forest plantations can be mapped accurately in Andhra Pradesh, India using multitemporal (intra- and inter-annual) visible (red, green, blue) and near-infrared (VNIR) bands from the European Space Agency satellite Sentinel-2. Dependency on and scarcity of wood products have driven the deforestation and degradation of natural forests in Southeast Asia. At the same time, forest plantations have been established both within and outside of forests, with the latter (as contiguous blocks) being the focus of this study. The ecosystem services provided by natural forests are different from those of plantations. As such, being able to separate natural forests from plantations is important. Unfortunately, there are constraints to accurately mapping planted forests in Andhra Pradesh (and other similar landscapes in South and Southeast Asia) using remotely sensed data due to the plantations' small size (average 2 hectares), short rotation ages (often 4-7 years for timber species), and spectral (reflectance from satellite imagery) similarities to croplands and natural forests. The East and West Godavari districts of Andhra Pradesh were selected as the area for a case study. Cloud-free Harmonized Landsat Sentinel-2 (HLS) S10 images were acquired over six dates, from different seasons, as follows: December 28, 2015; November 22, 2016; November 2, 2017; December 22, 2017; March 1, 2018; and June 15, 2018. Cloud-free satellite data are not available during the monsoon season (July to September) in this coastal region. In situ data on forest plantations, provided by collaborators, was supplemented with additional training data points (X and Y locations with land cover class) representing other land cover subclasses in the region: agriculture, water, aquaculture, mangrove, palm, forest plantation, ground, natural forest, shrub/scrub, sand, and urban, with a total of 2,230 training points.
These high-quality samples were then aggregated into three land use classes: non-forest, natural forest, and forest plantations. Image classification used random forests within the Julia DecisionTree package on a thirty-band stack comprising the VNIR bands and NDVI images (a greenness measure; a higher value indicates more vegetation) for all dates. The median classification accuracy from the 5-fold cross-validation was 94.3%. Our results, predicated on high-quality training data, demonstrate that (mostly smallholder) forest plantations can be separated from natural forests even using only the Sentinel-2 VNIR bands when multitemporal data (across both years and seasons) are used.
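As a rough illustration of how a thirty-band stack could arise from the six acquisition dates, the sketch below computes NDVI with its usual definition, (NIR - red) / (NIR + red), and concatenates the four VNIR bands plus NDVI for each date into one feature vector per pixel (6 × 5 = 30). The band ordering, the zero-denominator handling, and the reflectance values are assumptions made for the example; the thesis itself used the Julia DecisionTree package, and Python is used here only for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index; 0.0 when both bands are zero."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0


def stack_features(observations):
    """Flatten per-date VNIR bands plus NDVI into one feature vector."""
    features = []
    for obs in observations:
        features.extend([obs["blue"], obs["green"], obs["red"], obs["nir"]])
        features.append(ndvi(obs["nir"], obs["red"]))
    return features


# Hypothetical reflectances for a single pixel on six acquisition dates.
pixel = [{"blue": 0.04, "green": 0.06, "red": 0.05, "nir": 0.45} for _ in range(6)]
vector = stack_features(pixel)
```

A vector of this shape, one per training point, is the kind of input a random forest classifier would then be trained and cross-validated on.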
ARAÚJO, Gilderlanio Santana de. "Uso de random forests e redes biológicas na associação de poliformismos à doença de Alzheimer." Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/18012.
FACEPE
O desenvolvimento de técnicas de genotipagem de baixo custo (SNP arrays) e as anotações de milhares de polimorfismos de nucleotídeo único (SNPs) em bancos de dados públicos têm originado um crescente número de estudos de associação em escala genômica (do inglês, Genome-Wide Associations Studies - GWAS). Nesses estudos, um enorme número de SNPs (centenas de milhares) são avaliados com métodos estatísticos univariados de forma a encontrar SNPs associados a um determinado fenótipo. Testes univariados são incapazes de capturar relações de alta ordem entre os SNPs, algo comum em doenças genéticas complexas e são afetados pela alta correlação entre SNPs na mesma região genômica. Métodos de aprendizado de máquina, como o Random Forest (RF), têm sido aplicados em dados de GWAS para realizar a previsão de riscos de doenças e capturar os SNPs associados às mesmas. Apesar de RF ser um método com reconhecido desempenho em dados de alta dimensionalidade e na captura de relações não-lineares, o uso de todos os SNPs presentes em um estudo GWAS é computacionalmente inviável. Neste estudo propomos o uso de redes biológicas para a seleção inicial de SNPs candidatos a serem usados pela RF. A partir de um conjunto inicial de genes já relacionados à doença na literatura, usamos ferramentas de redes de interação gene-gene, para encontrar novos genes que possam estar associados a doença. Logo, é possível extrair um número reduzido de SNPs tornando a aplicação do método RF viável. Os experimentos realizados nesse estudo concentram-se em investigar quais polimorfismos podem influenciar na suscetibilidade à doença de Alzheimer (DA) e ao comprometimento cognitivo leve (MCI). O resultado final das análises é a delineação de uma metodologia para o uso de RF, para a análise de dados de GWAS, assim como a caracterização de potenciais fatores de riscos da DA.
The development of low-cost genotyping techniques (SNP arrays) and the annotation of thousands of single nucleotide polymorphisms (SNPs) in public databases have led to an increasing number of genome-wide association studies (GWAS). In these studies, a large number of SNPs (hundreds of thousands) are evaluated with univariate statistical methods in order to find SNPs associated with a particular phenotype. Univariate tests are unable to capture high-order relationships among SNPs, which are common in complex genetic diseases, and are affected by the high correlation between SNPs in the same genomic region. Machine learning methods, such as Random Forest (RF), have been applied to GWAS data to predict disease risk and to capture a set of SNPs associated with the disease. Although RF is a method with recognized performance on high-dimensional data and the capacity to capture non-linear relationships, using all SNPs present in GWAS data is computationally intractable. In this study we propose the use of biological networks for the initial selection of candidate SNPs to be used by RF. From an initial set of genes already related to a disease in the literature, we use tools for constructing gene-gene interaction networks to find novel genes that might be associated with the disease. It is therefore possible to extract a small number of SNPs, making the RF method feasible. The experiments conducted in this study focus on investigating which polymorphisms may influence susceptibility to Alzheimer's disease (AD) and mild cognitive impairment (MCI). This work presents a methodology for using RF in the analysis of GWAS data and a characterization of potential risk factors for AD.
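The candidate-selection idea in this abstract, starting from genes already linked to the disease, expanding through a gene-gene interaction network, and keeping only the SNPs that fall in the resulting genes, can be sketched as follows. The abstract does not say which network tools or expansion rule were used, so the one-hop neighbor expansion is an assumption, and all gene names, SNP identifiers, and mappings are hypothetical placeholders.

```python
def expand_genes(seed_genes, interactions):
    """Return the seed genes plus their direct neighbors in the network."""
    selected = set(seed_genes)
    for gene in seed_genes:
        selected.update(interactions.get(gene, ()))
    return selected


def candidate_snps(genes, snp_to_gene):
    """Keep only SNPs annotated to one of the selected genes."""
    return sorted(snp for snp, gene in snp_to_gene.items() if gene in genes)


# Hypothetical interaction network and SNP annotation.
interactions = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A"],
    "GENE_D": ["GENE_E"],
}
snp_to_gene = {
    "snp1": "GENE_A",
    "snp2": "GENE_B",
    "snp3": "GENE_D",
    "snp4": "GENE_C",
    "snp5": "GENE_E",
}

genes = expand_genes(["GENE_A"], interactions)
snps = candidate_snps(genes, snp_to_gene)
```

Only the reduced SNP list would then be passed to the random forest, which is what keeps the model tractable.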
Strobl, Carolin, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. "Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2006. http://epub.wu.ac.at/1274/1/document.pdf.
Series: Research Report Series / Department of Statistics and Mathematics
Geremia, Ezequiel. "Spatial random forests for brain lesions segmentation in MRIs and model-based tumor cell extrapolation." Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00838795.
Bylund, Rebecca, and Höök Malin J-son. "Går det prediktera demens? : En jämförande studie mellan Logistisk regression, Elastic Net och Random Forests." Thesis, Umeå universitet, Statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-149728.
Al, Maathidi M. M. "Optimal feature selection and machine learning for high-level audio classification : a random forests approach." Thesis, University of Salford, 2017. http://usir.salford.ac.uk/44338/.
Aichele, Figueroa Diego Andrés. "Detección de anomalías en componentes mecánicos en base a Deep Learning y Random Cut Forests." Tesis, Universidad de Chile, 2019. http://repositorio.uchile.cl/handle/2250/170571.
Within the field of maintenance, monitoring a piece of equipment can be very useful because it gives warning of any anomaly in its internal operation, so that any defect can be corrected before a more serious failure occurs. In data mining, anomaly detection is the task of identifying anomalous elements, that is, elements that differ from what is common within a data set. Anomaly detection has applications in different domains; for example, banks use it today to detect fraudulent purchases and possible scams from a user's behavior pattern. Because this requires covering large amounts of data, its development within probabilistic machine learning is essential. A variety of algorithms have been developed to find anomalies, one of the best known being Isolation Forest, from the decision-tree family. Several works derived from the Isolation Forest algorithm propose improvements to it, such as Robust Random Cut Forest, which improves the precision of the anomaly search and also allows dynamic analysis of the data and detection of anomalies in real time. On the other hand, it has the disadvantage that the more attributes the data sets contain, the more computation time it needs to detect an anomaly. An attribute reduction method, also known as dimensionality reduction, is therefore used, and its effect on effectiveness and efficiency is studied relative to the algorithm run without reducing the dimension of the data. This thesis analyzes the Robust Random Cut Forest algorithm in order to propose a possible improvement to it. To test the algorithm, an experiment is carried out on steel bars, yielding their vibrations when excited by white noise.
These data are processed in three different scenarios: without dimensionality reduction, with principal component analysis (PCA), and with an autoencoder. The first scenario (without dimensionality reduction) serves as a reference point for seeing how scenarios two and three vary in anomaly detection, in both effectiveness and efficiency. The study is then carried out within these three scenarios to detect anomalous points. The results show that reducing the dimensions improves both computation time (efficiency) and precision (effectiveness) in finding an anomaly; the best results are obtained with principal component analysis.
Goodwin, Christopher C. H. "The Influence of Cost-sharing Programs on Southern Non-industrial Private Forests." Thesis, Virginia Tech, 2001. http://hdl.handle.net/10919/30895.
Master of Science