Dissertations / Theses on the topic 'Ensemble learning methods'
Consult the top 50 dissertations / theses for your research on the topic 'Ensemble learning methods.'
Abbasian, Houman. "Inner Ensembles: Using Ensemble Methods in Learning Step." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31127.
Velka, Elina. "Loss Given Default Estimation with Machine Learning Ensemble Methods." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279846.
This thesis investigates and compares three machine learning methods for estimating Loss Given Default (LGD). LGD can be seen as the opposite of the recovery rate, i.e. the share of the outstanding loan that the lender would not recover if the customer were to default. The machine learning methods investigated in this work are decision trees, random forest and boosted methods. All methods performed well when estimating loans that are either not repaid at all, i.e. LGD = 1 (100%), or repaid in full, LGD = 0 (0%). A clear decrease in the models' accuracy was observed when the models were run on a dataset from which observations with LGD = 1 had been removed. Random forest models built on an unbalanced training dataset performed better than the other models on test sets that included observations where LGD = 1. When observations with LGD = 1 were removed, random forest models built on a balanced training dataset performed better than the other models. Boosted models showed the weakest accuracy of the three methods investigated in this study. Overall, the study showed that random forest models built on an unbalanced training dataset performed slightly better than decision tree models, but the computation time (cost) was considerably longer when random forest models were run. Decision tree models would therefore be preferred for estimating loss given default.
Conesa, Gago Agustin. "Methods to combine predictions from ensemble learning in multivariate forecasting." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-103600.
Kanneganti, Alekhya. "Using Ensemble Machine Learning Methods in Estimating Software Development Effort." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20691.
Bustos, Ricardo Gacitua. "OntoLancs : An evaluation framework for ontology learning by ensemble methods." Thesis, Lancaster University, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.533089.
Elahi, Haroon. "A Boosted-Window Ensemble." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5658.
King, Michael Allen. "Ensemble Learning Techniques for Structured and Unstructured Data." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/51667.
Ph. D.
Nguyen, Thanh Tien. "Ensemble Learning Techniques and Applications in Pattern Classification." Thesis, Griffith University, 2017. http://hdl.handle.net/10072/366342.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
Shi, Zhe. "Semi-supervised Ensemble Learning Methods for Enhanced Prognostics and Health Management." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1522420632837268.
Slawek, Janusz. "Inferring Gene Regulatory Networks from Expression Data using Ensemble Methods." VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3396.
De, Giorgi Marcello. "Tree ensemble methods for Predictive Maintenance: a case study." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22282/.
Lund, William B. "Ensemble Methods for Historical Machine-Printed Document Recognition." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4024.
Darwiche, Aiman A. "Machine Learning Methods for Septic Shock Prediction." Diss., NSUWorks, 2018. https://nsuworks.nova.edu/gscis_etd/1051.
Frery, Jordan. "Ensemble Learning for Extremely Imbalanced Data Flows." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSES034.
Machine learning is the study of designing algorithms that learn from training data to achieve a specific task. The resulting model is then used to predict over new (unseen) data points without any outside help. This data can take many forms, such as images (matrices of pixels), signals (sounds, ...), transactions (age, amount, merchant, ...), and logs (time, alerts, ...). Datasets may be defined to address a specific task such as object recognition, voice identification, anomaly detection, etc. In these tasks, knowledge of the expected outputs encourages a supervised learning approach, where every observed data point is assigned a label that defines what the model predictions should be. For example, in object recognition, an image could be associated with the label "car", which suggests that the learning algorithm has to learn that a car is contained somewhere in this picture. This is in contrast with unsupervised learning, where the task at hand does not have explicit labels. For example, one popular topic in unsupervised learning is to discover underlying structures contained in visual data (images), such as geometric forms of objects, lines, and depth, before learning a specific task. This kind of learning is much harder, as there may be an infinite number of concepts to grasp in the data. In this thesis, we focus on a specific scenario of the supervised learning setting: 1) the label of interest is underrepresented (e.g. anomalies) and 2) the dataset grows with time as we receive data from real-life events (e.g. credit card transactions). In fact, these settings are very common in the industrial domain in which this thesis takes place.
Vandoni, Jennifer. "Ensemble Methods for Pedestrian Detection in Dense Crowds." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS116/document.
This study deals with pedestrian detection in high-density crowds from a mono-camera system. The detections can then be used both to obtain robust density estimation and to initialize a tracking algorithm. One of the most difficult challenges is that usual pedestrian detection methodologies do not scale well to high-density crowds, for reasons such as absence of background, high visual homogeneity, small size of the objects, and heavy occlusions. We cast the detection problem as a Multiple Classifier System (MCS) composed of two different ensembles of classifiers, the first based on SVM (SVM-ensemble) and the second based on CNN (CNN-ensemble), combined using Belief Function Theory (BFT) to exploit their strengths for pixel-wise classification. The SVM-ensemble is composed of several SVM detectors based on different gradient, texture and orientation descriptors, able to tackle the problem from different perspectives. BFT allows us to take into account the imprecision in addition to the uncertainty value provided by each classifier, which we consider to come from possible errors in the calibration procedure and from the heterogeneity of neighboring pixels in the image space. However, the scarcity of labeled data for specific dense crowd contexts makes it impossible to obtain robust training and validation sets. By exploiting belief functions directly derived from the classifiers' combination, we propose an evidential Query-by-Committee (QBC) active learning algorithm to automatically select the most informative training samples. On the other hand, we explore deep learning techniques by casting the problem as a segmentation task with soft labels, with a fully convolutional network designed to recover small objects thanks to a tailored use of dilated convolutions.
In order to obtain a pixel-wise measure of reliability for the network's predictions, we create a CNN-ensemble by means of dropout at inference time, and we combine the different realizations obtained in the context of BFT. Finally, we show that the output map given by the MCS can be employed to perform people counting. We propose an evaluation method that can be applied at every scale, also providing uncertainty bounds on the estimated density.
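The abstract above combines classifier outputs within Belief Function Theory. As a rough illustration only, the following sketch applies Dempster's rule of combination, one standard BFT fusion rule, to two mass functions for a single pixel. Whether the thesis uses exactly this rule is an assumption, and the mass values and the names `svm_mass` and `cnn_mass` are made up for the example.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Renormalise so that the remaining masses sum to 1 again.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

PED = frozenset({"pedestrian"})
BG = frozenset({"background"})
EITHER = PED | BG  # mass placed here expresses ignorance, not a decision

# Hypothetical per-pixel masses from two sources (illustrative values only).
svm_mass = {PED: 0.6, BG: 0.1, EITHER: 0.3}
cnn_mass = {PED: 0.5, BG: 0.2, EITHER: 0.3}
fused = dempster_combine(svm_mass, cnn_mass)
```

Keeping some mass on the full set `EITHER` is what lets a source express imprecision separately from its confidence in either class, which is the property the abstract highlights.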
Michelen, Strofer Carlos Alejandro. "Machine Learning and Field Inversion approaches to Data-Driven Turbulence Modeling." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103155.
Doctor of Philosophy
The Reynolds-averaged Navier-Stokes (RANS) equations are widely used to simulate fluid flows in engineering applications despite their known inaccuracy in many flows of practical interest. The uncertainty in the RANS equations is known to stem from the Reynolds stress tensor for which no universally applicable turbulence model exists. The computational cost of more accurate methods for fluid flow simulation, however, means RANS simulations will likely continue to be a major tool in engineering applications and there is still a need for improved RANS turbulence modeling. This dissertation explores two different approaches to use available experimental data to improve RANS predictions by improving the uncertain Reynolds stress tensor field. The first approach is using machine learning to learn a data-driven turbulence model from a set of training data. This model can then be applied to predict new flows in place of traditional turbulence models. To this end, this dissertation presents a novel framework for training deep neural networks using experimental measurements of velocity and pressure. When using velocity and pressure data, gradient-based training of the neural network requires the sensitivity of the RANS equations to the learned Reynolds stress. Two different methods, the continuous adjoint and ensemble approximation, are used to obtain the required sensitivity. The second approach explored in this dissertation is field inversion, whereby available data for a flow of interest is used to infer a Reynolds stress field that leads to improved RANS solutions for that same flow. Here, the field inversion is done via the ensemble Kalman inversion (EKI), a Monte Carlo Bayesian procedure, and the focus is on improving the inference by enforcing known physical constraints on the inferred Reynolds stress field. To this end, a method for enforcing boundary conditions on the inferred field is presented. 
While further development is needed, the two data-driven approaches explored and improved upon here demonstrate the potential for improved practical RANS predictions.
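To make the ensemble Kalman inversion (EKI) mentioned in this abstract more concrete, here is a minimal sketch of a generic perturbed-observation EKI update for a single scalar parameter. It is a textbook-style form, not the dissertation's implementation: the forward model `toy_forward`, the observation `y=4.0`, the noise variance and the ensemble size are all hypothetical stand-ins for a RANS solver and real measurements.

```python
import random
import statistics

def eki_step(ensemble, forward, y, noise_var, rng):
    """One ensemble Kalman inversion update for a scalar parameter."""
    outputs = [forward(u) for u in ensemble]
    u_mean = statistics.fmean(ensemble)
    g_mean = statistics.fmean(outputs)
    n = len(ensemble)
    # Sample cross-covariance between parameters and model outputs,
    # and sample variance of the model outputs.
    cov_ug = sum((u - u_mean) * (g - g_mean) for u, g in zip(ensemble, outputs)) / (n - 1)
    var_g = sum((g - g_mean) ** 2 for g in outputs) / (n - 1)
    gain = cov_ug / (var_g + noise_var)
    noise_sd = noise_var ** 0.5
    # Each member moves toward its own perturbed copy of the observation.
    return [u + gain * (y + rng.gauss(0.0, noise_sd) - g) for u, g in zip(ensemble, outputs)]

def toy_forward(u):
    # Hypothetical stand-in for an expensive solver: the output is 2 * u.
    return 2.0 * u

rng = random.Random(0)
ensemble = [rng.gauss(0.0, 1.0) for _ in range(50)]  # prior samples
for _ in range(5):
    ensemble = eki_step(ensemble, toy_forward, y=4.0, noise_var=0.01, rng=rng)
estimate = statistics.fmean(ensemble)
```

The update needs only forward evaluations of the model, never its derivatives, which is why ensemble methods of this kind are attractive when the forward model is a large simulation code.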
Sirin, Volkan. "Machine Learning Methods For Opponent Modeling In Games Of Imperfect Information." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614630/index.pdf.
Dutra, Calainho Felipe. "Evaluation of Calibration Methods to Adjust for Infrequent Values in Data for Machine Learning." Thesis, Högskolan Dalarna, Mikrodataanalys, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:du-28134.
Notaro, Marco. "Hierarchical Ensemble Methods for Ontology-Based Predictions in Computational Biology." Doctoral thesis, Università degli Studi di Milano, 2019. http://hdl.handle.net/2434/606185.
The standardized annotation of biomedical objects, often organized in dedicated catalogues, has strongly promoted the organization of biological concepts into controlled vocabularies, i.e. ontologies in which related terms of the underlying biological domain are structured according to a predefined hierarchy. Indeed, large ontologies have been developed by the scientific community to structure and organize the gene and protein taxonomy of all living organisms from Archaea to Metazoa, i.e. the Gene Ontology, as well as human-specific ontologies such as the Human Phenotype Ontology, which provides a structured taxonomy of the abnormal human phenotypes associated with diseases. These ontologies, offering a coded and well-defined classification space for biological entities such as genes and proteins, favor the development of machine learning methods able to predict features of biological objects, like the association between a human gene and a disease, with the aim of driving wet lab research, reducing costs and allowing a more effective use of the available research funds. Despite the soundness of these objectives, the resulting multi-label classification problems raise machine learning issues so complex that until recently by far the most common approach was “flat” prediction, i.e. simply training a classifier for each term in the controlled vocabulary and ignoring the relationships between terms. This approach was justified not only by the need to reduce the computational complexity of the learning task, but also by the somewhat “unstable” nature of the terms composing the controlled vocabularies, because they were (and are) updated on a monthly basis in a process performed by expert curators and based on biomedical literature and on wet and in-silico experiments. In this context, two main general classes of classifiers have been proposed in the literature.
On the one hand, “hierarchy-unaware” learning methods predict labels in a “flat” way without exploiting the inherent structure of the annotation space. On the other hand, “hierarchy-aware” learning methods can improve the accuracy and the precision of the predictions by considering the hierarchical relationships between ontology terms. Moreover, these methods can guarantee the consistency of the predicted labels according to the “true path rule”, that is, the biological and logical rule that governs the internal coherence of biological ontologies. To properly handle the hierarchical relationships linking the ontology terms, two main classes of structured output methods have been proposed in the literature: the first is based on kernelized methods for structured output spaces, the second on hierarchical ensemble methods for ontology-based predictions. However, both approaches suffer from significant drawbacks. The kernel-based methods for structured output spaces are computationally intensive and do not scale well when applied to complex multi-label bio-ontologies. Most hierarchical ensemble methods have been conceived for tree-structured taxonomies, and the few specifically developed for prediction in DAG-structured output spaces are, in most cases, unable to improve prediction performance over flat methods. To overcome these limitations, in this thesis novel “ontology-aware” ensemble methods have been developed, able to handle DAG-structured ontologies, leveraging previous results obtained with “true-path-rule”-based hierarchical learning algorithms. These methods are highly modular in the sense that they adopt a “two-step” learning strategy: in the first step they learn each term of the ontology separately using flat methods, and in the second they properly combine the flat predictions according to the hierarchy of the classes. The main contributions of this thesis are both methodological and experimental.
From a methodological standpoint, novel hierarchical ensemble methods are proposed, including: a) HTD (Hierarchical Top-Down algorithm for DAG structured ontologies); b) TPR-DAG (True Path Rule ensemble for DAG) with several variants; c) ISO-TPR, a novel ensemble method that combines the True Path Rule approach with Isotonic Regression. For all these methods a formal proof of their consistency, i.e. the guarantee of providing predictions that “respect” the hierarchical relationships between classes, is provided. From an experimental standpoint, extensive genome and ontology-wide results show that the proposed methods: a) are competitive with state-of-the-art prediction algorithms; b) are able to improve flat machine learning classifiers, if the base learners can provide non random predictions; c) are able to predict new associations between genes and human abnormal phenotypes, a crucial step to discover novel genes associated with human diseases ranging from genetic disorders to cancer; d) scale nicely with large datasets and bio-ontologies. Finally HEMDAG, a novel R library implementing the proposed hierarchical ensemble methods has been developed and publicly delivered.
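The second step of the "two-step" strategy described in this abstract, combining flat predictions so that they respect the hierarchy, can be illustrated with a simple top-down correction on a DAG: a term's score is capped by the corrected scores of all its parents. This is a sketch in the spirit of a hierarchical top-down scheme, not the HEMDAG implementation; the function name `htd_dag`, the toy ontology and the scores are assumptions made for the example.

```python
def htd_dag(flat_scores, parents):
    """Top-down correction of flat scores on a DAG.

    flat_scores: dict mapping each term to its flat classifier score in [0, 1].
    parents: dict mapping each term to a list of its parent terms
             (root terms map to an empty list).
    """
    corrected = {}

    def visit(term):
        if term in corrected:
            return corrected[term]
        score = flat_scores[term]
        for parent in parents[term]:
            # A term may never score higher than any of its ancestors.
            score = min(score, visit(parent))
        corrected[term] = score
        return score

    for term in flat_scores:
        visit(term)
    return corrected

# Hypothetical four-term ontology in which "c" has two parents.
toy_parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
toy_flat = {"root": 0.9, "a": 0.4, "b": 0.8, "c": 0.7}
toy_corrected = htd_dag(toy_flat, toy_parents)
```

In the toy example the flat score of "c" exceeds that of its parent "a", which would violate the true path rule; after correction every term scores no higher than each of its parents.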
Banfield, Robert E. "Learning on complex simulations." [Tampa, Fla.] : University of South Florida, 2007. http://purl.fcla.edu/usf/dc/et/SFE0002112.
Kankanala, Padmavathy. "Machine learning methods for the estimation of weather and animal-related power outages on overhead distribution feeders." Diss., Kansas State University, 2013. http://hdl.handle.net/2097/16914.
Full textDepartment of Electrical and Computer Engineering
Sanjoy Das and Anil Pahwa
Because a majority of day-to-day activities rely on electricity, it plays an important role in daily life. In this digital world, most people's lives depend on electricity. Without electricity, the flip of a switch would no longer produce instant light, televisions and refrigerators would be nonexistent, and hundreds of conveniences often taken for granted would be impossible. Electricity has become a basic necessity, and so any interruption in service due to disturbances in power lines causes great inconvenience to customers. Customers and utility commissions expect a high level of reliability. Power distribution systems are geographically dispersed, and their exposure to the environment makes them a highly vulnerable part of power systems with respect to failures and interruption of service to customers. Following the restructuring and increased competition in the electric utility industry, distribution system reliability has acquired greater significance. A better understanding of the causes and consequences of distribution interruptions is helpful in maintaining distribution systems, designing reliable systems, installing protection devices, and addressing environmental issues. Various events, such as equipment failure, animal activity, tree fall, wind, and lightning, can negatively affect power distribution systems. Weather is one of the primary causes affecting distribution system reliability. Unfortunately, as weather-related outages are highly random, predicting their occurrence is an arduous task. To study the impact of weather on overhead distribution systems, several models, such as linear and exponential regression models, a neural network model, and ensemble methods, are presented in this dissertation. The models were extended to study the impact of animal activity on outages in overhead distribution systems.
Outage, lightning, and weather data for four cities of various sizes in Kansas from 2005 to 2011 were provided by Westar Energy, Topeka, and the state climate office at Kansas State University weather services. The models developed are applied to estimate daily outages. Performance tests show that the regression and neural network models estimate outages well overall but fail in the lower and upper ranges of observed values. The introduction of committee machines inspired by the “divide & conquer” principle overcomes this problem. Simulation results show that the mixture of experts model is the most effective, followed by the AdaBoost model, in estimating daily outages. Similar results on the performance of these models were found for animal-caused outages.
Memari, Majid. "Predicting the Stock Market Using News Sentiment Analysis." OpenSIUC, 2018. https://opensiuc.lib.siu.edu/theses/2442.
Jaber, Ghazal. "An approach for online learning in the presence of concept changes." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00907486.
Li, Yichao. "Algorithmic Methods for Multi-Omics Biomarker Discovery." Ohio University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1541609328071533.
Lundberg, Jacob. "Resource Efficient Representation of Machine Learning Models : investigating optimization options for decision trees in embedded systems." Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-162013.
Börthas, Lovisa, and Sjölander Jessica Krange. "Machine Learning Based Prediction and Classification for Uplift Modeling." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-266379.
The need to model the true gain from targeted marketing has led to the now commonly used method of uplift modeling (incremental response analysis). Performing this type of method requires an existing treatment group and control group, and the goal is to estimate the difference between the positive outcomes in the two groups. The probability of a positive outcome in each group can be estimated efficiently with statistical machine learning methods. The uplift modeling approaches investigated in this project are subtraction of two models, modeling the uplift directly, and a class variable transformation. The statistical machine learning methods applied are random forests and neural networks, along with the standard method of logistic regression. The data were collected from a well-established retail company, and the goal is therefore to investigate which uplift modeling approach and machine learning method perform best given the data in this project. The most decisive factors for obtaining a good result turned out to be the variable selection and the amount of control data in each dataset. For a successful result, the machine learning method of choice should be random forests used to model the uplift directly, or logistic regression together with a class variable transformation. Neural network methods are sensitive to uneven class distributions and are therefore unable to obtain stable models with the given data. Furthermore, subtraction of two models performed poorly because each model tended to focus too much on modeling the class in the two datasets separately, instead of modeling the difference between them. The conclusion is thus that a method that models the uplift directly, together with a relatively large control group, is preferable for obtaining a stable result.
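The class variable transformation named in this abstract rests on a simple identity that can be checked on raw counts. The sketch below is not the thesis's modeling pipeline and fits no classifier: it compares uplift computed as a difference of response rates with uplift computed from the transformed label z, assuming equally sized treatment and control groups. The outcome lists are invented for illustration.

```python
def uplift_two_rates(treated, control):
    """Uplift as the difference in response rates between the two groups.

    treated, control: lists of 0/1 outcomes.
    """
    return sum(treated) / len(treated) - sum(control) / len(control)

def uplift_class_transformation(treated, control):
    """Uplift via the transformed label z.

    z = 1 for treated responders and control non-responders, else 0.
    With equally sized groups, uplift = 2 * P(z = 1) - 1.
    """
    if len(treated) != len(control):
        raise ValueError("this identity assumes equally sized groups")
    z = list(treated) + [1 - y for y in control]
    return 2 * sum(z) / len(z) - 1

# Hypothetical outcomes: 30% response when treated, 20% in the control group.
treated_outcomes = [1] * 30 + [0] * 70
control_outcomes = [1] * 20 + [0] * 80
rate_difference = uplift_two_rates(treated_outcomes, control_outcomes)
transformed = uplift_class_transformation(treated_outcomes, control_outcomes)
```

Because the two quantities agree, a single classifier trained to predict z can stand in for two separate response models, which is the appeal of the transformation when the groups are balanced.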
Fiterau, Madalina. "Discovering Compact and Informative Structures through Data Partitioning." Research Showcase @ CMU, 2015. http://repository.cmu.edu/dissertations/792.
Al-Mter, Yusur. "Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166139.
Kueterman, Nathan. "Comparative Study of Classification Methods for the Mitigation of Class Imbalance Issues in Medical Imaging Applications." University of Dayton / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1591611376235015.
Pereira, Vinicius Gomes. "Using supervised machine learning and sentiment analysis techniques to predict homophobia in Portuguese tweets." Repositório Institucional do FGV, 2018. http://hdl.handle.net/10438/24301.
This work studies the identification of homophobic tweets using a natural language processing and machine learning approach. The goal is to construct a predictive model that can detect, with reasonable accuracy, whether or not a tweet contains content offensive to LGBT individuals. The database used to train the predictive models was constructed by aggregating tweets from users who have interacted with politicians and/or political parties in Brazil. Tweets containing LGBT-related terms or references to openly LGBT individuals were collected and manually classified. A large part of this work lies in constructing features that accurately capture not only the text of the tweet but also specific characteristics of the users and their language choices. In particular, the use of swear words and strong vocabulary is a strong predictor of offensive tweets. Naturally, n-grams and term weighting schemes were also considered as features of the model. A total of 12 sets of features were constructed. A broad range of machine learning techniques was employed in the classification task: naive Bayes, regularized logistic regressions, feedforward neural networks, extreme gradient boosting (XGBoost), random forest and support vector machines. After estimating and tuning each model, they were combined using voting and stacking. Voting using 10 models obtained the best result, with 89.42% accuracy.
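The voting step described at the end of this abstract can be sketched as plain majority (hard) voting over the labels predicted by each model. The abstract does not say whether hard or soft voting was used, so the hard variant here is an assumption, and the three prediction lists below are invented placeholders rather than outputs of the models named above.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions: list of lists, one inner list per model, all the same length.
    With an even number of models ties are possible; this sketch does not
    treat them specially.
    """
    combined = []
    for sample_votes in zip(*predictions):
        # Pick the label predicted by the largest number of models.
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three models on four tweets.
model_a = ["offensive", "ok", "ok", "offensive"]
model_b = ["offensive", "offensive", "ok", "ok"]
model_c = ["ok", "offensive", "ok", "offensive"]
ensemble_labels = majority_vote([model_a, model_b, model_c])
```

Stacking, the other combination scheme the abstract mentions, would instead train a second-level model on the base models' predictions rather than counting votes.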
Zhao, Xiaochuang. "Ensemble Learning Method on Machine Maintenance Data." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/6056.
Ezzeddine, Diala. "A contribution to topological learning and its application in Social Networks." Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO22011/document.
Supervised Learning is a popular field of Machine Learning that has made recent progress. In particular, many methods and procedures have been developed to solve the classification problem. Most classical methods in Supervised Learning use density estimation of the data to construct their classifiers. In this dissertation, we show that the topology of data can be a good alternative for constructing classifiers. We propose using topological graphs such as Gabriel graphs (GG) and Relative Neighborhood Graphs (RNG), which can build the topology of data based on its neighborhood structure. To apply this concept, we create a new method called Random Neighborhood Classification (RNC). In this method, we use topological graphs to construct classifiers and then apply Ensemble Methods (EM) to extract all relevant information from the data. EM, well known in Machine Learning, generates many classifiers from data and then aggregates them into one. Aggregate classifiers have been shown to be very efficient in many studies, because they leverage relevant and effective information from each generated classifier. We first compare RNC to other known classification methods using data from the UCI Irvine repository. We find that RNC works very well compared to very efficient methods such as Random Forests and Support Vector Machines. Most of the time, it ranks among the top three methods in efficiency. This result has encouraged us to study the efficiency of RNC on real data such as tweets. Twitter, a microblogging Social Network, is especially useful for mining opinion on current affairs and topics that span the range of human interest, including politics. Mining political opinion from Twitter poses peculiar challenges, such as the versatility of the authors when they express their political views, which motivate this study. We define a new attribute, called a couple, that is very helpful in studying the opinion expressed in tweets.
A couple is an author who talks about a politician. We propose a new procedure that focuses on identifying the opinion in tweets using couples. We think that focusing on a couple's opinion as expressed across several tweets can overcome the problems of analysing each tweet individually. This approach can help avoid versatility, language ambiguity and many other artifacts that are easy for a human being to understand but not for a machine. We use classical Machine Learning techniques such as KNN and Random Forests (RF), as well as our method RNC. We proceed in two steps. First, we build a reference set of classified couples using Naive Bayes. We also apply an alternative to the Naive Bayes method, a sampling plan procedure, to compare and evaluate its results. Second, we evaluate the performance of this approach using proximity measures in order to use RNC, RF and KNN. The experiments are based on real tweet data from the 2012 French presidential election. The results show that this approach works well and that RNC performs very well in classifying opinion in tweets. Topological Learning seems to be a very interesting field of study, in particular for addressing the classification problem. Many concepts for extracting information from topological graphs, such as those described by Aupetit, M. in his work (2005), still need to be analysed. Our work shows that Topological Learning can be an effective way to tackle the classification problem.
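The Gabriel graph that this abstract builds on has a short geometric definition: two points are neighbours when no third point falls strictly inside the ball whose diameter is the segment joining them. The sketch below constructs only that graph by brute force; it does not reproduce the RNC classifier, and the function names and sample points are hypothetical.

```python
from itertools import combinations

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def gabriel_edges(points):
    """Return index pairs (i, j) that are neighbours in the Gabriel graph.

    A third point k blocks the pair (i, j) when it lies strictly inside the
    ball with diameter (i, j), i.e. d(i,k)^2 + d(j,k)^2 < d(i,j)^2.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = squared_distance(points[i], points[j])
        blocked = any(
            squared_distance(points[i], points[k]) + squared_distance(points[j], points[k]) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges

# Three hypothetical points: the third sits almost between the first two.
sample_points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
sample_edges = gabriel_edges(sample_points)
```

In the sample, the two outer points are not neighbours because the middle point lies inside their ball, while each outer point is a neighbour of the middle one. A classifier built on such a graph can then let a point's graph neighbours, rather than its k nearest points, vote on its label.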
Liu, Xuan. "An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/30702.
Farrash, Majed. "Machine learning ensemble method for discovering knowledge from big data." Thesis, University of East Anglia, 2016. https://ueaeprints.uea.ac.uk/59367/.
Koco, Sokol. "Méthodes ensembliste pour des problèmes de classification multi-vues et multi-classes avec déséquilibres." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4101/document.
Nowadays, in many fields, such as bioinformatics or multimedia, data may be described using different sets of features, also called views. For a given classification task, we distinguish two types of views: strong views, which are suited for the task, and weak views, which are suited for a (small) part of the task. In multi-class learning, a view can be strong with respect to some (few) classes and weak for the rest of the classes: these are imbalanced views. The works presented in this thesis fall in the supervised learning setting, and their aim is to address the problem of multi-view learning under strong, weak and imbalanced views, regrouped under the notion of uneven views. The first contribution of this thesis is a multi-view learning algorithm based on the same framework as AdaBoost.MM. The second part of this thesis proposes a unifying framework for supervised methods dealing with imbalanced classes (where some classes are more represented than others). In the third part of this thesis, we tackle the uneven views problem through the combination of the imbalanced classes framework and the between-views cooperation used to take advantage of the multiple views. In order to test the proposed methods on real-world data, we consider the task of phone call classification, which constitutes the subject of the ANR DECODA project. Each part of this thesis deals with different aspects of the problem.
Ferreira, Ednaldo José. "Método baseado em rotação e projeção otimizadas para a construção de ensembles de modelos." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-27062012-161603/.
The development of new techniques capable of inducing predictive models with low generalization errors has been a constant in machine learning and related areas. In this context, the composition of an ensemble of models should be highlighted for its theoretical and empirical potential to minimize the generalization error. Several methods for building ensembles are found in the literature. Among them, the rotation-based (RB) method has become known for outperforming other traditional methods. The RB method applies principal component analysis (PCA) for feature extraction as a rotation strategy to provide diversity and accuracy among base models. However, this strategy does not ensure that the resulting direction is appropriate for the supervised learning technique (SLT). Moreover, the RB method is not suitable for rotation-invariant SLTs and has not been evaluated with stable ones, which makes RB inappropriate for, or restricted to, only some SLTs. This thesis proposes a new approach to feature extraction based on the concatenation of rotation and projection optimized for the SLT (called optimized roto-projection). The approach uses a metaheuristic to optimize the parameters of the roto-projection transformation, minimizing the error of the technique that directs the optimization process. More importantly, the optimized roto-projection is proposed as a fundamental part of a new ensemble method, called optimized roto-projection ensemble (ORPE). The results show that the optimized roto-projection can reduce the dimensionality and the complexity of the data and the model. Moreover, optimized roto-projection can increase the performance of the SLT subsequently applied. ORPE outperformed, with statistical significance, RB and other methods using stable and unstable SLTs for classification and regression with databases from public and private domains. The ORPE method was unrestricted and highly effective, holding the first position in every dominance ranking.
Hadjem, Medina. "Contribution à l'analyse et à la détection automatique d'anomalies ECG dans le cas de l'ischémie myocardique." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB011.
Recent advances in sensing and in the miniaturization of ultra-low power devices allow for more intelligent and wearable sensor-based health monitoring systems. The sensors are capable of collecting vital signs, such as heart rate, temperature, oxygen saturation, blood pressure, ECG, EMG, etc., and of communicating the collected data wirelessly to a remote device and/or smartphone. These advances have led a large research community to take an interest in the design and development of new biomedical data analysis systems, particularly electrocardiogram (ECG) analysis systems. To contribute to this broad research area, this thesis focuses mainly on the automatic analysis and detection of coronary heart diseases, such as ischemia and myocardial infarction (MI), which are well known to be the leading causes of death worldwide. Toward this end, and because ECG signals are considered very noisy and non-stationary, our first challenge was to extract the relevant parameters without losing their main features. This particular issue has been widely addressed in the literature and is not the main purpose of this thesis. However, as it is a prerequisite, it required us to understand the state-of-the-art methods and select the most suitable one for our work. Based on the extracted ECG parameters, particularly the ST segment and T wave parameters, we have contributed two different approaches to analyzing ECG records: (1) the first analysis is performed at the time series level, in order to detect abnormal elevations of the ST segment and the T wave, known to be an accurate predictor of ischemia or MI; (2) the second analysis is performed at the ECG beat level to automatically classify ST segment and T wave anomalies into different categories. This latter approach is the most commonly used in the literature. However, because existing works lack a standard for performance comparison, we have carried out our own comparison of current classification methods by taking into account diverse ST and T anomaly classes, several performance evaluation parameters, and several ECG signal leads. To obtain more realistic performance figures, we have also performed the same study in the presence of other frequent cardiac anomalies, such as arrhythmia. Based on this substantial comparative study, we have proposed a new classification approach for seven ST-T anomaly classes that uses a hybrid of boosting and random under-sampling. Our ultimate goal was to reach the best tradeoff between true positives and false positives.
Silva, Bernardes Juliana. "Evolution et apprentissage automatique pour l'annotation fonctionnelle et la classification des homologies lointains en protéines." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00684155.
Faußer, Stefan Artur [Verfasser]. "Large state spaces and large data: Utilizing neural network ensembles in reinforcement learning and kernel methods for clustering / Stefan Artur Faußer." Ulm : Universität Ulm. Fakultät für Ingenieurwissenschaften und Informatik, 2015. http://d-nb.info/1074196201/34.
Hronský, Patrik. "Bioinformatický nástroj pro predikci rozpustnosti proteinů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255363.
Paradeda, Raul Benites. "Utilizando Pesos estáticos e dinâmicos em sistemas multi-classificadores com diferentes níveis de diversidade." Universidade Federal do Rio Grande do Norte, 2007. http://repositorio.ufrn.br:8080/jspui/handle/123456789/17963.
Although some individual supervised Machine Learning (ML) techniques, also known as classifiers or classification algorithms, provide solutions that are considered efficient most of the time, experimental results obtained with large sets of patterns, or with sets containing a significant amount of incomplete data or irrelevant features, show a drop in the accuracy of these techniques. In other words, such techniques cannot recognize patterns efficiently in complex problems. To improve the performance and efficiency of these ML techniques, the idea arose of making several types of ML algorithms work jointly, giving rise to the term Multi-Classifier System (MCS). An MCS has as its components different ML algorithms, called base classifiers, and combines the results obtained by these algorithms to reach the final result. For the MCS to perform better than its base classifiers, the results obtained by each base classifier must present a certain diversity, that is, a difference between the results obtained by each classifier that composes the system. It makes no sense to have MCSs whose base classifiers give identical answers to the same patterns. Although MCSs present better results than individual systems, there is a constant search to improve the results obtained by this type of system. Aiming at this improvement and at greater consistency in the results, as well as greater diversity among the classifiers of an MCS, methodologies characterized by the use of weights, or confidence values, have recently been investigated. These weights can describe the importance that a given classifier provided when associating each pattern with a given class. The weights are also used, together with the outputs of the classifiers, during the recognition (use) phase of the MCS. There are different ways of calculating these weights, and they can be divided into two categories: static weights and dynamic weights. The first category is characterized by values that do not change during the classification process, unlike the second, in which the values are modified during the classification process. In this work, an analysis is carried out to verify whether the use of weights, both static and dynamic, can increase the performance of MCSs in comparison with the individual systems. In addition, the diversity obtained by the MCSs is analysed in order to verify whether there is any relation between the use of weights in MCSs and different levels of diversity.
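The distinction between static and dynamic weights can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `weighted_vote`, the toy labels, and the choice of validation accuracy as static weight and per-pattern confidence as dynamic weight are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three base classifiers for a single pattern.
labels = ["A", "B", "B"]

# Static weights (assumed here to be validation accuracies): one fixed value
# per classifier, reused unchanged for every pattern.
static_weights = [0.90, 0.60, 0.55]

# Dynamic weights (assumed here to be per-pattern confidences): recomputed
# for each pattern at classification time.
confidences = [0.95, 0.40, 0.45]

static_result = weighted_vote(labels, static_weights)
dynamic_result = weighted_vote(labels, confidences)
print(static_result, dynamic_result)
```

In this toy case the two schemes disagree: the static weights favour the two classifiers that vote for "B", while the dynamic weights favour the single classifier that is highly confident about "A" for this particular pattern.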
ILARDI, DAVIDE. "Data-driven solutions to enhance planning, operation and design tools in Industry 4.0 context." Doctoral thesis, Università degli studi di Genova, 2023. https://hdl.handle.net/11567/1104513.
Santis, Rodrigo Barbosa de. "Previsão de falta de materiais no contexto de gestão inteligente de inventário: uma aplicação de aprendizado desbalanceado." Universidade Federal de Juiz de Fora (UFJF), 2018. https://repositorio.ufjf.br/jspui/handle/ufjf/6861.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Material backorder (or stockout) is a common supply chain problem that affects the service level and effectiveness of an inventory system. Identifying the materials with the highest chance of shortage before it occurs presents a significant opportunity to improve a company's overall performance. However, this sort of problem is highly complex because of the class imbalance between missing and non-missing items in inventory, which can reach ratios of 1 to 100. In this work, machine learning classifiers are investigated in order to fill this gap in the literature. Specific metrics, such as the area under the Receiver Operating Characteristic and precision-recall curves, as well as sampling techniques and ensemble learning, are applied to this task. The proposed model was tested in two real case studies, which showed that the use of the tool may contribute to improving the service level in the supply chain.
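As an illustration of one sampling technique for a 1-to-100 class imbalance, the sketch below performs random under-sampling on toy data. The abstract does not say which sampling techniques were used, so the choice of under-sampling, the function name `random_undersample`, and the synthetic data are assumptions.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly keep, for every class, as many rows as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement so no row is duplicated.
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy inventory: 10 backordered items (label 1) against 1000 others (label 0).
rows = list(range(1010))
labels = [1 if i < 10 else 0 for i in rows]

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(len(balanced_rows), balanced_labels.count(1), balanced_labels.count(0))
```

Under-sampling discards most majority-class rows, so in practice it is often repeated with different seeds and combined with an ensemble so that each base model sees a different balanced subset.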
"Optimizing Performance Measures in Classification Using Ensemble Learning Methods." Master's thesis, 2017. http://hdl.handle.net/2286/R.I.44123.
Dissertation/Thesis
Masters Thesis Computer Science 2017
Gao, Zi-yuan, and 高子元. "Learning with Multiple Labels and Ensemble Methods for Tweets Polarity Classification System." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/b3rp9c.
National Sun Yat-sen University
Department of Computer Science and Engineering (graduate program)
106
In this paper, we focus on Twitter sentiment analysis, which is a task in the SemEval-2018 workshop: given a tweet, classify it into one of seven ordinal classes. The method described in this paper builds on our previous work in the SemEval-2018 competition. We implement a system of learning with multiple labels. There are five sub-models in the system, namely a three-class model, a negative-class model, a neutral-class model, a positive-class model, and a seven-class model. Different labels are used in different sub-models to learn the polar representation of tweets. In the competition, we obtained a Pearson correlation coefficient of 0.638 on the test data (ranked 21st of 36). To improve the system's performance, we change the usage of the data, add class weights, add lexicon features, and train our own word vectors. With these methods, we raise the Pearson correlation coefficient on the test data by 0.137. We also construct a lexicon model, which combines a recurrent neural network with lexicon scores. The Pearson correlation coefficient of the lexicon model on the development set is about 0.1 higher than that of traditional sentiment analysis. In the system of learning with multiple labels, we experiment with four ensemble methods: weighted average, majority decision, voting and stacking. Finally, we retrain the DeepMoji model with transfer learning and take a weighted average of its output with the polarity classification results of the system. The resulting Pearson correlation coefficient is 0.806 on the test data, which would rank 4th in the competition.
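The evaluation metric and the weighted-average ensemble mentioned in this abstract can be sketched as follows. This is only an illustration: the integer scale from -3 to 3 for the seven ordinal classes, the model outputs, and the weights are assumptions, not values from the thesis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def weighted_average(model_scores, weights):
    """Blend per-model score lists into one list using normalized weights."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, column)) / total
        for column in zip(*model_scores)
    ]


# Hypothetical gold labels and two hypothetical model outputs for five tweets.
gold = [-3, -1, 0, 2, 3]
model_a = [-2, -1, 1, 1, 3]
model_b = [-3, 0, 0, 2, 2]

blended = weighted_average([model_a, model_b], weights=[0.6, 0.4])
print(pearson(gold, model_a), pearson(gold, model_b), pearson(gold, blended))
```

The weights would normally be tuned on a development set rather than fixed by hand as they are here.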
Balasubramanyam, Rashmi. "Supervised Classification of Missense Mutations as Pathogenic or Tolerated using Ensemble Learning Methods." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/3804.
Balasubramanyam, Rashmi. "Supervised Classification of Missense Mutations as Pathogenic or Tolerated using Ensemble Learning Methods." Thesis, 2017. http://etd.iisc.ernet.in/2005/3804.
Seca, Marta Sofia Lopes. "Explorations of the semantic learning machine neuroevolution algorithm: dynamic training data use and ensemble construction methods." Master's thesis, 2020. http://hdl.handle.net/10362/99078.
As the world’s technology evolves, the power to implement new and more efficient algorithms increases but so does the complexity of the problems at hand. Neuroevolution algorithms fit in this context in the sense that they are able to evolve Artificial Neural Networks (ANNs). The recently proposed Neuroevolution algorithm called Semantic Learning Machine (SLM) has the advantage of searching over unimodal error landscapes in any Supervised Learning task where the error is measured as a distance to the known targets. The absence of local optima in the search space results in a more efficient learning when compared to other neuroevolution algorithms. This work studies how different approaches of dynamically using the training data affect the generalization of the SLM algorithm. Results show that these methods can be useful in offering different alternatives to achieve a superior generalization. These approaches are evaluated experimentally in fifteen real-world binary classification data sets. Across these fifteen data sets, results show that the SLM is able to outperform the Multilayer Perceptron (MLP) in 13 out of the 15 considered problems with statistical significance after parameter tuning was applied to both algorithms. Furthermore, this work also considers how different ensemble construction methods such as a simple averaging approach, Bagging and Boosting affect the resulting generalization of the SLM and MLP algorithms. Results suggest that the stochastic nature of the SLM offers enough diversity to the base learner in a way that a simple averaging method can be competitive when compared to more complex techniques like Bagging and Boosting.
Reichenbach, Jonas. "Credit scoring with advanced analytics: applying machine learning methods for credit risk assessment at the Frankfurter sparkasse." Master's thesis, 2018. http://hdl.handle.net/10362/49557.
The need to control and manage credit risk obliges financial institutions to constantly reconsider their credit scoring methods. In recent years, machine learning has shown improvement over traditional methods for credit scoring. Even small improvements in prediction quality are of great interest to financial institutions. In this thesis, classification methods are applied to the credit data of the Frankfurter Sparkasse to score their credits. Since recent research has shown that ensemble methods deliver outstanding prediction quality for credit scoring, the model investigation and application focus on such methods. Additionally, the typically imbalanced class distribution of credit scoring datasets leads us to consider sampling techniques, which compensate for the imbalance in the training dataset. We evaluate and compare different types of models and techniques according to defined metrics. Besides delivering high prediction quality, the model’s outcome should be interpretable as default probabilities. Hence, calibration techniques are considered to improve the interpretation of the model’s scores. We find that ensemble methods deliver better results than the best single model. Specifically, the Random Forest delivers the best performance on the given data set. When compared to the traditional credit scoring methods of the Frankfurter Sparkasse, the Random Forest shows significant improvement when predicting a borrower’s default within a 12-month period. Logistic Regression is used as a benchmark to validate the performance of the model.
Haque, Mohammad Nazmul. "Genetic algorithm-based ensemble methods for large-scale biological data classification." Thesis, 2017. http://hdl.handle.net/1959.13/1335393.
We study the search for the best ensemble combinations from a wide variety of heterogeneous base classifiers. The number of possible ways to create an ensemble from a large number of base classifiers is exponential in the size of the base classifier pool. Exhaustive search is therefore unsuitable for finding the best combinations, because the search space grows exponentially with the ensemble size. Hence, we employed a genetic algorithm to find the best ensemble combinations from a pool of heterogeneous base classifiers. The classification decisions of the base classifiers are combined using the popular majority vote approach. We used random sub-sampling to balance the class distributions in the class-imbalanced datasets. The empirical results on benchmark and real-world datasets outperformed the base classifiers and other state-of-the-art ensemble methods. Afterwards, we evaluated the ensemble combination search with a weighted voting approach, using the differential evolution (DE) algorithm, to find out whether employing weights could increase the generalisation performance of the ensembles. The ensembles with weights optimised by DE also outperformed both the base classifiers and other ensembles on benchmark and real-world biological datasets. Finally, we extend the majority voting-based ensemble combination search to a multi-objective setting. The search space covers all possible ensemble combinations created from 29 heterogeneous base classifiers, together with the selection of a feature subset from six feature selection methods as a wrapper approach. Two objectives, the maximisation of training MCC scores and the maximisation of the diversity among base classifiers, are optimised with NSGA-II, a popular multi-objective genetic algorithm, to find the best feature set and the best ensemble combination simultaneously. We analyse the Pareto front of solutions obtained by NSGA-II for their generalisation performance. Datasets taken from the UCI machine learning repository and the NIPS 2003 feature selection challenge have been used to investigate the performance of the proposed method. The experimental outcomes suggest that the proposed multi-objective NSGA-II approach found a better feature set and the best ensemble combination, producing better generalisation performance than other ensemble-of-classifiers methods.
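The two objectives named in this abstract, the MCC of a majority-vote ensemble and the diversity among its base classifiers, can be sketched for a single candidate ensemble as follows. This is a hedged illustration only: the abstract does not specify the diversity measure, so mean pairwise disagreement is an assumption, as are the function names and the toy votes. The genetic search itself is not shown.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def majority_vote(votes_per_classifier):
    """Combine binary votes: label 1 wins when more than half the classifiers predict it."""
    return [
        1 if 2 * sum(votes) > len(votes) else 0
        for votes in zip(*votes_per_classifier)
    ]


def mean_disagreement(votes_per_classifier):
    """Average fraction of samples on which each pair of classifiers differs."""
    k = len(votes_per_classifier)
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            a, b = votes_per_classifier[i], votes_per_classifier[j]
            total += sum(x != y for x, y in zip(a, b)) / len(a)
            pairs += 1
    return total / pairs


# Hypothetical labels and votes from three base classifiers on six samples.
y_true = [1, 1, 0, 0, 1, 0]
c1 = [1, 1, 0, 0, 0, 0]
c2 = [1, 0, 0, 1, 1, 0]
c3 = [0, 1, 0, 0, 1, 1]

ensemble = majority_vote([c1, c2, c3])
print(mcc(y_true, ensemble), mean_disagreement([c1, c2, c3]))
```

In this toy case each base classifier makes errors on different samples, so the majority vote recovers every true label even though no single classifier does, which is the reason diversity is worth optimising alongside accuracy.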