To see the other types of publications on this topic, follow the link: Ensemble learning.

Dissertations / Theses on the topic 'Ensemble learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Ensemble learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Abbasian, Houman. "Inner Ensembles: Using Ensemble Methods in Learning Step." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31127.

Full text
Abstract:
A pivotal moment in machine learning research was the creation of an important new research area, known as Ensemble Learning. In this work, we argue that ensembles are a very general concept, and though they have been widely used, they can be applied in more situations than they have been to date. Rather than using them only to combine the output of an algorithm, we can apply them to decisions made inside the algorithm itself, during the learning step. We call this approach Inner Ensembles. The motivation to develop Inner Ensembles was the opportunity to produce models with similar advantages to regular ensembles, such as accuracy and stability, plus additional advantages such as comprehensibility, simplicity, rapid classification and a small memory footprint. The main contribution of this work is to demonstrate how broadly this idea can be applied, and to highlight its potential impact on all types of algorithms. To support our claim, we first provide a general guideline for applying Inner Ensembles to different algorithms. Then, using this framework, we apply them to two categories of learning methods: supervised and unsupervised. For the former we chose Bayesian networks, and for the latter K-Means clustering. Our results show that 1) the overall performance of Inner Ensembles is significantly better than that of the original methods, and 2) Inner Ensembles provide similar performance improvements as regular ensembles.
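The thesis gives no code here, but the core idea of ensembling a decision inside the learning step can be sketched on K-Means: instead of computing a cluster centroid as a single mean, average the centroid over several bootstrap resamples of the cluster's points. Everything below (the function name, the bootstrap choice, the 1-D data) is an illustrative assumption, not the author's implementation.

```python
import random

def inner_ensemble_centroid(points, n_resamples=25, seed=0):
    """Estimate a cluster centroid by averaging bootstrap estimates.

    Standard K-Means makes one inner decision (the mean); an 'inner
    ensemble' averages that decision over several bootstrap resamples
    of the cluster's points (illustrative sketch, 1-D for brevity).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(points) for _ in points]  # bootstrap resample
        estimates.append(sum(sample) / len(sample))    # one centroid estimate
    return sum(estimates) / len(estimates)             # ensemble of inner decisions

cluster = [1.0, 1.2, 0.8, 1.1, 0.9]
print(round(inner_ensemble_centroid(cluster), 2))
```

The ensemble estimate stays close to the plain mean but is a combination of many inner decisions rather than a single one, which is the stabilising effect the abstract describes.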
APA, Harvard, Vancouver, ISO, and other styles
2

Henley, Jennie. "The learning ensemble : musical learning through participation." Thesis, Birmingham City University, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.527426.

Full text
Abstract:
This thesis is an examination of the learning processes employed by adults who learn to play an instrument within an ensemble. The aims of the research were threefold. Firstly, to discover how a person learns in a group and what the role of the socio-cultural environment is in learning. Secondly, to investigate the role that identity plays in learning and whether the students regard themselves as musicians. Finally, to explore the role of the performance in the musical learning process. The research has been carried out using case-study research and a four-year autoethnographic study. The theoretical framework is provided by literature from the fields of cultural psychology, music psychology and adult learning. Activity Theory has been used as the main analytical tool. The discussion firstly considers the learning process in order to construct an activity system of musical learning within an ensemble. Then, using this activity system, the motivational factors inherent in the learning ensemble and the role of identity in generating motivation are considered. Through analysing motivation and identity in relation to the activity system, I have demonstrated how the activity system can be developed into a three-dimensional system by incorporating identity as a constituent, thus stabilising the activity system. A three-dimensional system then allows for multiple activities to be analysed through the construction of activity constellations. The result of this study is a model of participative learning. Participative learning takes into consideration the purpose of learning and the socio-cultural environment so that musical learning is embedded in social music making. This then provides music education with a new model for learning a musical instrument.
APA, Harvard, Vancouver, ISO, and other styles
3

Shoemaker, Larry. "Ensemble Learning With Imbalanced Data." Scholar Commons, 2010. http://scholarcommons.usf.edu/etd/3589.

Full text
Abstract:
We describe an ensemble approach to learning salient spatial regions from arbitrarily partitioned simulation data. Ensemble approaches for anomaly detection are also explored. The partitioning comes from the distributed processing requirements of large-scale simulations. The volume of the data is such that classifiers can train only on data local to a given partition. Since the data partition reflects the needs of the simulation, the class statistics can vary from partition to partition. Some classes will likely be missing from some or even most partitions. We combine a fast ensemble learning algorithm with scaled probabilistic majority voting in order to learn an accurate classifier from such data. Since some simulations are difficult to model without a considerable number of false positive errors, and since we are essentially building a search engine for simulation data, we order predicted regions to increase the likelihood that most of the top-ranked predictions are correct (salient). Results from simulation runs of a canister being torn and from a casing being dropped show that regions of interest are successfully identified in spite of the class imbalance in the individual training sets. Lift curve analysis shows that the use of data driven ordering methods provides a statistically significant improvement over the use of the default, natural time step ordering. Significant time is saved for the end user by allowing an improved focus on areas of interest without the need to conventionally search all of the data. We have also found that using random forests weighted and distance-based outlier ensemble methods for supervised learning of anomaly detection provide significant accuracy improvements when compared to existing methods on the same dataset. Further, distance-based outlier and local outlier factor ensemble methods for unsupervised learning of anomaly detection also compare favorably to existing methods.
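The scaled probabilistic majority voting mentioned above can be sketched as follows. The thesis's exact scaling is not reproduced here, so the inverse-class-frequency weighting below (and all names in the code) is an illustrative assumption about how classes that are rare, or missing, in a partition can still win the combined vote.

```python
def scaled_probabilistic_vote(partition_probs, partition_class_freqs):
    """Combine per-partition class probabilities into one prediction.

    Each partition-local classifier's probability for a class is
    divided by that class's local frequency, so minority classes are
    not drowned out; the scaled scores are summed across partitions
    and the highest-scoring class wins (illustrative sketch).
    """
    totals = {}
    for probs, freqs in zip(partition_probs, partition_class_freqs):
        for cls, p in probs.items():
            scale = 1.0 / max(freqs.get(cls, 0.0), 1e-9)
            totals[cls] = totals.get(cls, 0.0) + p * scale
    return max(totals, key=totals.get)

# Two partitions: 'salient' is the minority class in both, yet after
# scaling its summed score outweighs the majority class.
probs = [{'salient': 0.3, 'normal': 0.7}, {'salient': 0.4, 'normal': 0.6}]
freqs = [{'salient': 0.05, 'normal': 0.95}, {'salient': 0.10, 'normal': 0.90}]
print(scaled_probabilistic_vote(probs, freqs))  # prints 'salient'
```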
APA, Harvard, Vancouver, ISO, and other styles
4

Rooney, Niall. "Ensemble meta-learning for regression." Thesis, University of Ulster, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445060.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Raharjo, Agus Budi. "Reliability in ensemble learning and learning from crowds." Electronic Thesis or Diss., Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0606.

Full text
Abstract:
The combination of several human expert labels is generally used to make reliable decisions. However, using humans or learning systems to improve the overall decision is a crucial problem. Indeed, several human experts or machine learning systems do not necessarily have the same performance. Hence, a great effort is made to deal with this performance problem in the presence of several actors, i.e., humans or classifiers. In this thesis, we present the combination of reliable classifiers in ensemble learning and learning from crowds. The first contribution is a method, based on weighted voting, which allows selecting a reliable combination of classifiers. Our algorithm RelMV transforms confidence scores, obtained during the training phase, into reliability scores. By using these scores, it determines a set of reliable candidates through both a static and a dynamic selection process. When it is hard to find expert labels as ground truth, we propose an approach based on Bayesian estimation and expectation-maximization (EM) as our second contribution. The aim is to evaluate the reliability degree of each annotator and to aggregate the appropriate labels carefully. We optimize the computation time of the algorithm in order to accommodate the large amounts of data collected from crowds. The obtained outcomes show better accuracy, stability, and computation time compared to previous methods. We also conduct an experiment on the melanoma diagnosis problem using a real-world medical dataset consisting of a set of skin lesion images annotated by multiple dermatologists.
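The reliability-weighted voting at the heart of the first contribution can be sketched very simply. The transformation RelMV applies to confidence scores, and its selection process, are not reproduced here; the sketch below only shows the final weighted vote, with all names and weights chosen for illustration.

```python
def weighted_vote(predictions, reliability):
    """Weighted majority vote: each classifier's vote counts in
    proportion to its reliability score. A sketch of the idea behind
    reliability-weighted combination, not the thesis's exact RelMV.
    """
    scores = {}
    for label, weight in zip(predictions, reliability):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

# Three classifiers: the two reliable ones agree and outvote the third.
print(weighted_vote(['fraud', 'fraud', 'ok'], [0.9, 0.8, 0.4]))  # prints 'fraud'
```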
APA, Harvard, Vancouver, ISO, and other styles
6

Sinsel, Erik W. "Ensemble learning for ranking interesting attributes." Morgantown, W. Va. : [West Virginia University Libraries], 2005. https://eidr.wvu.edu/etd/documentdata.eTD?documentid=4400.

Full text
Abstract:
Thesis (M.S.)--West Virginia University, 2005. Title from document title page. Document formatted into pages; contains viii, 81 p. : ill. Includes abstract. Includes bibliographical references (p. 72-74).
APA, Harvard, Vancouver, ISO, and other styles
7

Wang, Shuo. "Ensemble diversity for class imbalance learning." Thesis, University of Birmingham, 2011. http://etheses.bham.ac.uk//id/eprint/1793/.

Full text
Abstract:
This thesis studies the diversity issue of classification ensembles for class imbalance learning problems. Class imbalance learning refers to learning from imbalanced data sets, in which some classes of examples (minority) are highly under-represented compared to other classes (majority). The very skewed class distribution degrades the learning ability of many traditional machine learning methods, especially in the recognition of examples from the minority classes, which are often deemed to be more important and interesting. Although quite a few ensemble learning approaches have been proposed to handle the problem, no in-depth research exists to explain why and when they can be helpful. Our objectives are to understand how ensemble diversity affects the classification performance for a class imbalance problem according to single-class and overall performance measures, and to make the best use of diversity to improve the performance. As the first stage, we study the relationship between ensemble diversity and generalization performance for class imbalance problems. We investigate mathematical links between single-class performance and ensemble diversity. It is found that how the single-class measures change along with diversity falls into six different situations. These findings are then verified in class imbalance scenarios through empirical studies. The impact of diversity on overall performance is also investigated empirically. Strong correlations between diversity and the performance measures are found. Diversity shows a positive impact on the recognition of the minority class and benefits the overall performance of ensembles in class imbalance learning. Our results help to understand if and why ensemble diversity can help to deal with class imbalance problems. 
Encouraged by the positive role of diversity in class imbalance learning, we then focus on a specific ensemble learning technique, the negative correlation learning (NCL) algorithm, which considers diversity explicitly when creating ensembles and has achieved great empirical success. We propose a new learning algorithm based on the idea of NCL, named AdaBoost.NC, for classification problems. An "ambiguity" term decomposed from the 0-1 error function is introduced into the training framework of AdaBoost. It demonstrates superiority in both effectiveness and efficiency. Its good generalization performance is explained by theoretical and empirical evidence. It can be viewed as the first NCL algorithm specializing in classification problems. Most existing ensemble methods for class imbalance problems suffer from the problems of overfitting and over-generalization. To improve this situation, we address the class imbalance issue by making use of ensemble diversity. We investigate the generalization ability of NCL algorithms, including AdaBoost.NC, to tackle two-class imbalance problems. We find that NCL methods integrated with random oversampling are effective in recognizing minority class examples without losing the overall performance, especially the AdaBoost.NC tree ensemble. This is achieved by providing smoother and less overfitting classification boundaries for the minority class. The results here show the usefulness of diversity and open up a novel way to deal with class imbalance problems. Since the two-class imbalance is not the only scenario in real-world applications, multi-class imbalance problems deserve equal attention. To understand what problems the multi-class setting can cause and how it affects the classification performance, we study the multi-class difficulty by analyzing the multi-minority and multi-majority cases respectively. Both lead to a significant performance reduction. The multi-majority case appears to be more harmful. 
The results reveal possible issues that a class imbalance learning technique could have when dealing with multi-class tasks. Following this part of analysis and the promising results of AdaBoost.NC on two-class imbalance problems, we apply AdaBoost.NC to a set of multi-class imbalance domains with the aim of solving them effectively and directly. Our method shows good generalization in minority classes and balances the performance across different classes well without using any class decomposition schemes. Finally, we conclude this thesis with how the study has contributed to class imbalance learning and ensemble learning, and propose several possible directions for future research that may improve and extend this work.
APA, Harvard, Vancouver, ISO, and other styles
8

Soares, Rodrigo Gabriel Ferreira. "Cluster-based semi-supervised ensemble learning." Thesis, University of Birmingham, 2014. http://etheses.bham.ac.uk//id/eprint/4818/.

Full text
Abstract:
Semi-supervised classification consists of acquiring knowledge from both labelled and unlabelled data to classify test instances. The cluster assumption represents one of the potential relationships between true classes and data distribution that semi-supervised algorithms assume in order to use unlabelled data. Ensemble algorithms have been widely and successfully employed in both supervised and semi-supervised contexts. In this Thesis, we focus on the cluster assumption to study ensemble learning based on a new cluster regularisation technique for multi-class semi-supervised classification. Firstly, we introduce a multi-class cluster-based classifier, the Cluster-based Regularisation (ClusterReg) algorithm. ClusterReg employs a new regularisation mechanism based on posterior probabilities generated by a clustering algorithm in order to avoid generating decision boundaries that traverse high-density regions. Such a method possesses robustness to overlapping classes and to scarce labelled instances in uncertain and low-density regions, when data follows the cluster assumption. Secondly, we propose a robust multi-class boosting technique, Cluster-based Boosting (CBoost), which implements the proposed cluster regularisation for ensemble learning and uses ClusterReg as base learner. CBoost is able to overcome possible incorrect pseudo-labels and produces better generalisation than existing classifiers. Finally, since there are often datasets with a large number of unlabelled instances, we propose Efficient Cluster-based Boosting (ECB) for large multi-class datasets. ECB extends CBoost and has lower time and memory complexities than state-of-the-art algorithms. Such a method employs a sampling procedure to reduce the training set of base learners, an efficient clustering algorithm, and an approximation technique for nearest neighbours to avoid the computation of the pairwise distance matrix. 
Hence, ECB enables semi-supervised classification for large-scale datasets.
APA, Harvard, Vancouver, ISO, and other styles
9

Miskin, James William. "Ensemble learning for independent component analysis." Thesis, University of Cambridge, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621116.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Lind, Simon. "Distributed Ensemble Learning With Apache Spark." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-274323.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Alabdulrahman, Rabaa. "A Comparative Study of Ensemble Active Learning." Thesis, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31805.

Full text
Abstract:
Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication records. Such types of data bring new challenges to the data mining research community. Specifically, a number of researchers have developed techniques to build accurate classification models against such Data Streams. Ensemble Learning, where a number of so-called base classifiers are combined in order to build a model, has shown some promise. However, a number of challenges remain. Often, the class labels of the arriving data are incorrect or missing. Furthermore, Data Stream algorithms may benefit from an online learning paradigm, where a small amount of newly arriving data is used to learn incrementally. To this end, the use of Active Learning, where the user is in the loop, has been proposed as a way to extend Ensemble Learning. Here, the hypothesis is that Active Learning would increase the performance, in terms of accuracy, ensemble size, and the time it takes to build the model. This thesis tests the validity of this hypothesis. Namely, we explore whether augmenting Ensemble Learning with an Active Learning component benefits the Data Stream Learning process. Our analysis indicates that this hypothesis does not necessarily hold for the datasets under consideration. That is, the accuracies of Active Ensemble Learning are not statistically significantly higher than when using normal Ensemble Learning. Rather, Active Learning may even cause an increase in error rate. Further, Active Ensemble Learning actually results in an increase in the time taken to build the model. However, our results indicate that Active Ensemble Learning builds accurate models with much smaller ensemble sizes, when compared to the traditional Ensemble Learning algorithms. 
Further, the models we build are constructed against small and incrementally growing training sets, which may be very beneficial in a real time Data Stream setting.
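The "user in the loop" step of active ensemble learning is typically an uncertainty-sampling query: ask a human to label the instance on which the ensemble disagrees most. The thesis's exact query strategy is not given in the abstract, so the vote-margin criterion below, and all names in the code, are illustrative assumptions.

```python
def pick_query(instances, ensemble_votes):
    """Uncertainty sampling sketch: return the instance whose ensemble
    vote margin (top vote count minus runner-up) is smallest, i.e. the
    instance the ensemble is least certain about."""
    def margin(votes):
        counts = sorted((votes.count(c) for c in set(votes)), reverse=True)
        runner_up = counts[1] if len(counts) > 1 else 0
        return counts[0] - runner_up
    return min(instances, key=lambda i: margin(ensemble_votes[i]))

votes = {'x1': ['a', 'a', 'a'],   # unanimous: margin 3
         'x2': ['a', 'b', 'a'],   # margin 1
         'x3': ['a', 'b', 'c']}   # margin 0: most uncertain
print(pick_query(['x1', 'x2', 'x3'], votes))  # prints 'x3'
```

The queried label would then be fed back to update the ensemble incrementally, which is where the extra model-building time reported above comes from.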
APA, Harvard, Vancouver, ISO, and other styles
12

Elahi, Haroon. "A Boosted-Window Ensemble." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5658.

Full text
Abstract:
Context. The problem of obtaining predictions from stream data involves training on the labeled instances and suggesting the class values for the unseen stream instances. The nature of data-stream environments makes this task complicated. The large number of instances, the possibility of changes in the data distribution, the presence of noise and drifting concepts are just some of the factors that add complexity to the problem. Various supervised-learning algorithms have been designed by putting together efficient data-sampling, ensemble-learning, and incremental-learning methods. The performance of the algorithm is dependent on the chosen methods. This leaves an opportunity to design new supervised-learning algorithms by using different combinations of constituent methods. Objectives. This thesis work proposes a fast and accurate supervised-learning algorithm for performing predictions on data streams. This algorithm, called Boosted-Window Ensemble (BWE), is devised using the mixture-of-experts technique. BWE uses a sliding window, online boosting, and incremental learning for data-sampling, ensemble-learning, and maintaining a consistent state with the current stream data, respectively. In this regard, a sliding window method is introduced. This method uses partial updates for sliding the window on the data stream and is called Partially-Updating Sliding Window (PUSW). An investigation is carried out comparing two variants of the sliding window and three different ensemble-learning methods in order to choose the superior methods. Methods. The thesis uses an experimentation approach for evaluating the Boosted-Window Ensemble (BWE). CPU-time and prediction accuracy are used as performance indicators, where CPU-time is the execution time in seconds. The benchmark algorithms include: Accuracy-Updated Ensemble1 (AUE1), Accuracy-Updated Ensemble2 (AUE2), and Accuracy-Weighted Ensemble (AWE). 
The experiments use nine synthetic and five real-world datasets for generating performance estimates. The Asymptotic Friedman test and the Wilcoxon Signed-Rank test are used for hypothesis testing. The Wilcoxon-Nemenyi-McDonald-Thompson test is used for post-hoc analysis. Results. The hypothesis testing suggests that: 1) for both the synthetic and real-world datasets, the Boosted-Window Ensemble (BWE) has significantly lower CPU-time values than two benchmark algorithms (Accuracy-Updated Ensemble1 (AUE1) and Accuracy-Weighted Ensemble (AWE)); 2) BWE returns similar prediction accuracy to AUE1 and AWE for the synthetic datasets; 3) BWE returns similar prediction accuracy to the three benchmark algorithms for the real-world datasets. Conclusions. Experimental results demonstrate that the proposed algorithm can be as accurate as the state-of-the-art benchmark algorithms while obtaining predictions from stream data. The results further show that the use of the Partially-Updating Sliding Window results in lower CPU-time for BWE as compared with the chunk-based sliding window method used in AUE1, AUE2, and AWE.
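The abstract does not spell out how PUSW's partial updates work, but the general idea of a partially-updating window, advancing by a small step so most of the window (and any model state built on it) is reused rather than rebuilt, can be sketched as follows. The class name and parameters are illustrative assumptions, not the thesis's implementation.

```python
from collections import deque

class PartialSlidingWindow:
    """A window over a stream that advances by a partial step: only
    `step` oldest items are dropped when `step` new items arrive, in
    contrast to chunk-based windows that replace the whole window.
    Illustrative sketch of the partial-update idea, not the exact PUSW.
    """
    def __init__(self, size, step):
        self.buf = deque(maxlen=size)
        self.step = step

    def update(self, new_items):
        assert len(new_items) == self.step
        self.buf.extend(new_items)  # deque drops the oldest items automatically

w = PartialSlidingWindow(size=4, step=2)
w.update([1, 2])
w.update([3, 4])
w.update([5, 6])      # drops 1 and 2; 3 and 4 are reused
print(list(w.buf))    # prints [3, 4, 5, 6]
```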
APA, Harvard, Vancouver, ISO, and other styles
13

Zhao, Xiaochuang. "Ensemble Learning Method on Machine Maintenance Data." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/6056.

Full text
Abstract:
In industry, many companies are facing the explosion of big data. With this much information stored, companies want to make sense of the data and use it to support better decision making, especially for future prediction. A lot of money can be saved and huge revenue can be generated with the power of big data. When building statistical learning models for prediction, companies aim to build models with efficiency and high accuracy. After the learning models have been deployed to production, new data will be generated. With the updated data, the models have to be updated as well. Because of this, the model that performs best today will not necessarily perform the same tomorrow. Thus, it is very hard to decide which algorithm should be used to build the learning model. This paper introduces a new method that ensembles the information generated by two different classification statistical learning algorithms as inputs for another learning model to increase the final prediction power. The dataset used in this paper is NASA's Turbofan Engine Degradation data. There are 49 numeric features (X), and the response Y is binary, with 0 indicating the engine is working properly and 1 indicating engine failure. The model's purpose is to predict whether the engine is going to pass or fail. The dataset is divided into a training set and a testing set. First, the training set is used twice to build support vector machine (SVM) and neural network models. Second, the trained SVM and neural network models take X of the training set as input to predict Y1 and Y2. Then, Y1 and Y2 are taken as inputs to build the Penalized Logistic Regression model, which is the ensemble model here. Finally, the testing set follows the same steps to get the final prediction result. The model accuracy is calculated using overall classification accuracy. The result shows that the ensemble model has 92% accuracy. 
The prediction accuracies of the SVM, neural network and ensemble models are compared to show that the ensemble model successfully captured the power of the two individual learning models.
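The two-stage scheme described above is a form of stacking: base-model outputs become the meta-model's input features. The sketch below shows only that data flow; the base and meta models are stand-in callables (not actual SVM, neural network, or penalized logistic regression implementations), and all thresholds and feature names are invented for illustration.

```python
def stack_predict(x, base_models, meta_model):
    """Stacked prediction: collect each base model's output for x
    (Y1, Y2, ...) and feed them as features to the meta-model, which
    produces the final ensemble prediction."""
    level1 = [m(x) for m in base_models]   # base-model predictions
    return meta_model(level1)              # meta-model combines them

# Stand-in base models: each returns a failure probability for the engine.
svm_like = lambda x: 0.9 if x['vibration'] > 0.7 else 0.2
nn_like = lambda x: 0.8 if x['temperature'] > 0.6 else 0.1
# Stand-in meta-model: average the base outputs, threshold at 0.5.
meta = lambda ys: 1 if sum(ys) / len(ys) > 0.5 else 0

reading = {'vibration': 0.8, 'temperature': 0.7}
print(stack_predict(reading, [svm_like, nn_like], meta))  # prints 1 (failure)
```

In the paper's setting the meta-model is itself trained on (Y1, Y2) pairs from the training set; the stand-in rule here just makes the two-level flow concrete.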
APA, Harvard, Vancouver, ISO, and other styles
14

Li, Xia. "Travel time prediction using ensemble learning algorithms." Thesis, University of Nottingham, 2018. http://eprints.nottingham.ac.uk/53358/.

Full text
Abstract:
In the research area of travel time prediction, the existing studies mainly focus on aggregated travel time prediction (without distinguishing vehicle types) or travel time prediction for passenger vehicles. The travel time prediction for freight transportation has not received enough attention from researchers. Only a few relevant studies can be found in the literature, and the proposed methods are usually very simple and lack comparisons with more advanced methods. Although many believed that a better prediction model can be developed using more advanced techniques such as artificial neural networks or autoregressive conditional heteroscedastic models, it is usually difficult and costly to train these models and the model interpretability is poor. There is a demand for 'off-the-shelf' methods with good performance, ease of implementation and good model interpretability. Thus, the aims of this thesis are: (1) developing some 'off-the-shelf' data-driven methods to predict travel time for freight transportation; (2) creating a comprehensive understanding of how the developed methods can be more effectively applied for general travel time prediction problems. Its two main contributions are: (1) it develops data-driven travel time prediction methods for freight transportation by utilising freight vehicles' trajectory data; (2) it investigates the relation between features and performance and discovers the combinatorial effects of features under the effects of different noise processes and different model fitting strategies. The experimental results show that useful features can be mined from the trajectory data to enhance the travel time prediction for freight transportation. The developed methods outperform some of the state-of-the-art data-driven methods.
APA, Harvard, Vancouver, ISO, and other styles
15

Aldave, Roberto. "Systematic ensemble learning and extensions for regression." Thèse, Université de Sherbrooke, 2015. http://hdl.handle.net/11143/6958.

Full text
Abstract:
The objective is to provide methods to improve the performance, or prediction accuracy, of the standard stacking approach, an ensemble method composed of simple, heterogeneous base models, through the integration of the diversity generation, combination and/or selection stages for regression problems. In Chapter 1, we propose to combine a set of level-1 learners into a level-2 learner, or ensemble. We also propose to inject a diversity generation mechanism into the initial cross-validation partition, from which new cross-validation partitions are generated, and subsequent ensembles are trained. Then, we propose an algorithm to select the best partition, or corresponding ensemble. In Chapter 2, we formulate the partition selection as a Pareto-based multi-criteria optimization problem, as well as an algorithm to make the partition selection iterative with the aim of further improving the ensemble prediction accuracy. In Chapter 3, we propose to generate multiple populations, or partitions, by injecting a diversity mechanism into the original dataset. Then, an algorithm is proposed to select the best partition among all partitions generated by the multiple populations. All methods designed and implemented in this thesis obtain encouraging and favorable results across different datasets against both state-of-the-art models and ensembles for regression.
APA, Harvard, Vancouver, ISO, and other styles
16

Frery, Jordan. "Ensemble Learning for Extremely Imbalanced Data Flows." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSES034.

Full text
Abstract:
Machine learning is the study of designing algorithms that learn from training data to achieve a specific task. The resulting model is then used to predict over new (unseen) data points without any outside help. This data can be of many forms, such as images (matrices of pixels), signals (sounds, ...), transactions (age, amount, merchant, ...), and logs (time, alerts, ...). Datasets may be defined to address a specific task such as object recognition, voice identification, anomaly detection, etc. In these tasks, the knowledge of the expected outputs encourages a supervised learning approach, where every observed data point is assigned a label that defines what the model predictions should be. For example, in object recognition, an image could be associated with the label "car", which suggests that the learning algorithm has to learn that a car is contained in this picture, somewhere. This is in contrast with unsupervised learning, where the task at hand does not have explicit labels. For example, one popular topic in unsupervised learning is to discover underlying structures contained in visual data (images), such as geometric forms of objects, lines, and depth, before learning a specific task. This kind of learning is obviously much harder, as there might be potentially an infinite number of concepts to grasp in the data. In this thesis, we focus on a specific scenario of the supervised learning setting: 1) the label of interest is under-represented (e.g. anomalies) and 2) the dataset increases with time as we receive data from real-life events (e.g. credit card transactions). In fact, these settings are very common in the industrial domain in which this thesis takes place.
APA, Harvard, Vancouver, ISO, and other styles
17

Kuntala, Prashant Kumar. "Optimizing Biomarkers From an Ensemble Learning Pipeline." Ohio University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1503592057943043.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Lin, Hsuan-Tien Abu-Mostafa Yaser S. "Infinite ensemble learning with Support Vector Machines /." Diss., Pasadena, Calif. : California Institute of Technology, 2005. http://resolver.caltech.edu/CaltechETD:etd-05262005-030549.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Zanda, Manuela. "A probabilistic perspective on ensemble diversity." Thesis, University of Manchester, 2010. https://www.research.manchester.ac.uk/portal/en/theses/a-probabilistic-perspective-on-ensemble-diversity(06296f74-806a-42dc-a65f-f7607f67d9f5).html.

Full text
Abstract:
We study diversity in classifier ensembles from a broader perspective than the 0/1 loss function, the main reason being that the bias-variance decomposition of the 0/1 loss function is not unique, and therefore the relationship between ensemble accuracy and diversity is still unclear. In the parallel field of regression ensembles, where the loss function of interest is the mean squared error, this decomposition not only exists, but it has been shown that diversity can be managed via the Negative Correlation (NC) framework. In the field of probabilistic modelling, the expected value of the negative log-likelihood loss function is given by its conditional entropy; this result suggests that interaction information might provide some insight into the trade-off between accuracy and diversity. Our objective is to improve our understanding of classifier diversity by focusing on two different loss functions - the mean squared error and the negative log-likelihood. In a study of mean squared error functions, we reformulate the Tumer & Ghosh model for the classification error as a regression problem, and we show how the NC learning framework can be deployed to manage diversity in classification problems. In an empirical study of classifiers that minimise the negative log-likelihood loss function, we discuss model diversity as opposed to error diversity in ensembles of Naive Bayes classifiers. We observe that diversity in low-variance classifiers has to be structurally inferred. We apply interaction information to the problem of monitoring diversity in classifier ensembles. We present empirical evidence that interaction information can capture the trade-off between accuracy and diversity, and that diversity occurs at different levels of interactions between base classifiers. We use interaction information properties to build ensembles of structurally diverse averaged Augmented Naive Bayes classifiers.
Our empirical study shows that this novel ensemble approach is computationally more efficient than an accuracy based approach and at the same time it does not negatively affect the ensemble classification performance.
APA, Harvard, Vancouver, ISO, and other styles
20

Ngo, Khai Thoi. "Stacking Ensemble for auto_ml." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/83547.

Full text
Abstract:
Machine learning has been a subject undergoing intense study across many different industries and academic research areas. Companies and researchers have taken full advantage of various machine learning approaches to solve their problems; however, broad understanding and study of the field is required for developers to fully harness the potential of different machine learning models and to achieve efficient results. Therefore, this thesis begins by comparing auto_ml with other hyper-parameter optimization techniques. auto_ml is a fully autonomous framework that lessens the knowledge prerequisite to accomplish complicated machine learning tasks. The auto_ml framework automatically selects the best features from a given data set and chooses the best model to fit and predict the data. Through multiple tests, auto_ml outperforms MLP and other similar frameworks on various datasets using a small amount of processing time. The thesis then proposes and implements a stacking ensemble technique in the auto_ml framework in order to build protection against over-fitting on small datasets. Stacking is a technique used to combine a collection of machine learning models' predictions to arrive at a final prediction. The stacked auto_ml ensemble results are more stable and consistent than those of the original framework, across different training sizes of all analyzed small datasets.
Master of Science
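The stacking idea described in this abstract can be illustrated with a minimal, self-contained sketch. The toy data, stump base learners, and nearest-neighbour meta-classifier below are illustrative stand-ins, not the auto_ml implementation:

```python
def stump(threshold, feature):
    """Return a one-feature threshold classifier (a decision stump)."""
    return lambda x: 1 if x[feature] > threshold else 0

# Toy training data: (features, label). Purely illustrative.
train = [((0.9, 0.2), 1), ((0.8, 0.1), 1), ((0.1, 0.9), 0), ((0.2, 0.7), 0)]

# Level-0: a small, diverse pool of base classifiers.
base = [stump(0.5, 0), stump(0.5, 1)]

# Level-1 (meta) data: each example becomes the vector of base predictions.
meta_train = [([clf(x) for clf in base], y) for x, y in train]

def meta_predict(x):
    """A trivial meta-classifier: label of the nearest meta-vector."""
    z = [clf(x) for clf in base]
    dist = lambda a: sum((u - v) ** 2 for u, v in zip(a, z))
    return min(meta_train, key=lambda t: dist(t[0]))[1]

print(meta_predict((0.95, 0.05)))  # -> 1
```

In practice the meta-classifier is trained on out-of-fold base predictions to avoid leaking the training labels; the sketch omits that step for brevity.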
APA, Harvard, Vancouver, ISO, and other styles
21

Ontañón, Villar Santi. "Ensemble Case Based Learning for Multi-Agent Systems." Doctoral thesis, Universitat Autònoma de Barcelona, 2005. http://hdl.handle.net/10803/3050.

Full text
Abstract:
This monograph presents a framework for learning in a distributed data scenario with decentralized decision making. We have based our framework on Multi-Agent Systems (MAS) in order to have decentralized decision making, and on Case-Based Reasoning (CBR), since the lazy learning nature of CBR is suitable for dynamic multi-agent systems. Moreover, we are interested in autonomous agents that collaboratively work as ensembles. An ensemble of agents solves problems in the following way: each individual agent solves the problem at hand individually and makes its individual prediction, then all those predictions are aggregated to form a global prediction. Therefore, in this work we are interested in developing ensemble case-based learning strategies for multi-agent systems. Specifically, we will present the Multi-Agent Case Based Reasoning (MAC) framework, a multi-agent approach to CBR. Each individual agent in a MAC system is capable of individually learning and solving problems using CBR with an individual case base. Moreover, each case base is owned and managed by an individual agent, and any information is disclosed or shared only if the agent decides so. Thus, this framework preserves the privacy of data, and the autonomy to disclose data. The focus of this thesis is to develop strategies so that individual learning agents improve their performance both individually and as an ensemble. Moreover, decisions in the MAC framework are made in a decentralized way, since each individual agent has decision autonomy. Therefore, techniques developed in this framework achieve an improvement of individual and ensemble performance as a result of individual decisions made in a decentralized way.
Specifically, we will present three kinds of strategies: strategies to form ensembles of agents, strategies to perform case retention in multi-agent systems, and strategies to perform case redistribution.
APA, Harvard, Vancouver, ISO, and other styles
22

Nnamoko, N. A. "Ensemble-based supervised learning for predicting diabetes onset." Thesis, Liverpool John Moores University, 2018. http://researchonline.ljmu.ac.uk/8337/.

Full text
Abstract:
The research presented in this thesis aims to address the issue of undiagnosed diabetes cases. The current state of knowledge is that one in seventy people in the United Kingdom are living with undiagnosed diabetes, and only one in a hundred people could identify the main signs of diabetes. Some of the tools available for predicting diabetes are either too simplistic and/or rely on superficial data for inference. On the positive side, the National Health Service (NHS) are improving data recording in this domain by offering a health check to adults aged 40 - 70. Data from such a programme could be utilised to mitigate the issue of superficial data, but could also help to develop a predictive tool that facilitates a change from the current reactive care to one that is proactive. This thesis presents a tool based on a machine learning ensemble for predicting diabetes onset. Ensembles often perform better than a single classifier, and accuracy and diversity have been highlighted as the two vital requirements for constructing good ensemble classifiers. Experiments in this thesis explore the relationship between diversity from heterogeneous ensemble classifiers and the accuracy of predictions through feature subset selection in order to predict diabetes onset. Data from a national health check programme (similar to the NHS health check) was used. The aim is to predict diabetes onset better than other similar studies within the literature. For the experiments, predictions from five base classifiers (Sequential Minimal Optimisation (SMO), Radial Basis Function (RBF), Naïve Bayes (NB), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and C4.5 decision tree), performing the same task, are exploited in all possible combinations to construct 26 ensemble models. The training data feature space was searched to select the best feature subset for each classifier.
Selected subsets are used to train the classifiers, and their predictions are combined using the k-Nearest Neighbours algorithm as the meta-classifier. Results are analysed using four performance metrics (accuracy, sensitivity, specificity and AUC) to determine (i) whether ensembles always perform better than a single classifier; and (ii) the impact of diversity (from heterogeneous classifiers) and accuracy (through feature subset selection) on ensemble performance. At base classification level, RBF produced better results than the other four classifiers with 78% accuracy, 82% sensitivity, 73% specificity and 85% AUC. A comparative study shows that the RBF model is more accurate than 9 ensembles, more sensitive than 13 ensembles, more specific than 9 ensembles, and produced better AUC than 25 ensembles. This means that ensembles do not always perform better than their constituent classifiers. Of those ensembles that performed better than RBF, the combination of C4.5, RIPPER and NB produced the highest results with 83% accuracy, 87% sensitivity, 79% specificity, and 86% AUC. When compared to the RBF model, the result shows a 5.37% accuracy improvement, which is significant (p = 0.0332). The experiments show how data from medical health examination can be utilised to address the issue of undiagnosed cases of diabetes. Models constructed with such data would facilitate the much desired shift from reactive to proactive care for individuals at high risk of diabetes. From the machine learning view point, it was established that ensembles constructed from diverse and accurate base learners have the potential to produce significant improvement in accuracy, compared to their individual constituent classifiers. In addition, the ensemble presented in this thesis is at least 1% and at most 23% more accurate than similar research studies found within the literature. This validates the superiority of the method implemented.
APA, Harvard, Vancouver, ISO, and other styles
23

King, Michael Allen. "Ensemble Learning Techniques for Structured and Unstructured Data." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/51667.

Full text
Abstract:
This research provides an integrated approach of applying innovative ensemble learning techniques that has the potential to increase the overall accuracy of classification models. Actual structured and unstructured data sets from industry are utilized during the research process, analysis and subsequent model evaluations. The first research section addresses the consumer demand forecasting and daily capacity management requirements of a nationally recognized alpine ski resort in the state of Utah, in the United States of America. A basic econometric model is developed, and three classic predictive models are evaluated for effectiveness. These predictive models were subsequently used as input for four ensemble modeling techniques. Ensemble learning techniques are shown to be effective. The second research section discusses the opportunities and challenges faced by a leading firm providing sponsored search marketing services. The goal for sponsored search marketing campaigns is to create advertising campaigns that better attract and motivate a target market to purchase. This research develops a method for classifying profitable campaigns and maximizing overall campaign portfolio profits. Four traditional classifiers are utilized, along with four ensemble learning techniques, to build classifier models to identify profitable pay-per-click campaigns. A MetaCost ensemble configuration, having the ability to integrate unequal classification costs, produced the highest campaign portfolio profit. The third research section addresses the management challenges of online consumer reviews encountered by service industries and addresses how these textual reviews can be used for service improvements. A service improvement framework is introduced that integrates traditional text mining techniques and second order feature derivation with ensemble learning techniques.
The concept of GLOW and SMOKE words is introduced and is shown to be an objective text analytic source of service defects or service accolades.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
24

Nguyen, Thanh Tien. "Ensemble Learning Techniques and Applications in Pattern Classification." Thesis, Griffith University, 2017. http://hdl.handle.net/10072/366342.

Full text
Abstract:
It is widely known that the best classifier for a given problem is often problem dependent, and no single classification algorithm is the best for all classification tasks. A natural question that arises is: can we combine multiple classification algorithms to achieve higher classification accuracy than a single one? That is the idea behind a class of methods called ensemble methods. An ensemble method is defined as the combination of several classifiers with the aim of achieving a lower classification error rate than using a single classifier. Ensemble methods have been applied to various applications ranging from computer-aided medical diagnosis, computer vision, and software engineering to information retrieval. In this study, we focus on heterogeneous ensemble methods, in which a fixed set of diverse learning algorithms are trained on the same training set to generate the different classifiers, and the class prediction is then made based on the output of these classifiers (called Level-1 data or meta-data). The research on heterogeneous ensemble methods is mainly focused on two aspects: (i) proposing efficient classifier-combining methods on meta-data to achieve high accuracy, and (ii) optimizing the ensemble by performing feature and classifier selection. Although various approaches related to heterogeneous ensemble methods have been proposed, some research gaps still exist. First, in ensemble learning, the meta-data of an observation reflects the agreement and disagreement between the different base classifiers.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
APA, Harvard, Vancouver, ISO, and other styles
25

Oxenstierna, Johan. "Predicting house prices using Ensemble Learning with Cluster Aggregations." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-345157.

Full text
Abstract:
The purpose of this investigation, as prescribed by Valueguard AB, was to evaluate the utility of Machine Learning (ML) models to estimate prices on samples of their housing dataset. Specifically, the aim was to minimize the Median Absolute Percent Error (MDAPE) of the predictions. Valueguard were particularly interested in models where the dataset is clustered by coordinates and/or attributes in various ways, to see if this can improve results. Ensemble Learning models with cluster aggregations were built and compared against similar model counterparts that do not partition the data. The weak learners were either lazy kNN learners (k-nearest neighbors) or eager ANN learners (artificial neural networks), and the test set objects were either classified by single weak learners or tuned to multiple weak learners. The best results were achieved by the cluster aggregation model where test objects were tuned to multiple weak learners, and it also showed the most potential for improvement.
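The cluster-aggregation idea in this abstract — partition the data, fit one weak learner per cluster, and route each query to its cluster's learner — can be sketched roughly as follows. The data, the split rule standing in for clustering, and the 1-NN weak learner are all illustrative assumptions, not the thesis's models:

```python
# Toy data: ((coordinate pair), price). Purely illustrative.
train = [((1.0, 1.1), 100), ((1.2, 0.9), 110), ((5.0, 5.2), 500), ((5.1, 4.9), 520)]

# A crude stand-in for clustering: split on the first coordinate.
clusters = {
    0: [t for t in train if t[0][0] < 3],
    1: [t for t in train if t[0][0] >= 3],
}

def predict(x):
    """Route the query to its cluster, then use that cluster's 1-NN learner."""
    members = clusters[0 if x[0] < 3 else 1]
    d = lambda p: sum((u - v) ** 2 for u, v in zip(p[0], x))
    return min(members, key=d)[1]

print(predict((0.95, 1.05)))  # -> 100
```

The "tuned to multiple weak learners" variant would instead weight predictions from several clusters rather than committing to one.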
APA, Harvard, Vancouver, ISO, and other styles
26

Minku, Leandro Lei. "Online ensemble learning in the presence of concept drift." Thesis, University of Birmingham, 2011. http://etheses.bham.ac.uk//id/eprint/1334/.

Full text
Abstract:
In online learning, each training example is processed separately and then discarded. Environments that require online learning are often non-stationary, and their underlying distributions may change over time (concept drift). Even though ensembles of learning machines have been used for handling concept drift, there has been no deep study of why they can be helpful for dealing with drifts and which of their features contribute to that. The thesis mainly investigates how ensemble diversity affects accuracy in online learning in the presence of concept drift, and how to use diversity in order to improve accuracy in changing environments. This is the first diversity study in the presence of concept drift. The main contributions of the thesis are:
- An analysis of negative correlation in online learning.
- A new concept drift categorisation to allow principled studies of drifts.
- A better understanding of when, how and why ensembles of learning machines can help to handle concept drift in online learning.
- Knowledge of how to use information learnt from the old concept to aid the learning of the new concept.
- A new approach called Diversity for Dealing with Drifts (DDD), which is accurate both in the presence and absence of drifts.
APA, Harvard, Vancouver, ISO, and other styles
27

Bass, Gideon. "Ensemble supervised and unsupervised learning with Kepler variable stars." Thesis, George Mason University, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10027479.

Full text
Abstract:
Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. I present a study on variable stars in the Kepler field using these techniques, along with novel work in unsupervised learning. I use new methods of characterization and multiple independent classifiers to produce an ensemble classifier that matches existing classification abilities. I also explore the possibilities of unsupervised learning for discovering novel features in stars.
APA, Harvard, Vancouver, ISO, and other styles
28

Velka, Elina. "Loss Given Default Estimation with Machine Learning Ensemble Methods." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279846.

Full text
Abstract:
This thesis evaluates the performance of three machine learning methods in prediction of the Loss Given Default (LGD). LGD can be seen as the opposite of the recovery rate, i.e. the ratio of an outstanding loan that the loan issuer would not be able to recover in case the customer were to default. The methods investigated are decision trees, random forests and boosted methods. All of the methods investigated performed well in predicting the cases where the loan is not recovered, LGD = 1 (100%), or the loan is totally recovered, LGD = 0 (0%). When the performance of the models was evaluated on a dataset where the observations with LGD = 1 were removed, a significant decrease in performance was observed. The random forest model built on an unbalanced training dataset showed better performance on the test dataset that included values LGD = 1, and the random forest model built on a balanced training dataset performed better on the test set where the observations with LGD = 1 were removed. Boosted models evaluated in this study showed less accurate predictions than the other methods used. Overall, the performance of the random forest models was slightly better than that of the decision tree models, although the computational time (the cost) was considerably longer when running the random forest models. Therefore, decision tree models would be suggested for prediction of the Loss Given Default.
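The random forest models compared in this abstract rest on bagging: fit many trees on bootstrap resamples and average their predictions. A minimal sketch of that idea on a toy LGD-style target in [0, 1] follows; the one-split "stumps", the data, and the fixed split point are illustrative stand-ins for the full trees and loan features used in the thesis:

```python
import random

random.seed(0)

# Toy data: (loan feature, observed LGD in [0, 1]). Purely illustrative.
data = [(0.1, 0.0), (0.2, 0.0), (0.8, 1.0), (0.9, 1.0)]

def fit_stump(sample):
    """One-split regression stump: mean LGD on each side of a fixed midpoint."""
    lo = [y for x, y in sample if x < 0.5] or [0.0]
    hi = [y for x, y in sample if x >= 0.5] or [1.0]
    return lambda x: sum(lo) / len(lo) if x < 0.5 else sum(hi) / len(hi)

# Bagging: fit each stump on a bootstrap resample, then average predictions.
forest = [fit_stump(random.choices(data, k=len(data))) for _ in range(25)]
predict = lambda x: sum(t(x) for t in forest) / len(forest)

print(predict(0.85))  # high-feature loan -> predicted LGD near 1
```

The averaging over resampled learners is what trades extra computation for lower variance, the cost/accuracy trade-off the thesis weighs against single decision trees.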
APA, Harvard, Vancouver, ISO, and other styles
29

Hess, Andreas. "Supervised and unsupervised ensemble learning for the semantic web." [Mainz] [A. Hess], 2006. http://d-nb.info/99714971X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Stensved, Catharina. "Ensemble och tillåtande rum." Thesis, Kungl. Musikhögskolan, Institutionen för musik, pedagogik och samhälle, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kmh:diva-3605.

Full text
Abstract:
This study aims to examine, from a sociocultural perspective, music teachers' experiences of and reasoning about creating permissive spaces in ensemble teaching, with a focus on relationship building, communication, individual adaptation and group dynamics. Three practising music teachers participated in a focus group interview that was recorded and transcribed, after which the collected material was processed and analysed. The interview statements resulted in five major themes with associated subcategories. The results showed that the focus group's working methods and views on permissive spaces include, among other things: relationship building, where the teacher maintains a professional approach by being personal but never private, shows interest in the students outside the classroom, and is generous in showing their own mistakes; individual adaptation, which takes place in various ways in the teaching and creates conditions for each student to develop at their own level; communication, where the teacher avoids negatively charged words, addresses positive feedback when it is given, includes the students in giving each other feedback, and makes sure that verbal and non-verbal communication are consistent with each other; and group dynamics, where the students themselves get to formulate guidelines for the ensemble room, and the group is given time to test boundaries and fail together without having to feel the pressure that every attempt must be shown at a concert. According to the teachers, the factors that stand in the way of a permissive space are: lack of time, external circumstances such as mental ill-health in the student group, and the governing curriculum documents.
APA, Harvard, Vancouver, ISO, and other styles
31

McElwee, Steven M. "Probabilistic Clustering Ensemble Evaluation for Intrusion Detection." Thesis, Nova Southeastern University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10844875.

Full text
Abstract:
Intrusion detection is the practice of examining information from computers and networks to identify cyberattacks. It is an important topic in practice, since the frequency and consequences of cyberattacks continue to increase and affect organizations. It is important for research, since many problems exist for intrusion detection systems. Intrusion detection systems monitor large volumes of data and frequently generate false positives. This results in additional effort for security analysts to review and interpret alerts. After long hours spent reviewing alerts, security analysts become fatigued and make bad decisions. There is currently no approach to intrusion detection that reduces the workload of human analysts by providing a probabilistic prediction that a computer is experiencing a cyberattack.
This research addressed this problem by estimating the probability that a computer system was being attacked, rather than alerting on individual events. This research combined concepts from cyber situation awareness by applying clustering ensembles, probability analysis, and active learning. The unique contribution of this research is that it provides a higher level of meaning for intrusion alerts than traditional approaches.
Three experiments were conducted in the course of this research to demonstrate the feasibility of these concepts. The first experiment evaluated cluster generation approaches that provided multiple perspectives of network events using unsupervised machine learning. The second experiment developed and evaluated a method for detecting anomalies from the clustering results. This experiment also determined the probability that a computer system was being attacked. Finally, the third experiment integrated active learning into the anomaly detection results and evaluated its effectiveness in improving the accuracy.
This research demonstrated that clustering ensembles with probabilistic analysis were effective for identifying normal events. Abnormal events remained uncertain and were assigned a belief. By aggregating the belief to find the probability that a computer system was under attack, the resulting probability was highly accurate for the source IP addresses and reasonably accurate for the destination IP addresses. Active learning, which simulated feedback from a human analyst, eliminated the residual error for the destination IP addresses with a low number of events that required labeling.
APA, Harvard, Vancouver, ISO, and other styles
32

Hsu, Samantha. "CLEAVER: Classification of Everyday Activities Via Ensemble Recognizers." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1960.

Full text
Abstract:
Physical activity can have immediate and long-term benefits on health and reduce the risk for chronic diseases. Valid measures of physical activity are needed in order to improve our understanding of the exact relationship between physical activity and health. Activity monitors have become a standard for measuring physical activity; accelerometers in particular are widely used in research and consumer products because they are objective, inexpensive, and practical. Previous studies have experimented with different monitor placements and classification methods. However, the majority of these methods were developed using data collected in controlled, laboratory-based settings, which is not reliably representative of real life data. Therefore, more work is required to validate these methods in free-living settings. For our work, 25 participants were directly observed by trained observers for two two-hour activity sessions over a seven day timespan. During the sessions, the participants wore accelerometers on the wrist, thigh, and chest. In this thesis, we tested a battery of machine learning techniques, including a hierarchical classification schema and a confusion matrix boosting method to predict activity type, activity intensity, and sedentary time in one-second intervals. To do this, we created a dataset containing almost 100 hours worth of observations from three sets of accelerometer data from an ActiGraph wrist monitor, a BioStampRC thigh monitor, and a BioStampRC chest monitor. Random forest and k-nearest neighbors are shown to consistently perform the best out of our traditional machine learning techniques. In addition, we reduce the severity of error from our traditional random forest classifiers on some monitors using a hierarchical classification approach, and combat the imbalanced nature of our dataset using a multi-class (confusion matrix) boosting method. 
Out of the three monitors, our models most accurately predict activity using either or both of the BioStamp accelerometers (with the exception of the chest BioStamp predicting sedentary time). Our results show that we outperform previous methods while still predicting behavior at a more granular level.
APA, Harvard, Vancouver, ISO, and other styles
33

Thompson, Simon Giles. "Distributed boosting algorithms." Thesis, University of Portsmouth, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.285529.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Liu, Xuan. "An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/30702.

Full text
Abstract:
We propose a new ensemble algorithm: the meta-boosting algorithm. This algorithm enables the original AdaBoost algorithm to improve the decisions made by different weak learners by utilizing the meta-learning approach. Better accuracy results are achieved since this algorithm reduces both bias and variance. However, higher accuracy also brings higher computational complexity, especially on big data. We therefore propose a parallelized meta-boosting algorithm, Parallelized-Meta-Learning (PML), using the MapReduce programming paradigm on Hadoop. The experimental results on the Amazon EC2 cloud computing infrastructure show that PML reduces the computational complexity enormously while retaining lower error rates than the results on a single computer. Since MapReduce has the inherent weakness that it cannot directly support iteration in an algorithm, our approach is a win-win method: it not only overcomes this weakness, but also secures good accuracy performance. A comparison between this approach and a contemporary algorithm, AdaBoost.PL, is also performed.
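The meta-boosting approach above builds on AdaBoost's core loop of reweighting training examples and weighting weak learners by their accuracy. As background only (this is not the thesis's meta-boosting or PML code), a minimal AdaBoost sketch over hypothetical threshold stumps might look like:

```python
import math

def adaboost_train(points, labels, stumps, rounds=10):
    """Minimal AdaBoost: each round picks the stump with the lowest
    weighted error, gives it a vote alpha, and upweights the examples
    it misclassified. Labels and stump outputs are +1 / -1."""
    n = len(points)
    w = [1.0 / n] * n                        # uniform example weights
    ensemble = []                            # list of (alpha, stump) pairs
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for s in stumps:
            err = sum(wi for wi, x, y in zip(w, points, labels) if s(x) != y)
            if err < best_err:
                best, best_err = s, err
        best_err = max(best_err, 1e-10)      # guard against log(0)
        if best_err >= 0.5:
            break                            # no weak learner beats chance
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # upweight misclassified examples, then renormalise
        w = [wi * math.exp(-alpha * y * best(x))
             for wi, x, y in zip(w, points, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the selected stumps."""
    return 1 if sum(alpha * s(x) for alpha, s in ensemble) >= 0 else -1
```

Meta-boosting replaces the simple stumps with full weak learners and, per the abstract, parallelises the expensive training loop via MapReduce.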
APA, Harvard, Vancouver, ISO, and other styles
35

Conesa, Gago Agustin. "Methods to combine predictions from ensemble learning in multivariate forecasting." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-103600.

Full text
Abstract:
Making predictions is nowadays of high importance for any company, whether small or large, as analyzing the available data makes it possible to find new market opportunities and to reduce risks and costs, among other benefits. Machine learning algorithms for time series can be used to predict future values of interest. However, choosing the appropriate algorithm and tuning its metaparameters requires a great level of expertise. This creates an adoption barrier for small and medium enterprises which cannot afford to hire a machine learning expert for their IT team. For these reasons, this project studies different possibilities for making good predictions based on machine learning algorithms without requiring great theoretical knowledge from the users. Moreover, a software package that implements the prediction process has been developed. The software is an ensemble method that first predicts a value taking into account different algorithms at the same time, and then combines their results, also considering the previous performance of each algorithm, to obtain a final prediction. Moreover, the solution proposed and implemented in this project can also predict according to a concrete objective (e.g., optimize the prediction, or do not exceed the real value), because not every prediction problem is subject to the same constraints. We have experimented with and validated the implementation in three different cases. In all of them, a better performance has been obtained in comparison with each of the algorithms involved, reaching improvements of 45 to 95%.
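The combination rule described above, averaging several algorithms' predictions while accounting for each algorithm's previous performance, could be sketched as an inverse-error weighting (a hypothetical rule for illustration; the package's actual combination logic is not specified here):

```python
def combine_forecasts(predictions, past_errors, eps=1e-9):
    """Combine one prediction per model into a final forecast,
    weighting each model inversely to its recent absolute error."""
    weights = [1.0 / (e + eps) for e in past_errors]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total
```

A model that recently erred by 1.0 gets three times the say of one that erred by 3.0, so `combine_forecasts([10.0, 12.0], [1.0, 3.0])` lands at 10.5, closer to the historically better model.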
APA, Harvard, Vancouver, ISO, and other styles
36

Oskooi, Behzad. "Using Ensemble Learning to Improve Classification Accuracy in Medical Data." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-177257.

Full text
Abstract:
Currently, electronic medical instruments are widely used in hospitals, medical polyclinics and doctors' offices to gather vital information about patients' bodies. Experts interpret medical data to distinguish the causes of illnesses. EEG is an example of a form of medical information that has many features. If the number of patient samples is enlarged, the volume of EEG data can increase dramatically and consequently exceed the limited capacity that can feasibly be classified. In order to address the limitations of current classification ability, SVMs are used. In some applications, such as cognitive science, the accuracy rate of SVM classifiers is low. This is due to the complexity of the problem: the low accuracy rate may be caused by an inappropriate feature space or the inability of classifiers to generalize results. SVM ensembles can vastly improve generalization as, although some classifiers are not trained well enough to excel globally, they can at least achieve acceptable local performance. This study's intention was to investigate the enhancement of classifier performance possible by applying SVM ensembles to classify two groups of data gathered during a type of healing operation known as Reiki, performed by a professional, and a placebo with an ordinary person pretending to perform it. A genetic algorithm is also applied to this data to find the best features and feature combinations, reducing training time whilst increasing the correct classification rate.
APA, Harvard, Vancouver, ISO, and other styles
37

Farrash, Majed. "Machine learning ensemble method for discovering knowledge from big data." Thesis, University of East Anglia, 2016. https://ueaeprints.uea.ac.uk/59367/.

Full text
Abstract:
Big data, generated from various business, internet and social media activities, has become a big challenge to researchers in the field of machine learning and data mining, who must develop new methods and techniques for analysing big data effectively and efficiently. Ensemble methods represent an attractive approach to the problem of mining large datasets because of their accuracy and their ability to utilize the divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high performance computing environment. This research begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy when dealing with very large training datasets. Then an algorithm is developed to ascertain the patterns of the relationship between ensemble accuracy and the size of partitioned data subsets. The research concludes with the development of a selective modelling algorithm, which is an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers that deal with large datasets. Identifying the patterns exhibited by the relationship between ensemble accuracy and partitioned data subset size facilitates the determination of the best subset size for partitioning huge training datasets. Finally, traditional model selection is inefficient in cases wherein large datasets are involved.
APA, Harvard, Vancouver, ISO, and other styles
38

Jurek, Anna. "A new approach to classifier ensemble learning based on clustering." Thesis, Ulster University, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.591073.

Full text
Abstract:
The problem of combining multiple classifiers, referred to as a classifier ensemble, is one sub-domain of machine learning which has received significant attention in recent years. A classifier ensemble is an integration of classification models, referred to as base classifiers, whose individual decisions are combined in order to obtain a final prediction. The aim of using a classifier ensemble is to provide an overall level of performance which is superior to the performance of any of the single base classifiers. This thesis studies the problem of constructing a classifier ensemble from different perspectives, with the aim of improving the overall level of performance. Two novel ensemble techniques were introduced and evaluated within this study. The first approach, referred to as Classification by Cluster Analysis, was proposed as an alternative solution to the Stacking technique. The new method applies a clustering technique for the purpose of combining base classifier outputs. This approach offers the benefit of reduced classification time compared with existing ensemble methods. In addition, it outperformed other ensemble methods in terms of classification accuracy. As an extension to this concept, the method was adapted to incorporate semi-supervised learning, which is subsequently considered as a new research direction within the domain of ensemble learning. The second method, referred to as Cluster-Based Classifier Ensemble, was proposed as an alternative to the Nearest Neighbour classifier ensemble. It applies a clustering technique for the purpose of generating base classifiers. A new combining function was proposed to be applied with the method as an alternative to the conventional majority voting technique. The new approach outperforms existing ensemble methods in terms of accuracy and efficiency. Both methods were evaluated on an activity recognition problem considered within the work as a case study.
The effectiveness of the two methods was further supported by the findings from an experimental evaluation with a real world data set.
APA, Harvard, Vancouver, ISO, and other styles
39

Valenzuela, Russell. "Predicting National Basketball Association Game Outcomes Using Ensemble Learning Techniques." Thesis, California State University, Long Beach, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10980443.

Full text
Abstract:
There have been a number of studies that try to predict sporting event outcomes. Most previous research has involved results in football and college basketball. Recent years have seen similar approaches carried out in professional basketball. This thesis attempts to build upon existing statistical techniques and apply them to the National Basketball Association, using a synthesis of algorithms as motivation. A number of ensemble learning methods will be utilized and compared in hopes of improving the accuracy of single models. Individual models used in this thesis will be derived from Logistic Regression, Naïve Bayes, Random Forests, Support Vector Machines, and Artificial Neural Networks, while aggregation techniques include Bagging, Boosting, and Stacking. Data from previous seasons and games, from both players and teams, will be used to train models in R.
APA, Harvard, Vancouver, ISO, and other styles
40

Kanneganti, Alekhya. "Using Ensemble Machine Learning Methods in Estimating Software Development Effort." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20691.

Full text
Abstract:
Background: Software Development Effort Estimation is a process that focuses on estimating the effort required to develop a software project within a minimal budget. Estimating effort includes interpretation of the required manpower, resources, time and schedule. Project managers are responsible for estimating the required effort. A model that can predict software development effort efficiently comes in handy and acts as a decision support system for project managers, enhancing the precision of effort estimates. The context of this study is therefore to increase efficiency in estimating software development effort. Objective: The main objective of this thesis is to identify an effective ensemble method and to build and implement it for estimating software development effort. Apart from this, parameter tuning is also implemented to improve the performance of the model. Finally, we compare the results of the developed model with those of existing models. Method: In this thesis, we have adopted two research methods. Initially, a literature review was conducted to gain knowledge of the existing studies, machine learning techniques, datasets and ensemble methods previously used in estimating software development effort. Then a controlled experiment was conducted in order to build an ensemble model and to evaluate its performance, determining whether the developed model performs better than existing models. Results: After conducting the literature review and collecting evidence, we decided to build and implement the stacked generalization ensemble method in this thesis, with the help of individual machine learning techniques: Support Vector Regressor (SVR), K-Nearest Neighbors Regressor (KNN), Decision Tree Regressor (DTR), Linear Regressor (LR), Multi-Layer Perceptron Regressor (MLP), Random Forest Regressor (RFR), Gradient Boosting Regressor (GBR), AdaBoost Regressor (ABR) and XGBoost Regressor (XGB).
Likewise, we decided to implement Randomized Parameter Optimization and the SelectKBest function to perform feature selection. Datasets such as COCOMO81, MAXWELL, ALBRECHT and DESHARNAIS were used. Results of the experiment show that the developed ensemble model performs best for three out of four datasets. Conclusion: After evaluating and analyzing the results obtained, we can conclude that the developed model works well with datasets that have continuous, numeric values. We can also conclude that the developed ensemble model outperforms other existing models when implemented with the COCOMO81, MAXWELL and ALBRECHT datasets.
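As a rough illustration of the stacked generalization idea used above, a meta-learner that learns how to combine base regressors' predictions, here is a minimal stand-in where the "meta-learner" is a grid search over a convex mixing weight for two base models (the thesis's actual implementation with SVR, KNN and the other regressors is not reproduced here):

```python
def fit_meta_weight(base_preds, targets, steps=100):
    """Grid-search the convex mixing weight w minimising the squared error
    of w * model_a + (1 - w) * model_b against held-out targets."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        err = sum((w * a + (1 - w) * b - t) ** 2
                  for (a, b), t in zip(base_preds, targets))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

def stacked_predict(w, a_pred, b_pred):
    """Meta-level prediction from the two base-level predictions."""
    return w * a_pred + (1 - w) * b_pred
```

In a real stacked generalization setup, the base predictions fed to the meta-learner come from out-of-fold predictions, so the meta-learner never sees a base model's fit to its own training data.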
APA, Harvard, Vancouver, ISO, and other styles
41

Yang, Yun. "Unsupervised ensemble learning and its application to temporal data clustering." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/unsupervised-ensemble-learning-and-its-application-to-temporal-data-clustering(2d7a98f3-8349-4f61-9dc3-74a586c50c79).html.

Full text
Abstract:
Temporal data clustering can provide underpinning techniques for the discovery of intrinsic structures and can condense or summarize the information contained in temporal data, demands that arise in fields ranging from time series analysis to the understanding of sequential data. With regard to the treatment of data dependency in temporal data, existing temporal data clustering algorithms can be classified into three categories: model-based, temporal-proximity and feature-based clustering. However, unlike static data, temporal data have many distinct characteristics, including high dimensionality, complex time dependency and large volume, all of which make the clustering of temporal data more challenging than conventional static data clustering. A large number of recent studies have shown that unsupervised ensemble approaches improve clustering quality by combining multiple clustering solutions into a single consolidated clustering ensemble that has the best performance among the given clustering solutions. This thesis systematically reviews existing temporal clustering and unsupervised ensemble learning techniques and proposes three unsupervised ensemble learning approaches for temporal data clustering. The first approach is based on an ensemble of HMM k-models clustering, associated with agglomerative clustering refinement, to address the problems of finding the intrinsic number of clusters, model initialization sensitivity and computational cost, problems which exist in most forms of model-based clustering. Secondly, we propose a sampling-based clustering ensemble approach, namely the iteratively constructed clustering ensemble. Our approach iteratively constructs multiple partitions on subsets of the input instances selected by a smart weighting scheme, combining the strengths of both boosting and bagging approaches whilst attempting to avoid their drawbacks.
Finally, we propose a weighted ensemble learning approach to temporal data clustering which combines partitions obtained from different representations of temporal data. As a result, this approach has the capability to capture the properties of temporal data and the synergy created by reconciling the diverse partitions that result from combining different representations. The proposed weighting function has an outstanding ability to perform automatic model selection and appropriate grouping for complex temporal data.
APA, Harvard, Vancouver, ISO, and other styles
42

Bustos, Ricardo Gacitua. "OntoLancs : An evaluation framework for ontology learning by ensemble methods." Thesis, Lancaster University, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.533089.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Gharroudi, Ouadie. "Ensemble multi-label learning in supervised and semi-supervised settings." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE1333/document.

Full text
Abstract:
Multi-label learning is a specific supervised learning problem where each instance can be associated with multiple target labels simultaneously. Multi-label learning is ubiquitous in machine learning and arises naturally in many real-world applications such as document classification, automatic music tagging and image annotation. In this thesis, we formulate multi-label learning as an ensemble learning problem in order to provide satisfactory solutions for both the multi-label classification and the feature selection tasks, while being consistent with respect to any type of objective loss function. We first discuss why state-of-the-art multi-label algorithms using an effective committee of multi-label models suffer from certain practical drawbacks. We then propose a novel strategy to build and aggregate k-labelsets-based committees in the context of ensemble multi-label classification. We then analyze in depth the effect of the aggregation step within ensemble multi-label approaches and investigate how this aggregation impacts prediction performance with respect to the objective multi-label loss metric. We then address the specific problem of identifying relevant subsets of features, among potentially irrelevant and redundant features, in the multi-label context based on the ensemble paradigm. Three wrapper multi-label feature selection methods based on the Random Forest paradigm are proposed. These methods differ in the way they consider label dependence within the feature selection process. Finally, we extend the multi-label classification and feature selection problems to the semi-supervised setting and consider the situation where only a few labelled instances are available. We propose a new semi-supervised multi-label feature selection approach based on the ensemble paradigm. The proposed model combines ideas from co-training and multi-label k-labelsets committee construction in tandem with an inner out-of-bag label feature importance evaluation. Satisfactorily tested on several benchmark datasets, the approaches developed in this thesis show promise for a variety of applications in supervised and semi-supervised multi-label learning.
APA, Harvard, Vancouver, ISO, and other styles
44

Tran, Anh-Tuan. "Ensemble learning-based approach for the global minimum variance portfolio." Electronic Thesis or Diss., Université Paris sciences et lettres, 2024. http://www.theses.fr/2024UPSLP010.

Full text
Abstract:
Ensemble Learning is built on the simple idea that combining several learning algorithms tends to yield a better result than any single learning algorithm. Empirically, an ensemble method performs better if its base models are diversified, even if they are non-intuitive random algorithms such as random decision trees. Because of these advantages, Ensemble Learning is used in various applications such as fraud detection. In more detail, the advantages of Ensemble Learning come down to two points: i) it combines the strengths of its base models so that each model is complementary to the others, and ii) it neutralizes the noise and outliers among all base models, reducing their impact on the final predictions. We use these two ideas of Ensemble Learning for different applications in machine learning and the finance industry. Our main contributions in this thesis are: i) efficiently dealing with a hard scenario of the imbalanced data problem in machine learning, namely an extremely imbalanced big data problem, by using an undersampling technique and Ensemble Learning, ii) appropriately applying time-series cross-validation and Ensemble Learning to resolve a covariance matrix estimator selection problem in quantitative trading, and iii) reducing the impact of outliers in covariance matrix estimation in order to increase the stability of portfolios by using undersampling and Ensemble Learning.
APA, Harvard, Vancouver, ISO, and other styles
45

Kehl, Justin. "N-SLOPE: A One-Class Classification Ensemble For Nuclear Forensics." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1871.

Full text
Abstract:
One-class classification is a specialized form of classification from the field of machine learning. Traditional classification attempts to assign unknowns to known classes, but cannot handle novel unknowns that do not belong to any of the known classes. One-class classification seeks to identify these outliers, while still correctly assigning unknowns to classes appropriately. One-class classification is applied here to the field of nuclear forensics, which is the study and analysis of nuclear material for the purpose of nuclear incident investigations. Nuclear forensics data poses an interesting challenge because false positive identification can prove costly and data is often small, high-dimensional, and sparse, which is problematic for most machine learning approaches. A web application is built using the R programming language and the shiny framework that incorporates N-SLOPE: a machine learning ensemble. N-SLOPE combines five existing one-class classifiers with a novel one-class classifier introduced here and uses ensemble learning techniques to combine output. N-SLOPE is validated on three distinct data sets: Iris, Obsidian, and Galaxy Serpent 3, which is an enhanced version of a recent international nuclear forensics exercise. N-SLOPE achieves high classification accuracy on each data set of 100%, 83.33%, and 83.33%, respectively, while minimizing false positive detection rate to 0% across the board and correctly detecting every single novel unknown from each data set. N-SLOPE is shown to be a useful and powerful tool to aid in nuclear forensic investigations.
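To illustrate the one-class ensemble idea described above, assigning an unknown only when some known class accepts it and otherwise flagging it as novel, here is a hypothetical distance-threshold sketch (not N-SLOPE's actual classifiers or combination rule):

```python
def one_class_assign(distances, thresholds):
    """Each known class has a one-class model that accepts a sample when
    its distance to the class falls under the class threshold. A sample
    that no class accepts is flagged as a novel outlier (returns None)."""
    accepted = [c for c, (d, t) in enumerate(zip(distances, thresholds))
                if d <= t]
    if not accepted:
        return None                       # novel unknown: no class claims it
    # among accepting classes, choose the relatively closest one
    return min(accepted, key=lambda c: distances[c] / thresholds[c])
```

Requiring an explicit acceptance before assignment is what lets a one-class ensemble keep the false positive rate low: a distant unknown is reported as novel rather than forced into the nearest known class.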
APA, Harvard, Vancouver, ISO, and other styles
46

Johansson, Alfred. "Ensemble approach to code smell identification : Evaluating ensemble machine learning techniques to identify code smells within a software system." Thesis, Tekniska Högskolan, Jönköping University, JTH, Datateknik och informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-49319.

Full text
Abstract:
The need for automated methods of identifying refactoring items is prevalent in many software projects today. Code smells within a software system are symptoms of the need for refactoring. Recent studies have used single-model machine learning to combat this issue. This study aims to test the possibility of improving machine learning code smell detection using ensemble methods, thereby identifying the strongest ensemble model in the context of code smells and the relative sensitivity of the strongest-performing ensemble identified. The ensemble models' performance was studied by performing experiments using WekaNose to create datasets of code smells and Weka to train and test the models on the datasets. The datasets created were based on Qualitas Corpus curated Java projects. Each tested ensemble method was then compared to all the other ensembles using F-measure, accuracy and AUC ROC scores. The tested ensemble methods were stacking, voting, bagging and boosting. The models used to implement the ensemble methods were those that previous studies had identified as the strongest performers for code smell identification: JRip, J48, Naive Bayes and SMO. The findings showed that, compared to previous studies, bagging J48 improved results by 0.5%, and that when the nominal Weka implementation of bagging J48 followed best practices, the model was impacted negatively. However, due to the complexity of stacking and voting ensembles, further work is needed on stacking and voting ensemble models in the context of code smell identification.
APA, Harvard, Vancouver, ISO, and other styles
47

Gardner, Angelica. "Stronger Together? An Ensemble of CNNs for Deepfakes Detection." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-97643.

Full text
Abstract:
Deepfakes technology is a face-swap technique that enables anyone to replace faces in a video, with highly realistic results. Despite its usefulness, if used maliciously this technique can have a significant impact on society, for instance through the spreading of fake news or cyberbullying. This makes the ability to detect deepfakes a problem of utmost importance. In this paper, I tackle the problem of deepfakes detection by identifying deepfake forgeries in video sequences. Inspired by the state of the art, I study the ensembling of different machine learning solutions built on convolutional neural networks (CNNs) and use these models as objects for comparison between ensemble and single-model performance. Existing work in the research field of deepfakes detection suggests that the escalated challenges posed by modern deepfake videos make detection increasingly difficult. I evaluate that claim by testing the detection performance of four single CNN models as well as six stacked ensembles on three modern deepfakes datasets. I compare various approaches for combining single models and for how their predictions should be incorporated into the ensemble output. The result I found was that the best approach for deepfakes detection is to create an ensemble, though the ensemble approach plays a crucial role in the detection performance. The final proposed solution is an ensemble of all available single models which uses the concept of soft (weighted) voting to combine its base-learners' predictions. Results show that this proposed solution significantly improved deepfakes detection performance and substantially outperformed all single models.
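The soft (weighted) voting used by the proposed ensemble can be illustrated with a small sketch that averages per-model class probabilities under given weights (the CNN models themselves and the learned weights are assumed here, not reproduced from the thesis):

```python
def soft_vote(prob_outputs, weights):
    """Weighted average of each model's class-probability vector;
    the class with the highest averaged probability wins."""
    n_classes = len(prob_outputs[0])
    total = sum(weights)
    avg = [sum(w * probs[c] for w, probs in zip(weights, prob_outputs)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)
```

With classes [real, fake] and three equally weighted detectors outputting [0.6, 0.4], [0.3, 0.7] and [0.2, 0.8], the averaged vector is roughly [0.37, 0.63], so the ensemble labels the clip fake even though one model disagreed; weighting lets a consistently stronger detector pull the average its way.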
APA, Harvard, Vancouver, ISO, and other styles
48

Kim, Jinhan. "J-model : an open and social ensemble learning architecture for classification." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/7672.

Full text
Abstract:
Ensemble learning is a promising direction of research in machine learning, in which an ensemble classifier gives better predictive and more robust performance for classification problems by combining other learners. Meanwhile, agent-based systems provide frameworks for sharing knowledge from multiple agents in an open context. This thesis combines multi-agent knowledge sharing with ensemble methods to produce a new style of learning system for open environments. We are now surrounded by many smart objects such as wireless sensors, ambient communication devices, mobile medical devices and even information supplied via other humans. When we coordinate smart objects properly, we can produce a form of collective intelligence from their collaboration. Traditional ensemble methods and agent-based systems have complementary advantages and disadvantages in this context. Traditional ensemble methods show better classification performance, while agent-based systems cannot guarantee their performance for classification. Traditional ensemble methods work as closed and centralised systems (so they cannot handle classifiers in an open context), while agent-based systems are natural vehicles for classifiers in an open context. We designed an open and social ensemble learning architecture, named J-model, to merge the conflicting benefits of the two research domains. The J-model architecture is based on a service choreography approach to coordinating classifiers. Coordination protocols are defined by interaction models that describe how classifiers will interact with one another in a peer-to-peer manner. The peer ranking algorithm recommends the more appropriate classifiers to participate in an interaction model, in order to boost the success rate of their interactions. Coordinated participant classifiers recommended by the peer ranking algorithm become an ensemble classifier within J-model.
We evaluated J-model's classification performance on 13 UCI machine learning benchmark data sets and on a virtual screening problem as a realistic classification task. J-model achieved better accuracy than 8 other representative traditional ensemble methods on 9 of the 13 benchmark sets, and better specificity on 7 of them. In the virtual screening problem, J-model outperformed previously published results on 12 of 16 bioassays. We defined a different interaction model for each specific classification task, while the peer ranking algorithm was used across all interaction models. Our research contributions to knowledge are as follows. First, we showed that service choreography can be an effective ensemble coordination method for classifiers in an open context. Second, we used interaction models that implement task-specific coordination of classifiers to solve a variety of representative classification problems. Third, we designed the peer ranking algorithm, which is generally and independently applicable to the task of recommending appropriate member classifiers from a classifier pool, based on an open pool of interaction models and classifiers.
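The peer ranking idea described in the abstract can be sketched in a few lines: track each candidate classifier's empirical success rate in past interactions and recommend the top performers for the next interaction model. This is an illustrative reconstruction, not code from the thesis; the class and method names are our own assumptions.

```python
class PeerRanker:
    """Rank classifier peers by their past success rate in interactions.

    Simplified sketch of a peer ranking scheme: every interaction
    outcome is recorded per peer, and recommendations pick the peers
    with the highest empirical success rate.
    """

    def __init__(self):
        self.stats = {}  # peer name -> [successes, interactions]

    def record(self, peer, success):
        s = self.stats.setdefault(peer, [0, 0])
        s[0] += int(success)
        s[1] += 1

    def recommend(self, top_k):
        # Recommend the peers with the highest observed success rate.
        ranked = sorted(self.stats,
                        key=lambda p: self.stats[p][0] / self.stats[p][1],
                        reverse=True)
        return ranked[:top_k]


ranker = PeerRanker()
for peer, outcomes in {"svm": [1, 1, 0], "tree": [1, 0, 0], "nb": [1, 1, 1]}.items():
    for o in outcomes:
        ranker.record(peer, o)
print(ranker.recommend(2))  # -> ['nb', 'svm']
```

In an open setting, new peers could simply start appearing in `record` calls; the ranking adapts as their interaction history accumulates.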
APA, Harvard, Vancouver, ISO, and other styles
49

Segerström, Pierre, and Felix Boltshauser. "Ensemble Learning Applied to Classification of Malignant and Benign Breast Cancer." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302551.

Full text
Abstract:
In this study, we show how ensemble learning can be useful for the future of breast cancer diagnosis. The chosen ensemble learning method was bagging, which made use of the classifiers Support Vector Machine (SVM), Decision Tree (DT) and Naive Bayes (NB) to classify mammograms as benign or malignant. The results achieved with bagging were compared to the results of each individual classifier. Overall, the results showed that the benefits of ensemble learning varied depending on several factors: which classifier was used, the chosen method for extracting input data, and which tumor types were used in training and evaluating each classifier. While classification using DT improved significantly with bagging, SVM and NB gained negligible performance benefits. Finally, this study only scratched the surface of known ensemble learning methods, indicating that there may be much room for future research in the area.
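The bagging procedure the study applies can be illustrated with a minimal, dependency-free sketch: train each base learner on a bootstrap resample of the training set and combine their predictions by majority vote. The `ThresholdStump` stand-in for SVM/DT/NB, and all names below, are illustrative assumptions rather than the study's actual setup.

```python
import random
from collections import Counter


def bootstrap(data, rng):
    # Sample with replacement, same size as the original set.
    return [rng.choice(data) for _ in data]


class ThresholdStump:
    """Toy one-feature base learner standing in for SVM/DT/NB."""

    def fit(self, data):
        # Threshold midway between the class means; fall back to an
        # extreme value if a bootstrap sample misses a class entirely.
        pos = [x for x, y in data if y == 1] or [max(x for x, _ in data)]
        neg = [x for x, y in data if y == 0] or [min(x for x, _ in data)]
        self.t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, x):
        return 1 if x >= self.t else 0


def bagging_predict(models, x):
    # Majority vote over the bootstrap-trained base learners.
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Tiny synthetic "benign" (0) vs "malignant" (1) one-feature data set.
data = [(v, 0) for v in (1.0, 1.2, 1.5, 2.0)] + [(v, 1) for v in (4.0, 4.5, 5.0, 5.5)]
models = [ThresholdStump().fit(bootstrap(data, rng)) for _ in range(11)]
print(bagging_predict(models, 1.1), bagging_predict(models, 5.2))  # -> 0 1
```

Swapping the stump for a stronger, higher-variance base learner (as with DT in the study) is where bagging's averaging effect pays off most.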
APA, Harvard, Vancouver, ISO, and other styles
50

Ahmed, Istiak. "An ensemble learning approach based on decision trees and probabilistic argumentation." Thesis, Umeå universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-175967.

Full text
Abstract:
This research discusses a decision support system that includes different machine learning approaches (e.g. ensemble learning, decision trees) and a symbolic reasoning approach (e.g. argumentation). The purpose of this study is to define an ensemble learning algorithm based on formal argumentation and decision trees. Using a decision tree algorithm as the base learning algorithm and an argumentation framework as the decision fusion technique of an ensemble architecture, the proposed system produces outcomes. The introduced algorithm is a hybrid ensemble learning approach based on a formal argumentation-based method. It is evaluated with sample data sets (an open-access data set and a data set extracted from ultrasound images) and provides satisfactory outcomes. This study addresses the problem of combining an ensemble learning algorithm with a formal argumentation approach. A probabilistic argumentation framework is implemented as the decision fusion step of an ensemble learning approach. An open-access library is also developed for the user; the generic version of the library can be used for different purposes.
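The idea of using an argumentation framework as the decision fusion step of an ensemble can be sketched as follows: each base learner's prediction is treated as an argument with an attached probability, arguments for different labels attack one another, and the accepted label is the one with the highest net support. This is a simplified stand-in for the thesis's probabilistic argumentation framework, not its actual fusion rule.

```python
from collections import defaultdict


def argumentation_fuse(predictions):
    """Fuse base-learner outputs treated as mutually attacking arguments.

    `predictions` is a list of (label, probability) pairs, one per base
    learner. Arguments for the same label pool their support; arguments
    for different labels attack each other, discounting that support.
    The accepted conclusion is the label with the highest net strength.
    """
    support = defaultdict(float)
    for label, p in predictions:
        support[label] += p
    total = sum(support.values())
    # Net strength = own support minus the mass of attacking arguments.
    strength = {label: s - (total - s) for label, s in support.items()}
    return max(strength, key=strength.get)


# Three decision trees: two weakly for "benign", one strongly for "malignant".
print(argumentation_fuse([("benign", 0.55), ("benign", 0.60), ("malignant", 0.95)]))
# -> benign
```

With only two labels this reduces to a confidence-weighted vote; richer argumentation semantics (e.g. attack relations that depend on which features an argument used) would differentiate it from plain weighted voting.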
APA, Harvard, Vancouver, ISO, and other styles
