To see the other types of publications on this topic, follow the link: Boosting and bagging.

Dissertations / Theses on the topic 'Boosting and bagging'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 23 dissertations / theses for your research on the topic 'Boosting and bagging.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Nascimento, Diego Silveira Costa. "Configuração heterogênea de ensembles de classificadores : investigação em bagging, boosting e multiboosting." Universidade de Fortaleza, 2009. http://dspace.unifor.br/handle/tede/83562.

Full text
Abstract:
This work presents a study on the characterization and evaluation of six new heterogeneous committee machine algorithms aimed at solving pattern classification problems. These algorithms are extensions of models already found in the literature that have been successfully applied in different fields of research. Following two approaches, evolutionary and constructive, different machine learning algorithms (inductors) can be used to induce the components of the ensemble, which are trained by standard Bagging, Boosting or MultiBoosting on the resampled data, with the aim of increasing the diversity of the resulting composite model. As a means of automatically configuring the different types of components, a customized genetic algorithm is adopted for the first approach and a greedy search for the second. To validate the proposal, an empirical study was conducted involving 10 different types of inductors and 18 classification problems taken from the UCI repository. The accuracy values obtained by the evolutionary and constructive heterogeneous ensembles are compared with those produced by homogeneous ensembles composed of the same 10 types of inductors, and most of the results show a performance gain for both approaches. Keywords: Machine learning, Committee machines, Bagging, Wagging, Boosting, MultiBoosting, Genetic algorithm.
APA, Harvard, Vancouver, ISO, and other styles
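The abstract above describes automatically recruiting different inductor types into a Bagging-style ensemble via a greedy (constructive) search. A minimal sketch of that idea, assuming scikit-learn base learners, a synthetic dataset and majority voting as the combination rule; the candidate pool, stopping rule and selection criterion are illustrative choices, not the thesis's exact configuration.

```python
# Greedy construction of a heterogeneous bagging-style ensemble (illustrative sketch).
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Pool of candidate inductors (base learner types).
pool = [DecisionTreeClassifier(random_state=0), GaussianNB(),
        KNeighborsClassifier(), LogisticRegression(max_iter=1000)]

def vote_accuracy(members, X, y):
    """Accuracy of the majority vote of the current ensemble."""
    votes = np.array([m.predict(X) for m in members])          # (n_members, n_samples)
    majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return float(np.mean(majority == y))

ensemble, best_acc = [], 0.0
for _ in range(10):                                   # grow up to 10 components
    best_candidate = None
    for proto in pool:                                # try every inductor type
        idx = rng.randint(0, len(X_tr), len(X_tr))    # bootstrap resample (bagging step)
        candidate = clone(proto).fit(X_tr[idx], y_tr[idx])
        acc = vote_accuracy(ensemble + [candidate], X_val, y_val)
        if acc > best_acc:
            best_acc, best_candidate = acc, candidate
    if best_candidate is None:                        # greedy stop: no improvement found
        break
    ensemble.append(best_candidate)

print(f"{len(ensemble)} components, validation accuracy {best_acc:.3f}")
```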
2

Rubesam, Alexandre. "Estimação não parametrica aplicada a problemas de classificação via Bagging e Boosting." [s.n.], 2004. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306510.

Full text
Abstract:
Advisor: Ronaldo Dias
Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Some of the most modern and successful classification methods are bagging, boosting and SVM (Support Vector Machines). Bagging combines classifiers fitted to bootstrap samples of the training data; boosting sequentially applies a classification algorithm to reweighted versions of the training data, increasing at each step the weights of the observations misclassified in the previous step; and SVM transforms the data nonlinearly into a space of higher dimension than that of the original data and searches for a separating hyperplane in this transformed space. In this work we studied the methods described above and propose two classification methods: one based on a nonparametric regression method via H-splines (also proposed here) combined with boosting, and the other a modification of a boosting algorithm based on the MARS algorithm. The methods were applied to both simulated and real data.
Master's degree
Master in Statistics
APA, Harvard, Vancouver, ISO, and other styles
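The abstract above contrasts bagging (classifiers fitted to bootstrap samples) with boosting (a classifier refitted sequentially to reweighted data). A minimal sketch of the two generic algorithms with scikit-learn, assuming decision trees as base learners and a synthetic dataset; it illustrates standard bagging and AdaBoost, not the H-splines or MARS-based variants proposed in the dissertation.

```python
# Bagging vs. boosting on the same kind of base learner (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.05, random_state=1)

# Bagging: full trees fitted independently to bootstrap resamples, predictions combined by voting.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=1)

# Boosting: shallow trees fitted sequentially to reweighted data,
# each step upweighting the observations misclassified in the previous one.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=1)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```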
3

Boshoff, Lusilda. "Boosting, bagging and bragging applied to nonparametric regression : an empirical approach / Lusilda Boshoff." Thesis, North-West University, 2009. http://hdl.handle.net/10394/4337.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Llerena, Nils Ever Murrugarra. "Ensembles na classificação relacional." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-18102011-095113/.

Full text
Abstract:
In many fields, besides information about the objects or entities that compose them, there is also information about the relationships between those objects. Some of these fields are, for example, co-authorship networks and Web pages. It is therefore natural to look for classification techniques that take this information into account. Among these techniques is so-called graph-based classification, which seeks to classify examples while taking into account the relationships between them. This work presents the development of methods to improve the performance of graph-based classifiers by using ensemble strategies. An ensemble classifier considers a set of classifiers whose individual predictions are combined in some way; this combined classifier usually performs better than its individual classifiers. Three techniques were developed: the first for data originally in propositional format and transformed to a graph-based relational format, and the second and third for data originally in graph format. The first technique, inspired by the boosting algorithm, originated the Adaptive Graph-Based K-Nearest Neighbor (A-KNN). The second technique, inspired by the bagging algorithm, led to three approaches of Graph-Based Bagging (BG). Finally, the third technique, inspired by the Cross-Validated Committees algorithm, led to the Graph-Based Cross-Validated Committees (CVCG). The experiments were performed on 38 data sets, 22 in propositional format and 16 in relational format. Evaluation used 10-fold stratified cross-validation, and statistical differences between classifiers were determined with the method proposed by Demsar (2006). Regarding the results, the three techniques improved or at least maintained the performance of the base classifiers. In conclusion, ensembles applied to graph-based classifiers yield good performance.
APA, Harvard, Vancouver, ISO, and other styles
5

Barrow, Devon K. "Active model combination : an evaluation and extension of bagging and boosting for time series forecasting." Thesis, Lancaster University, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.659174.

Full text
Abstract:
Since the seminal work by Bates and Granger (1969), the practice of combining two or more models, rather than selecting the single best, has consistently been shown to lead to improvements in accuracy. In forecasting, model combination aims to find an optimal weighting given a set of precalculated forecasts. In contrast, machine learning includes methods which simultaneously optimise individual models and the weights used to combine them. Bagging and boosting combine the results of complementary and diverse models generated by actively perturbing, reweighting and resampling training data. Despite large gains in predictive accuracy in classification, limited research assesses their efficacy on time series data. This thesis provides a critical review of the combination literature and is the first literature survey of boosting for time series forecasting. The lack of rigorous empirical evidence on the forecast accuracy of bagging and boosting is identified as a major gap. To address this, a rigorous evaluation of bagging and boosting, adhering to the recommendations of the forecasting literature, is performed using robust error measures on a large set of real time series exhibiting a representative set of features and dataset properties. Additionally, existing work focuses narrowly on marginal extensions of boosting, with limited evidence of any gains in accuracy. A novel framework is proposed to explore the impact of varying boosting meta-parameters and to evaluate the empirical accuracy of the resulting 96 boosting variants. The choice of base model and combination size are found to have the largest impact on forecast accuracy. Findings show that boosting overfits noisy data; however, no existing study investigates this crucial issue. New noise-robust boosting methods are developed and evaluated for time series forecast models. They are found to significantly improve accuracy over current boosting approaches and bagging, while neural network model averaging is found to perform best.
APA, Harvard, Vancouver, ISO, and other styles
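The thesis above evaluates bagging and boosting for time series forecasting, where resampling has to respect serial dependence. A minimal sketch of one common adaptation, bagging an autoregressive forecaster over moving-block bootstrap replicates of the series; the block length, AR order and toy series are illustrative assumptions, not the evaluation protocol of the thesis.

```python
# Bagging a simple autoregressive forecaster with a moving-block bootstrap (sketch).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p, block = 300, 5, 20                        # series length, AR order, block length
t = np.arange(n)
y = np.sin(2 * np.pi * t / 50) + 0.3 * rng.standard_normal(n)   # toy seasonal series

def lag_matrix(series, p):
    """Rows are [y_{t-p}, ..., y_{t-1}], targets are y_t."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    return X, series[p:]

def block_bootstrap(series, block, rng):
    """Concatenate randomly chosen contiguous blocks to preserve short-range dependence."""
    starts = rng.integers(0, len(series) - block, size=len(series) // block + 1)
    resampled = np.concatenate([series[s:s + block] for s in starts])
    return resampled[:len(series)]

forecasts = []
for _ in range(200):                            # 200 bagged replicates
    boot = block_bootstrap(y, block, rng)
    Xb, yb = lag_matrix(boot, p)
    model = LinearRegression().fit(Xb, yb)      # AR(p) fitted by least squares
    forecasts.append(model.predict(y[-p:].reshape(1, -1))[0])   # one-step-ahead forecast

print(f"bagged one-step forecast: {np.mean(forecasts):.3f} "
      f"(+/- {np.std(forecasts):.3f} across replicates)")
```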
6

Dang, Yue. "A Comparative Study of Bagging and Boosting of Supervised and Unsupervised Classifiers For Outliers Detection." Wright State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=wright1502475855457354.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bourel, Mathias. "Agrégation de modèles en apprentissage statistique pour l'estimation de la densité et la classification multiclasse." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4076/document.

Full text
Abstract:
Ensemble methods in statistical learning combine several base learners built from the same data set in order to obtain a more stable predictor with better performance. Such methods have been extensively studied in the supervised context for regression and classification. In this work we consider the extension of these approaches to density estimation. We suggest several new algorithms in the same spirit as bagging and boosting. We show the efficiency of the combined density estimators by extensive simulations. We also give theoretical results for one of our algorithms (Random Averaged Shifted Histogram) by means of asymptotic convergence under mild conditions. A second part is devoted to extensions of boosting algorithms to the multiclass case. We propose a new algorithm (Adaboost.BG) accounting for the margin of the base classifiers and show its efficiency by simulations, comparing it to the most widely used methods in this context on several machine learning benchmark datasets. Partial theoretical results are given for our algorithm, such as the exponential decrease of the learning set misclassification error to zero.
APA, Harvard, Vancouver, ISO, and other styles
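The abstract above describes density estimators built as combinations of histograms, including a Random Averaged Shifted Histogram. A minimal sketch of the underlying averaged-shifted-histogram idea, randomizing the bin origin and averaging the resulting histogram densities; the bin width, number of shifts and data are illustrative assumptions rather than the algorithm exactly as defined in the thesis.

```python
# Averaging histograms with random bin origins to smooth a density estimate (sketch).
import numpy as np

rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 0.5, 500)])  # bimodal sample

grid = np.linspace(-5, 4, 400)        # evaluation points
h, n_shifts = 0.5, 50                 # bin width and number of random origin shifts

estimates = np.zeros((n_shifts, grid.size))
for k in range(n_shifts):
    origin = rng.uniform(0, h)                                    # random bin origin
    edges = np.arange(data.min() - h + origin, data.max() + h, h)
    dens, edges = np.histogram(data, bins=edges, density=True)
    idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, dens.size - 1)
    inside = (grid >= edges[0]) & (grid < edges[-1])
    estimates[k] = np.where(inside, dens[idx], 0.0)               # histogram evaluated on the grid

ash = estimates.mean(axis=0)                                      # aggregated (averaged) estimator
print(f"estimated density near x = 0: {ash[np.argmin(np.abs(grid))]:.3f}")
```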
8

Siqueira, Vânia Rosatti de. "Um modelo de credit scoring para microcrédito: uma inovação no mercado brasileiro." Universidade Presbiteriana Mackenzie, 2011. http://tede.mackenzie.br/jspui/handle/tede/546.

Full text
Abstract:
The Grameen Bank's experiences with microcredit operations have been imitated in various countries, especially those related to the two great innovations in this market: the credit agent's role and the solidary group mechanism. Massifying operations and reducing their costs become vital for economies of scale to be achieved, and for MFIs to have a greater appetite for expanding their activity in the microcredit market. In this context, the next great innovation in the microcredit market will be the introduction of credit scoring models in such operations, which will speed up the process, reduce the risks and, consequently, the costs. A credit model was built from historical information on microcredit operations, and key variables that help to distinguish between good and bad borrowers were identified. The results show that when machine learning techniques (bagging and boosting) are coupled with the traditional methods of credit analysis (discriminant analysis and logistic regression), an improvement in the performance of credit scoring models for microcredit can be achieved.
APA, Harvard, Vancouver, ISO, and other styles
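The study above couples bagging and boosting with traditional credit analysis models. A minimal sketch comparing a logistic regression scorecard against bagged and boosted alternatives on synthetic good/bad-payer data; the features, class imbalance and AUC comparison are illustrative assumptions, not the microcredit dataset or the results reported in the dissertation.

```python
# Logistic regression vs. bagging vs. boosting for a toy credit scoring task (sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic borrowers: class 1 = bad payer, imbalanced as in most credit portfolios.
X, y = make_classification(n_samples=5000, n_features=12, weights=[0.85, 0.15], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "bagged logistic regression": BaggingClassifier(LogisticRegression(max_iter=1000),
                                                    n_estimators=50, random_state=7),
    "boosted stumps (AdaBoost)": AdaBoostClassifier(n_estimators=200, random_state=7),
}

for name, model in models.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]   # probability of being a bad payer
    print(f"{name:28s} AUC = {roc_auc_score(y_te, proba):.3f}")
```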
9

Lopes, Neilson Soares. "Modelos de classificação de risco de crédito para financiamentos imobiliários: regressão logística, análise discriminante, árvores de decisão, bagging e boosting." Universidade Presbiteriana Mackenzie, 2011. http://tede.mackenzie.br/jspui/handle/tede/527.

Full text
Abstract:
Fundo Mackenzie de Pesquisa
This study applied the traditional parametric techniques of discriminant analysis and logistic regression to the credit analysis of real estate financing transactions in which borrowers may or may not also hold a payroll loan. The hit rates of these methods were compared with those of non-parametric techniques based on classification trees and with the meta-learning methods bagging and boosting, which combine classifiers to obtain better accuracy. In a context of a large housing deficit, especially in Brazil, real estate financing can still be greatly encouraged, and sustainable growth in mortgage lending brings not only economic but also social benefits. Housing is, for most individuals, the largest source of expenditure and the most valuable asset they will own during their lifetime. The study concluded that the computational techniques of decision trees are the most effective for predicting bad payers (94.2% correct), followed by bagging (80.7%) and boosting (or arcing, 75.2%). For predicting bad payers in real estate financing, logistic regression and discriminant analysis showed the worst results (74.6% and 70.7%, respectively). For good payers, the decision tree also showed the best predictive power (75.8%), followed by discriminant analysis (75.3%) and boosting (72.9%). For good payers in real estate financing, bagging and logistic regression showed the worst results (72.1% and 71.7%, respectively). The logistic regression shows that, for a borrower with a payroll loan, the odds of being a bad payer are 2.19 times higher than if the borrower did not hold such a loan. The presence of payroll loans among the operations of real estate borrowers is also relevant in the discriminant analysis.
APA, Harvard, Vancouver, ISO, and other styles
10

Shire, Norah J. "Boosting, Bagging, and Classification Analysis to Improve Noninvasive Liver Fibrosis Prediction in HCV/HIV Coinfected Subjects: An Analysis of the AIDS Clinical Trials Group (ACTG) 5178." Cincinnati, Ohio : University of Cincinnati, 2007. http://rave.ohiolink.edu/etdc/view.cgi?acc_num=ucin1172860066.

Full text
Abstract:
Thesis (Ph.D.)--University of Cincinnati, 2007.
Advisor: Charles Ralph Buncher. Title from electronic thesis title page (viewed April 23, 2009). Keywords: Coinfection; Boosting and bagging; Classification analysis; HIV; Viral hepatitis. Includes abstract. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
11

Kirkby, Richard Brendon. "Improving Hoeffding Trees." The University of Waikato, 2008. http://hdl.handle.net/10289/2568.

Full text
Abstract:
Modern information technology allows information to be collected at a far greater rate than ever before. So fast, in fact, that the main problem is making sense of it all. Machine learning offers promise of a solution, but the field mainly focusses on achieving high accuracy when data supply is limited. While this has created sophisticated classification algorithms, many do not cope with increasing data set sizes. When the data set sizes get to a point where they could be considered to represent a continuous supply, or data stream, then incremental classification algorithms are required. In this setting, the effectiveness of an algorithm cannot simply be assessed by accuracy alone. Consideration needs to be given to the memory available to the algorithm and the speed at which data is processed in terms of both the time taken to predict the class of a new data sample and the time taken to include this sample in an incrementally updated classification model. The Hoeffding tree algorithm is a state-of-the-art method for inducing decision trees from data streams. The aim of this thesis is to improve this algorithm. To measure improvement, a comprehensive framework for evaluating the performance of data stream algorithms is developed. Within the framework memory size is fixed in order to simulate realistic application scenarios. In order to simulate continuous operation, classes of synthetic data are generated providing an evaluation on a large scale. Improvements to many aspects of the Hoeffding tree algorithm are demonstrated. First, a number of methods for handling continuous numeric features are compared. Second, tree prediction strategy is investigated to evaluate the utility of various methods. Finally, the possibility of improving accuracy using ensemble methods is explored. The experimental results provide meaningful comparisons of accuracy and processing speeds between different modifications of the Hoeffding tree algorithm under various memory limits. The study on numeric attributes demonstrates that sacrificing accuracy for space at the local level often results in improved global accuracy. The prediction strategy shown to perform best adaptively chooses between standard majority class and Naive Bayes prediction in the leaves. The ensemble method investigation shows that combining trees can be worthwhile, but only when sufficient memory is available, and improvement is less likely than in traditional machine learning. In particular, issues are encountered when applying the popular boosting method to streams.
APA, Harvard, Vancouver, ISO, and other styles
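The Hoeffding tree algorithm mentioned above decides when a leaf has seen enough streamed examples to split by bounding how far the observed merit of the best attribute can stray from its true value. A minimal sketch of that split test using the Hoeffding bound; the information-gain figures are made-up illustrative numbers.

```python
# The Hoeffding bound used to decide whether a stream-tree leaf can split (sketch).
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the observed mean of n samples lies within
    epsilon of the true mean, for a variable with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Illustrative numbers: information gain ranges over [0, log2(n_classes)].
best_gain, second_gain = 0.42, 0.30        # hypothetical gains of the two best attributes
value_range = math.log2(2)                 # two-class problem -> range 1.0
n_seen, delta = 1500, 1e-7                 # examples seen at the leaf, confidence parameter

eps = hoeffding_bound(value_range, delta, n_seen)
print(f"epsilon = {eps:.4f}")
if best_gain - second_gain > eps:
    print("split: the best attribute is reliably better than the runner-up")
else:
    print("keep collecting examples before splitting")
```

Smaller values of delta (higher confidence) or a larger gain range demand more examples before the same difference in gain justifies a split.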
12

Seck, Djamal. "Arbres de décisions symboliques, outils de validations et d'aide à l'interprétation." Thesis, Paris 9, 2012. http://www.theses.fr/2012PA090067.

Full text
Abstract:
In this thesis, we propose the STREE methodology for the construction of decision trees with symbolic data. This data type allows us to characterize higher-level individuals, which may be classes or categories of individuals, or concepts in the sense of the Galois lattice. The values of the variables, called symbolic variables, may be sets, intervals or histograms. The recursive partitioning criterion is a combination of a criterion related to the explanatory variables and a criterion related to the dependent variable. The first criterion is the variation of the variance of the explanatory variables. When it is applied alone, STREE acts as a top-down clustering methodology. The second criterion enables us to build a decision tree. This criterion is expressed as the variation of the Gini index if the dependent variable is nominal, and as the variation of the variance if the dependent variable is continuous or is a symbolic variable. Conventional data are a special case of symbolic data on which STREE can also obtain good results. It has performed well on multiple UCI data sets compared to conventional Data Mining methodologies such as CART, C4.5, Naive Bayes, KNN, MLP and SVM. The STREE methodology also allows for the construction of ensembles of symbolic decision trees, either by bagging or by boosting. The use of such ensembles is designed to overcome shortcomings of individual decision trees and to obtain a final decision that is in principle more reliable than that obtained from a single tree.
APA, Harvard, Vancouver, ISO, and other styles
13

Dias, Alexandra Aparecida Delpósito. "Previsão do incumprimento no crédito a empresas com classificadores múltiplos." Master's thesis, Instituto Superior de Economia e Gestão, 2012. http://hdl.handle.net/10400.5/11023.

Full text
Abstract:
Master's degree in Mathematical Finance
This study develops models for predicting credit defaults in the corporate segment using multiple classifiers. The performance of these models was compared with those of individual classifiers. The predictive ability of the competing models was evaluated using ROC curves and error rates of classification. The results suggest that models based on multiple classifiers have a better performance in the classification of credit defaults than individual classifiers.
APA, Harvard, Vancouver, ISO, and other styles
14

Jiang, Fuhua. "SVM-Based Negative Data Mining to Binary Classification." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/8.

Full text
Abstract:
The properties of a training data set, such as its size, distribution and number of attributes, significantly contribute to the generalization error of a learning machine. A poorly distributed data set is prone to produce a partially overfitted model. Two approaches proposed in this dissertation for binary classification enhance useful data information by mining negative data. First, an error-driven compensating hypothesis approach is based on Support Vector Machines (SVMs) with (1+k)-iteration learning, where the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on the new data set in which each label is a transformation of the label from the negative data set, further producing the positive and negative child data subsets in subsequent iterations. This procedure refines the base hypothesis by the k child hypotheses created in k iterations. A prediction method is also proposed to trace the relationship between negative subsets and the testing data set by a vector similarity technique. Second, a statistical negative example learning approach based on theoretical analysis improves the performance of the base learning algorithm (learner) by creating one or two additional hypotheses (audit and booster) to mine the negative examples output by the learner. The learner employs a regular Support Vector Machine to classify the main examples and recognize which examples are negative. The audit works on the negative training data created by the learner to predict whether an instance is negative. The boosting learner (booster) is applied when the audit is not accurate enough to judge the learner correctly; it works on the training data subsets on which the learner and the audit do not agree. The classifier for testing is the combination of learner, audit and booster: for a specific instance it returns the learner's result if the audit acknowledges the learner's result or the learner agrees with the audit's judgment, and otherwise returns the booster's result. The error of the classifier is decreased to O(e^2) compared to the error O(e) of the base learning algorithm.
APA, Harvard, Vancouver, ISO, and other styles
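The dissertation above chains a main SVM (learner) with an audit that flags unreliable outputs and a booster trained where the two disagree. A heavily simplified sketch of such a cascade with scikit-learn SVMs; the synthetic data, the rule for flagging instances and the way the final prediction is combined are illustrative assumptions and not the author's exact procedure.

```python
# A learner / audit / booster cascade around an SVM (simplified illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=20, flip_y=0.1, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# 1) learner: the main classifier.
learner = SVC(kernel="rbf").fit(X_tr, y_tr)
wrong = learner.predict(X_tr) != y_tr               # "negative" (misclassified) training examples

# 2) audit: predicts whether the learner's output can be trusted on an instance.
audit = SVC(kernel="rbf").fit(X_tr, wrong.astype(int))

# 3) booster: trained only on the hard region where the learner fails.
booster = SVC(kernel="rbf").fit(X_tr[wrong], y_tr[wrong])

# Final classifier: keep the learner's answer unless the audit flags it as unreliable.
pred_learner = learner.predict(X_te)
flagged = audit.predict(X_te).astype(bool)
final = np.where(flagged, booster.predict(X_te), pred_learner)

print(f"learner alone accuracy: {np.mean(pred_learner == y_te):.3f}")
print(f"cascade accuracy      : {np.mean(final == y_te):.3f}")
```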
15

Hovorka, Martin. "Meta-learning." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217654.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Zoghi, Zeinab. "Ensemble Classifier Design and Performance Evaluation for Intrusion Detection Using UNSW-NB15 Dataset." University of Toledo / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1596756673292254.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Ulriksson, Marcus, and Shahin Armaki. "Analys av prestations- och prediktionsvariabler inom fotboll." Thesis, Uppsala universitet, Statistiska institutionen, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-324983.

Full text
Abstract:
The thesis aims to explain how different variables describing the course of play in a football match affect the final result. These variables are divided into performance variables and quality variables. The performance variables are based on performance indicators inspired by Hughes and Bartlett (2002), while the quality variables describe how good the teams are. To achieve this aim, various classification models are used, built from both the performance variables and the quality variables. First, the most important performance indicators were investigated; the best model classified about 60% of matches correctly, and clearances and shots on target were the most important performance variables. Next, the best prediction variables were investigated; the best model correctly classified the final result of about 88% of the matches. Based on what the authors considered to be the most important prediction variables, a prediction model with fewer variables was created, which correctly classified about 86% of the matches. That prediction model was constructed from player ratings, odds on a draw, and the referee.
APA, Harvard, Vancouver, ISO, and other styles
18

Nascimento, Diego Silveira Costa. "Novas abordagens para configura??es autom?ticas dos par?metros de controle em comit?s de classificadores." Universidade Federal do Rio Grande do Norte, 2014. http://repositorio.ufrn.br/handle/123456789/19754.

Full text
Abstract:
Significant advances have emerged in research related to the topic of Classifier Committees. The models that receive the most attention in the literature are those of a static nature, also known as ensembles. Among the algorithms in this class, we highlight the methods that use resampling of the training data: Bagging, Boosting and MultiBoosting. The choice of architecture and of the base components to be recruited is not a trivial task, and this has motivated new proposals that attempt to build such models automatically, many of them based on optimization methods. Many of these contributions have not shown satisfactory results when applied to more complex problems or problems of a different nature. In contrast, the thesis presented here proposes three new hybrid approaches for the automatic construction of ensembles: Diversity Increment, Adaptive Fitness Function and Meta-learning, for the development of systems that automatically configure the control parameters of ensemble models. In the first approach, we propose a solution that combines different diversity techniques in a single conceptual framework, in an attempt to achieve higher levels of diversity in ensembles and, with it, better performance of such systems. The second approach uses a genetic algorithm for the automatic design of ensembles; the contribution consists of combining filter and wrapper techniques adaptively to evolve a better distribution of the feature space to be presented to the components of the ensemble. Finally, the last approach proposes new techniques for recommending the architecture and base components of an ensemble, using traditional and multi-label meta-learning. In general, the results are encouraging and corroborate the thesis that hybrid tools are a powerful solution for building effective ensembles for pattern classification problems.
APA, Harvard, Vancouver, ISO, and other styles
19

Thorén, Daniel. "Radar based tank level measurement using machine learning : Agricultural machines." Thesis, Linköpings universitet, Programvara och system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176259.

Full text
Abstract:
Agriculture is becoming more dependent on computerized solutions to make the farmer's job easier. The big step that many companies are working towards is fully autonomous vehicles that work the fields. To that end, the equipment fitted to said vehicles must also adapt and become autonomous. Making this equipment autonomous takes many incremental steps, one of which is developing an accurate and reliable tank level measurement system. In this thesis, a system for tank level measurement in a seed planting machine is evaluated. Traditional systems use load cells to measure the weight of the tank; however, these types of systems are expensive to build and cumbersome to repair. They also add a lot of weight to the equipment, which increases the fuel consumption of the tractor. Thus, this thesis investigates the use of radar sensors together with a number of Machine Learning algorithms. Fourteen radar sensors are fitted to a tank at different positions, data is collected, and a preprocessing method is developed. Then, the data is used to test the following Machine Learning algorithms: Bagged Regression Trees (BG), Random Forest Regression (RF), Boosted Regression Trees (BRT), Linear Regression (LR), Linear Support Vector Machine (L-SVM), Multi-Layer Perceptron Regressor (MLPR). The model with the best 5-fold cross-validation scores was Random Forest, closely followed by Boosted Regression Trees. A robustness test, using 5 previously unseen scenarios, revealed that the Boosted Regression Trees model was the most robust. The radar position analysis showed that 6 sensors together with the MLPR model gave the best RMSE scores. In conclusion, the models performed well on this type of system, which shows that they might be a competitive alternative to load cell based systems.
APA, Harvard, Vancouver, ISO, and other styles
20

"Estimação não parametrica aplicada a problemas de classificação via Bagging e Boosting." Tese, Biblioteca Digital da Unicamp, 2004. http://libdigi.unicamp.br/document/?code=vtls000316781.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Nožička, Michal. "Ensemble learning metody pro vývoj skóringových modelů." Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-382813.

Full text
Abstract:
Credit scoring is a very important process in the banking industry, during which each potential or current client is assigned a credit score that in a certain way expresses the client's probability of default, i.e. of failing to meet his or her obligations on time or in full. This is a cornerstone of credit risk management in the banking industry. Traditionally, statistical models (such as the logistic regression model) are used for credit scoring in practice. Despite the many advantages of such an approach, recent research shows many alternatives that are in some ways superior to those traditional models. This master thesis is focused on introducing ensemble learning models (in particular, constructed using bagging, boosting and stacking algorithms) with various base models (in particular logistic regression, random forest, support vector machines and artificial neural networks) as possible alternatives and challengers to the traditional statistical models used for credit scoring, and compares their advantages and disadvantages. The accuracy and predictive power of those scoring models are examined using standard measures of accuracy and predictive power in the credit scoring field (in particular the GINI coefficient and the LIFT coefficient) on a real world dataset, and the obtained results are presented. The main result of this comparative study is that...
APA, Harvard, Vancouver, ISO, and other styles
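The thesis above benchmarks ensemble models, including stacking, against a logistic regression scorecard using the GINI coefficient. A minimal sketch of a stacked ensemble with base models of the kind named in the abstract, scored with the usual relation GINI = 2*AUC - 1; the synthetic data and model settings are illustrative assumptions.

```python
# Stacking several base models and scoring with the GINI coefficient (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, n_features=15, weights=[0.9, 0.1], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=5)),
        ("svm", SVC(probability=True, random_state=5)),
        ("mlp", MLPClassifier(max_iter=1000, random_state=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),   # meta-learner combining base predictions
)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("stacking", stack)]:
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    gini = 2 * roc_auc_score(y_te, proba) - 1             # GINI = 2*AUC - 1
    print(f"{name:20s} GINI = {gini:.3f}")
```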
22

Krugell, Marike. "Bias reduction studies in nonparametric regression with applications : an empirical approach / Marike Krugell." Thesis, 2014. http://hdl.handle.net/10394/15345.

Full text
Abstract:
The purpose of this study is to determine the effect of three improvement methods on nonparametric kernel regression estimators. The improvement methods are applied to the Nadaraya-Watson estimator with cross-validation bandwidth selection, the Nadaraya-Watson estimator with plug-in bandwidth selection, the local linear estimator with plug-in bandwidth selection and a bias corrected nonparametric estimator proposed by Yao (2012). The different resulting regression estimates are evaluated by minimising a global discrepancy measure, i.e. the mean integrated squared error (MISE). In the machine learning context, various improvement methods exist in terms of the precision and accuracy of an estimator. The first two improvement methods introduced in this study are bootstrap based. Bagging is an acronym for bootstrap aggregating and was introduced by Breiman (1996a) from a machine learning viewpoint and by Swanepoel (1988, 1990) in a functional context. Bagging is primarily a variance reduction tool, i.e. bagging is implemented to reduce the variance of an estimator and in this way improve the precision of the estimation process. Bagging is performed by drawing repeated bootstrap samples from the original sample and generating multiple versions of an estimator. These replicates of the estimator are then used to obtain an aggregated estimator. Bragging stands for bootstrap robust aggregating. A robust estimator is obtained by using the sample median over the B bootstrap estimates instead of the sample mean as in bagging. The third improvement method aims to reduce the bias component of the estimator and is referred to as boosting. Boosting is a general method for improving the accuracy of any given learning algorithm. The method starts off with a sensible estimator and improves iteratively, based on its performance on a training dataset. Results and conclusions verifying existing literature are provided, as well as new results for the new methods.
MSc (Statistics), North-West University, Potchefstroom Campus, 2015
APA, Harvard, Vancouver, ISO, and other styles
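The study above applies bagging (bootstrap mean) and bragging (bootstrap median) to kernel regression estimators such as the Nadaraya-Watson estimator. A minimal sketch of both aggregations around a Gaussian-kernel Nadaraya-Watson fit; the fixed bandwidth and toy data are illustrative assumptions rather than the cross-validation or plug-in bandwidth choices studied in the dissertation.

```python
# Bagging and bragging a Nadaraya-Watson kernel regression estimator (sketch).
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)    # noisy toy regression data
grid = np.linspace(0, 1, 100)
h = 0.05                                                     # fixed illustrative bandwidth

def nadaraya_watson(x, y, grid, h):
    """Gaussian-kernel weighted average of the responses at each grid point."""
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

B = 200
boot_fits = np.empty((B, grid.size))
for b in range(B):
    idx = rng.integers(0, n, n)                  # bootstrap resample of the pairs (x, y)
    boot_fits[b] = nadaraya_watson(x[idx], y[idx], grid, h)

bagged = boot_fits.mean(axis=0)                  # bagging: bootstrap mean
bragged = np.median(boot_fits, axis=0)           # bragging: bootstrap median
single = nadaraya_watson(x, y, grid, h)

true = np.sin(2 * np.pi * grid)
for name, fit in [("single", single), ("bagging", bagged), ("bragging", bragged)]:
    print(f"{name:8s} grid MSE vs. true curve = {np.mean((fit - true) ** 2):.4f}")
```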
23

Rodríguez, Hernán Cortés. "Ensemble classifiers in remote sensing: a comparative analysis." Master's thesis, 2014. http://hdl.handle.net/10362/11671.

Full text
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Land Cover and Land Use (LCLU) maps are very important tools for understanding the relationships between human activities and the natural environment. Accurately mapping the features over the Earth's surface is essential to ensure their proper management. The basic data used to derive those maps are remote sensing imagery (RSI), specifically satellite images. Hence, new techniques and methods able to handle those data accurately are in demand. In this work, our goal was to briefly review some of the current approaches used in the scientific community to face this challenge and obtain higher accuracy in LCLU maps. In particular, we focus on classifier ensembles and the different ensemble strategies found in the literature. We proposed different ensemble strategies based on our data and previous work, in order to increase the accuracy of previous LCLU maps built from the same data with single classifiers. In the end, only one of the proposed ensembles achieved significantly higher accuracy in the LCLU classification than the best single classifier on the same data. It was also shown that diversity did not play an important role in the success of this ensemble.
APA, Harvard, Vancouver, ISO, and other styles