Dissertations / Theses on the topic 'Ensemble Based Classification'

Consult the top 28 dissertations / theses for your research on the topic 'Ensemble Based Classification.'


1

WANDEKOKEN, E. D. "Support Vector Machine Ensemble Based on Feature and Hyperparameter Variation." Universidade Federal do Espírito Santo, 2011. http://repositorio.ufes.br/handle/10/4234.

Abstract:
Support vector machine (SVM) classifiers are currently considered one of the most powerful techniques for solving two-class classification problems. To improve on the performance achieved by individual SVM classifiers, a well-established approach is to use an ensemble of SVMs, that is, a set of SVM classifiers that are simultaneously individually accurate and collectively divergent in their decisions. This work proposes an approach for building SVM ensembles based on a three-stage process. First, complementary runs of a genetic-algorithm-based search (GEFS) are used to explore the feature space globally and define a set of feature subsets. Next, for each of these feature subsets, an SVM with optimized parameters is built. Finally, a local search is used to select an optimized subset of these SVMs, which forms the ensemble that is finally produced. The experiments were carried out in the context of fault detection in industrial machines, using 2000 examples of vibration signals from motor pumps installed on oil platforms. The experiments show that the proposed method for building SVM ensembles outperformed other well-established classification approaches.
2

Al-Enezi, Jamal. "Artificial immune systems based committee machine for classification application." Thesis, Brunel University, 2012. http://bura.brunel.ac.uk/handle/2438/6826.

Abstract:
A new adaptive learning Artificial Immune System (AIS) based committee machine is developed in this thesis. The proposed approach efficiently tackles the general problem of clustering high-dimensional data. In addition, it helps in deriving useful decisions and results for other application domains such as classification and prediction. AIS is a branch of computational intelligence inspired by the biological immune system, and it has gained increasing interest among researchers developing immune-based models and techniques to solve diverse complex computational or engineering problems. This work presents some applications of AIS techniques to health problems, and a thorough survey of existing AIS models and algorithms. The main focus of this research is building an ensemble model that integrates different AIS techniques (i.e. Artificial Immune Networks, Clonal Selection, and Negative Selection) for classification applications to achieve better classification results. A new AIS-based ensemble architecture with adaptive learning features is proposed by integrating different learning and adaptation techniques to overcome individual limitations and to achieve synergetic effects through the combination of these techniques. Various techniques related to the design and enhancement of the new adaptive learning architecture are studied, including a neuro-fuzzy based detector and an optimizer using the particle swarm optimization method to achieve enhanced classification performance. An evaluation study was conducted to show the performance of the proposed adaptive learning ensemble and to compare it to alternative combining techniques. Several experiments are presented using different medical datasets for the classification problem, and the findings and outcomes are discussed. The new adaptive learning architecture improves the accuracy of the ensemble. Moreover, there is an improvement over the existing aggregation techniques. The thesis concludes with the outcomes, assumptions and limitations of the proposed methods and their implications for further research in this area.
3

Börthas, Lovisa, and Sjölander Jessica Krange. "Machine Learning Based Prediction and Classification for Uplift Modeling." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-266379.

Abstract:
The desire to model the true gain from targeting an individual for marketing purposes has led to the common use of uplift modeling. Uplift modeling requires a treatment group as well as a control group, and the objective is to estimate the difference between the success probabilities in the two groups. Statistical machine learning methods are efficient for estimating the probabilities in uplift models. In this project the uplift modeling approaches Subtraction of Two Models, Modeling Uplift Directly and the Class Variable Transformation are investigated. The statistical machine learning methods applied are Random Forests and Neural Networks, along with the standard method Logistic Regression. The data is collected from a well-established retail company, and the purpose of the project is to investigate which uplift modeling approach and statistical machine learning method yields the best performance given the data used in this project. The variable selection step was shown to be a crucial component in the modeling process, as was the amount of control data in each data set. For the uplift modeling to be successful, the method of choice should be either Modeling Uplift Directly using Random Forests, or the Class Variable Transformation using Logistic Regression. Neural network-based approaches are sensitive to uneven class distributions and are hence not able to obtain stable models given the data used in this project. Furthermore, the Subtraction of Two Models did not perform well because each model tended to focus too much on modeling the class in both data sets separately instead of modeling the difference between the class probabilities. The conclusion is hence to use an approach that models the uplift directly, and also to use a large amount of control data in each data set.
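The relationship between the Subtraction of Two Models and the Class Variable Transformation named in this abstract can be sketched on toy data. This is an illustration only, not the authors' code: the function names and the eight-row dataset are hypothetical, and the identity uplift = 2 * P(Z = 1) - 1 is the commonly cited form of the transformation, which assumes treatment and control are equally likely.

```python
import numpy as np

def class_variable_transform(treatment, outcome):
    """Return Z, which is 1 when a treated unit responds or a control unit does not."""
    treatment = np.asarray(treatment)
    outcome = np.asarray(outcome)
    return (treatment == outcome).astype(int)

def uplift_from_z_probability(p_z):
    """Convert P(Z = 1) to uplift; valid only for a 50/50 treatment/control split."""
    return 2.0 * np.asarray(p_z, dtype=float) - 1.0

# Hypothetical toy data: 4 treated and 4 control customers (balanced groups).
treatment = np.array([1, 1, 1, 1, 0, 0, 0, 0])
outcome = np.array([1, 1, 1, 0, 1, 0, 0, 0])

# Class Variable Transformation: one target Z, one probability estimate.
z = class_variable_transform(treatment, outcome)
uplift_cvt = float(uplift_from_z_probability(z.mean()))

# Subtraction of Two Models: separate response rates, then their difference.
rate_treated = outcome[treatment == 1].mean()
rate_control = outcome[treatment == 0].mean()
uplift_two_models = float(rate_treated - rate_control)

print(uplift_cvt, uplift_two_models)
```

In practice `z.mean()` would be replaced by a classifier's estimate of P(Z = 1 | x) for each customer; the plain mean is used here only to keep the sketch dependency-free.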
4

Feng, Wei. "Investigation of training data issues in ensemble classification based on margin concept : application to land cover mapping." Thesis, Bordeaux 3, 2017. http://www.theses.fr/2017BOR30016/document.

Abstract:
Classification has been widely studied in machine learning. Ensemble methods, which build a classification model by integrating multiple component learners, achieve higher performances than a single classifier. The classification accuracy of an ensemble is directly influenced by the quality of the training data used. However, real-world data often suffers from class noise and class imbalance problems. Ensemble margin is a key concept in ensemble learning. It has been applied to both the theoretical analysis and the design of machine learning algorithms. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. This work focuses on exploiting the margin concept to improve the quality of the training set and therefore to increase the classification accuracy of noise sensitive classifiers, and to design effective ensemble classifiers that can handle imbalanced datasets. A novel ensemble margin definition is proposed. It is an unsupervised version of a popular ensemble margin. Indeed, it does not involve the class labels. Mislabeled training data is a challenge to face in order to build a robust classifier whether it is an ensemble or not. To handle the mislabeling problem, we propose an ensemble margin-based class noise identification and elimination method based on an existing margin-based class noise ordering. This method can achieve a high mislabeled instance detection rate while keeping the false detection rate as low as possible. It relies on the margin values of misclassified data, considering four different ensemble margins, including the novel proposed margin. This method is extended to tackle the class noise correction which is a more challenging issue. The instances with low margins are more important than safe samples, which have high margins, for building a reliable classifier. 
A novel bagging algorithm based on a data importance evaluation function, which again relies on the ensemble margin, is proposed to deal with the class imbalance problem. In our algorithm, the emphasis is placed on the lowest-margin samples. This method is evaluated, again using four different ensemble margins, on its ability to address the imbalance problem, especially on multi-class imbalanced data. In remote sensing, where training data are typically ground-based, mislabeled training data is inevitable. Imbalanced training data is another problem frequently encountered in remote sensing. Both proposed ensemble methods, each using the margin definition best suited to the training data issue it handles, are applied to land cover mapping.
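The ensemble margin concept in this abstract can be sketched from base-classifier votes. The two definitions below are common formulations from the ensemble literature and are an assumption here, not a transcription of the thesis: the supervised margin compares votes for the true class with the strongest other class, and the unsupervised variant compares the two most-voted classes without using labels. All function names and the vote matrix are hypothetical.

```python
import numpy as np

def vote_counts(votes, n_classes):
    """Count votes per class; votes has shape (n_samples, n_learners)."""
    votes = np.asarray(votes)
    return np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)

def supervised_margin(votes, labels, n_classes):
    """(votes for true class - most votes for any other class) / n_learners."""
    counts = vote_counts(votes, n_classes)
    rows = np.arange(counts.shape[0])
    true_votes = counts[rows, labels]
    others = counts.copy()
    others[rows, labels] = -1  # exclude the true class from the maximum
    return (true_votes - others.max(axis=1)) / np.asarray(votes).shape[1]

def unsupervised_margin(votes, n_classes):
    """(votes for top class - votes for runner-up class) / n_learners; no labels."""
    counts = np.sort(vote_counts(votes, n_classes), axis=1)
    return (counts[:, -1] - counts[:, -2]) / np.asarray(votes).shape[1]

# Hypothetical votes of 5 base classifiers on 3 samples, 3 classes.
votes = np.array([[0, 0, 0, 1, 2],
                  [1, 1, 0, 0, 2],
                  [2, 2, 2, 2, 2]])
labels = np.array([1, 0, 2])  # the first sample is misclassified by the majority

sup = supervised_margin(votes, labels, n_classes=3)
unsup = unsupervised_margin(votes, n_classes=3)
print(sup, unsup)
```

A negative supervised margin marks a sample the ensemble gets wrong, while a value near zero under either definition marks a sample close to the decision boundary, which is the kind of low-margin instance the abstract emphasizes.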
5

Alshahrani, Saeed Sultan. "Detection, classification and control of power quality disturbances based on complementary ensemble empirical mode decomposition and artificial neural networks." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/15872.

6

Wang, Xin. "Gaze based weakly supervised localization for image classification : application to visual recognition in a food dataset." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066577/document.

Abstract:
In this dissertation, we discuss how to use human gaze data to improve the performance of weakly supervised learning models in image classification. The background of this topic is the era of rapidly growing information technology, in which the amount of data to analyze is also growing dramatically. Since the amount of data that humans can annotate cannot keep up with the amount of data itself, current well-developed supervised learning approaches may face bottlenecks in the future. In this context, the use of weak annotations for high-performance learning methods is worthy of study. Specifically, we approach the problem from two aspects. One is to propose a more time-saving annotation, human eye-tracking gaze, as an alternative to traditional time-consuming annotations such as bounding boxes. The other is to integrate gaze annotation into a weakly supervised learning scheme for image classification. This scheme benefits from the gaze annotation for inferring the regions containing the target object. A useful property of our model is that it only exploits gaze for training, while the test phase is gaze free. This property further reduces the demand for annotations. The two aspects are connected in our models, which achieve competitive experimental results.
7

Xia, Junshi. "Multiple classifier systems for the classification of hyperspectral data." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENT047/document.

Abstract:
In this thesis, we propose several new techniques for the classification of hyperspectral remote sensing images based on multiple classifier systems (MCS). Our proposed framework introduces significant innovations with regard to previous approaches in the same field, many of which are mainly based on an individual algorithm. First, we propose to use Rotation Forests with several linear feature extraction techniques and compare them with traditional ensemble approaches, such as Bagging, Boosting, Random subspace and Random Forest. Second, the integration of support vector machines (SVM) with the Rotation subspace framework for context classification is investigated. SVM and Rotation subspace are two powerful tools for high-dimensional data classification. Therefore, combining them can further improve the classification performance. Third, we extend the work on Rotation Forests by incorporating a local feature extraction technique and spatial contextual information with a Markov random field (MRF) to design robust spatial-spectral methods. Finally, we present a new general framework, the Random subspace (RS) ensemble, to train a series of effective classifiers, including decision trees (DT) and extreme learning machines (ELM), with extended multi-attribute profiles (EMAPs) for classifying hyperspectral data. Six RS ensemble methods, including Random subspace with DT (RSDT), Random Forest (RF), Rotation Forest (RoF), Rotation Random Forest (RoRF), RS with ELM (RSELM) and Rotation subspace with ELM (RoELM), are constructed from multiple base learners. The effectiveness of the proposed techniques is illustrated by comparison with state-of-the-art methods using real hyperspectral data sets with different contexts.
8

Al-Mter, Yusur. "Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166139.

Abstract:
Heart rate variability (HRV) is the time variation between adjacent heartbeats. This variation is regulated by the autonomic nervous system (ANS) and its two branches, the sympathetic and parasympathetic nervous systems. HRV is considered an essential clinical tool for estimating the imbalance between the two branches, and hence an indicator of age and cardiac-related events. This thesis focuses on ECG recordings during nocturnal rest to estimate the influence of HRV in predicting the age decade of healthy individuals. Time-domain, frequency-domain, and non-linear methods are explored to extract the HRV features. Three feature-based methods (support vector machine (SVM), random forest, and extreme gradient boosting (XGBoost)) were employed, and the overall test accuracy achieved in capturing the actual class was relatively low (lower than 30%). The SVM classifier had the lowest performance, while random forest and XGBoost performed slightly better. Although the difference is negligible, the random forest had the highest test accuracy, approximately 29%, using a subset of ten optimal HRV features. Furthermore, to validate the findings, the original dataset was shuffled and used as a test set, and the performance was compared with other related research outputs.
9

Thames, John Lane. "Advancing cyber security with a semantic path merger packet classification algorithm." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45872.

Abstract:
This dissertation investigates and introduces novel algorithms, theories, and supporting frameworks to address the growing problem of Internet security. A distributed firewall and active response architecture is introduced that enables any device within a cyber environment to participate in the active discovery of and response to cyber attacks. A theory of semantic association systems is developed for the general problem of knowledge discovery in data. The theory of semantic association systems forms the basis of a novel semantic path merger packet classification algorithm. The theoretical aspects of the semantic path merger packet classification algorithm are investigated, and the algorithm's hardware-based implementation is evaluated along with a comparative analysis against content addressable memory. Experimental results show that the hardware implementation of the semantic path merger algorithm significantly outperforms content addressable memory in terms of energy consumption and operational timing.
10

Ekelund, Måns. "Uncertainty Estimation for Deep Learning-based LPI Radar Classification : A Comparative Study of Bayesian Neural Networks and Deep Ensembles." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301653.

Abstract:
Deep Neural Networks (DNNs) have shown promising results in classifying known Low-probability-of-intercept (LPI) radar signals in noisy environments. However, regular DNNs produce low-quality confidence and uncertainty estimates, making them unreliable, which inhibits deployment in real-world settings. Hence, the need for robust uncertainty estimation methods has grown, and two categories have emerged: Bayesian approximation and ensemble learning. As autonomous LPI radar classification is deployed in safety-critical environments, this study compares Bayesian Neural Networks (BNNs) and Deep Ensembles (DEs) as uncertainty estimation methods. We synthetically generate a training and test data set, as well as a shifted data set where subtle changes are made to the signal parameters. The methods are evaluated on predictive performance, relevant confidence and uncertainty estimation metrics, and method-related metrics such as model size, training time, and inference time. Our results show that our DE achieves slightly higher predictive performance than the BNN on both in-distribution and shifted data, with an accuracy of 74% and 32%, respectively. Further, we show that both methods are more cautious in their predictions than a regular DNN on in-distribution data, while the confidence quality degrades significantly on shifted data. Uncertainty in predictions is evaluated as predictive entropy, and we show that both methods exhibit higher uncertainty on shifted data. We also show that the signal-to-noise ratio affects uncertainty compared to a regular DNN. However, neither method exhibits uncertainty when making predictions on unseen signal modulation patterns, which is not a desirable behavior. Further, we conclude that the amount of available resources could influence the choice of method, since DEs are resource-heavy, requiring more memory than a regular DNN or BNN. On the other hand, the BNN requires a far longer training time.
Previous studies have shown that deep neural networks (DNNs) can classify signal patterns for a particular type of radar (LPI) that is designed to be difficult to identify and intercept. However, traditional neural networks lack a natural way to estimate uncertainty, which harms their reliability and prevents their use in safety-critical environments. Uncertainty estimation for deep learning has therefore grown and recently become a large field with two clear categories, Bayesian approximation and ensemble methods. LPI radar classification is of great interest to the defense industry, and the technique will most likely be applied in safety-critical environments. In this study we compare Bayesian neural networks and deep ensembles for LPI radar classification. The results indicate that a deep ensemble achieves higher accuracy than a Bayesian neural network and that both methods show restraint in their predictions compared with a traditional deep neural network. We estimate uncertainty as entropy and show that the uncertainty in the methods' inferences increases both at high noise levels and on data that is slightly shifted from the known data distribution. However, the results show that the methods' uncertainty does not increase compared with an ordinary network when they encounter previously unseen signal patterns. We also show that the choice of method can be influenced by available resources, since deep ensembles require a lot of memory compared with a traditional or Bayesian neural network.
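The predictive entropy used in this abstract as the uncertainty measure can be sketched for an ensemble as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: each ensemble member is assumed to output class probabilities, the ensemble prediction is taken to be their average, and the function name, the `eps` guard and the toy probabilities are hypothetical.

```python
import numpy as np

def predictive_entropy(member_probs, eps=1e-12):
    """Entropy (in nats) of the averaged ensemble prediction.

    member_probs has shape (n_members, n_samples, n_classes), each row a
    probability vector. eps guards against log(0).
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)  # average over ensemble members
    return -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)

# Hypothetical outputs of a 2-member ensemble on 2 samples, 2 classes.
member_probs = np.array([
    [[1.0, 0.0], [1.0, 0.0]],  # member A
    [[1.0, 0.0], [0.0, 1.0]],  # member B disagrees on the second sample
])

entropy = predictive_entropy(member_probs)
print(entropy)
```

When the members agree confidently the entropy is close to zero; when they disagree, the averaged prediction flattens and the entropy approaches its maximum, ln(2) for two classes.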
11

Ala'raj, Maher A. "A credit scoring model based on classifiers consensus system approach." Thesis, Brunel University, 2016. http://bura.brunel.ac.uk/handle/2438/13669.

Abstract:
Managing customer credit is an important issue for each commercial bank; therefore, banks take great care when dealing with customer loans to avoid any improper decisions that can lead to loss of opportunity or financial losses. The manual estimation of customer creditworthiness has become both time- and resource-consuming. Moreover, a manual approach is subjective (dependent on the bank employee who gives the estimation), which is why devising and implementing programming models that provide loan estimations is the only way of eradicating the ‘human factor’ in this problem. Such a model should recommend to the bank whether or not a loan should be given, or otherwise give a probability that the loan will be returned. A number of models have been designed, but there is no ideal classifier amongst them since each gives some percentage of incorrect outputs; this is a critical consideration when each percentage point of incorrect answers can mean millions of dollars of losses for large banks. However, logistic regression (LR) remains the industry standard tool for developing credit-scoring models. For this purpose, an investigation is carried out on the combination of the most efficient classifiers in the credit-scoring field in an attempt to produce a classifier that outperforms each of its component classifiers. In this work, a fusion model referred to as ‘the Classifiers Consensus Approach’ is developed, which performs considerably better than each of the single classifiers that constitute it. The difference between the consensus approach and the majority of other combiners lies in the fact that the consensus approach adopts the model of real expert group behaviour during the process of finding the consensus (aggregate) answer. The consensus model is compared not only with single classifiers, but also with traditional combiners and a quite complex combiner model known as the ‘Dynamic Ensemble Selection’ approach. 
As pre-processing techniques, data filtering (selecting training entries that fit the input data well and removing outliers and noisy data) and feature selection (removing useless and statistically insignificant features whose values are weakly correlated with the real quality of a loan) are used. These techniques significantly improve the consensus approach results. Results clearly show that the consensus approach is statistically better (with a 95% confidence value, according to the Friedman test) than any other single classifier or combiner analysed; this means that for similar datasets, there is a 95% guarantee that the consensus approach will outperform all other classifiers. The consensus approach gives not only the best accuracy, but also a better AUC value, Brier score and H-measure for almost all datasets investigated in this thesis. Moreover, it outperformed Logistic Regression. Thus, it has been proven that the use of the consensus approach for credit-scoring is justified and recommended in commercial banks. Along with the consensus approach, the dynamic ensemble selection approach is analysed, the results of which show that, under some conditions, the dynamic ensemble selection approach can rival the consensus approach. The strengths of the dynamic ensemble selection approach include its stability and high accuracy on various datasets. The consensus approach, which is improved in this work, may be considered by banks whose data share the characteristics of the datasets used in this work, where its use could decrease the number of mistakenly rejected loans to solvent customers and the number of mistakenly accepted loans that are never returned. Furthermore, the consensus approach is a notable step in the direction of building a universal classifier that can fit data with any structure. 
Another advantage of the consensus approach is its flexibility: even if the input data changes for various reasons, the consensus approach can easily be re-trained and used with the same performance.
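The Brier score named among the evaluation measures above can be illustrated with a short sketch. It assumes the standard definition (the mean squared difference between the predicted probability and the 0/1 outcome); the function name `brier_score` and the sample values are hypothetical and are not taken from the thesis.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical predicted repayment probabilities and actual outcomes (1 = repaid)
predicted = [0.9, 0.2, 0.7, 0.4]
repaid = [1, 0, 1, 1]
print(brier_score(predicted, repaid))
```

A lower score indicates better-calibrated probabilities; a classifier that always predicts 0.5 scores 0.25.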
12

Dam, Hai Huong Information Technology & Electrical Engineering Australian Defence Force Academy UNSW. "A scalable evolutionary learning classifier system for knowledge discovery in stream data mining." Awarded by: University of New South Wales - Australian Defence Force Academy, 2008. http://handle.unsw.edu.au/1959.4/38865.

Abstract:
Data mining (DM) is the process of finding patterns and relationships in databases. The breakthrough in computer technologies triggered a massive growth in data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream. Mining these data is known as stream data mining. Due to the large amount of data arriving in a data stream, each record is normally expected to be processed only once. Moreover, this process can be carried out on different sites in the organisation simultaneously, making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community, including scalability and coping with changes in the underlying concept over time. In this thesis, the author hypothesizes that learning classifier systems (LCSs) - a class of classification algorithms - have the potential to work efficiently in distributed stream data mining. LCSs are incremental learners and, being evolutionary based, they are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows down the processing of arriving instances. Second, they require a large number of parameter settings, some of which are very sensitive to the nature of the learning problem. As a result, it becomes difficult to choose the right setup for totally unknown problems. The aim of this thesis is to attack these two problems in LCS, with a specific focus on UCS - a supervised evolutionary learning classifier system. UCS is chosen as it has been tested extensively on classification tasks and it is the supervised version of XCS, a state of the art LCS. In this thesis, the architectural design for a distributed stream data mining system will be first introduced.
The problems that UCS would face in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the idea of using a Neural Network to represent the action in UCS is proposed. This new system - called NLCS - was validated experimentally using a small fixed population size and has shown a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS called ANCS is then introduced. The adaptive version dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with 9 nodes, each using a different parameter setting. In total they cover all patterns of behaviour noticed in the system. A voting gate is used for the ensemble. The resultant ensemble does not require any parameter setting, and showed better performance on all datasets tested. The thesis concludes with testing the ANCS system in the architectural design for distributed environments proposed earlier. The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and primarily (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks.
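The abstract does not specify how the voting gate combines the nine nodes. A minimal sketch, assuming a simple plurality vote over the labels predicted by the nodes, might look as follows; the function name `voting_gate`, the tie-breaking rule and the sample votes are assumptions made for illustration.

```python
from collections import Counter


def voting_gate(node_predictions):
    """Return the label predicted by the most nodes (plurality vote).

    Ties go to the label that appears first in the input, which is an
    arbitrary choice made for this sketch.
    """
    if not node_predictions:
        raise ValueError("at least one node prediction is required")
    counts = Counter(node_predictions)
    # most_common keeps first-seen order among equal counts (Python 3.7+)
    return counts.most_common(1)[0][0]


# Nine hypothetical node outputs for one incoming instance
votes = [1, 0, 1, 1, 0, 1, 0, 1, 1]
print(voting_gate(votes))
```

Because each node only contributes a label, such a gate needs no tuning of its own, which fits the claim that the ensemble requires no parameter setting.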
13

Liu, Yongwen. "Cloud services selection based on rough set theory." Thesis, Troyes, 2016. http://www.theses.fr/2016TROY0018/document.

Abstract:
Avec le développement du cloud computing, de nouveaux services voient le jour et il devient primordial que les utilisateurs aient les outils nécessaires pour choisir parmi ses services. La théorie des ensembles approximatifs représente un bon outil de traitement de données incertaines. Elle peut exploiter les connaissances cachées ou appliquer des règles sur des ensembles de données. Le but principal de cette thèse est d'utiliser la théorie des ensembles approximatifs pour aider les utilisateurs de cloud computing à prendre des décisions. Dans ce travail, nous avons, d'une part, proposé un cadre utilisant la théorie des ensembles approximatifs pour la sélection de services cloud et nous avons donné un exemple en utilisant les ensembles approximatifs dans la sélection de services cloud pour illustrer la pratique et analyser la faisabilité de cette approche. Deuxièmement, l'approche proposée de sélection des services cloud permet d’évaluer l’importance des paramètres en fonction des préférences de l'utilisateur à l'aide de la théorie des ensembles approximatifs. Enfin, nous avons effectué des validations par simulation de l’algorithme proposé sur des données à large échelle pour vérifier la faisabilité de notre approche en pratique. Les résultats de notre travail peuvent aider les utilisateurs de services cloud à prendre la bonne décision et aider également les fournisseurs de services cloud pour cibler les améliorations à apporter aux services qu’ils proposent dans le cadre du cloud computing
With the development of cloud computing, users enjoy the various benefits that high-technology services bring. However, more and more cloud service offerings are emerging, so it is important for users to choose the right cloud service. For cloud service providers, it is also important to improve the cloud services they provide, in order to attract more customers and expand the scale of their cloud services. Rough set theory is a good data processing tool for dealing with uncertain information. It can mine hidden knowledge or rules from data sets. The main purpose of this thesis is to apply rough set theory to help cloud users make decisions about cloud services. In this work, firstly, a framework using rough set theory in cloud service selection is proposed, and we give an example using rough sets in cloud service selection to illustrate and analyze the feasibility of our approach. Secondly, the proposed cloud service selection approach is used to evaluate the importance of parameters based on users’ preferences. Finally, we perform experiments on a large-scale dataset to verify the feasibility of our proposal. The performance results can help cloud service users to make the right decision and help cloud service providers to target improvements to their cloud services.
14

Rasheed, Sarbast. "A Multiclassifier Approach to Motor Unit Potential Classification for EMG Signal Decomposition." Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/934.

Abstract:
EMG signal decomposition is the process of resolving a composite EMG signal into its constituent motor unit potential trains (classes) and it can be configured as a classification problem. An EMG signal detected by the tip of an inserted needle electrode is the superposition of the individual electrical contributions of the different motor units that are active, during a muscle contraction, and background interference.
This thesis addresses the process of EMG signal decomposition by developing an interactive classification system, which uses multiple classifier fusion techniques in order to achieve improved classification performance. The developed system combines heterogeneous sets of base classifier ensembles of different kinds and employs either a one level classifier fusion scheme or a hybrid classifier fusion approach.
The hybrid classifier fusion approach is applied as a two-stage combination process that uses a new aggregator module which consists of two combiners: the first at the abstract level of classifier fusion and the other at the measurement level of classifier fusion such that it uses both combiners in a complementary manner. Both combiners may be either data independent or the first combiner data independent and the second data dependent. For the purpose of experimentation, we used as first combiner the majority voting scheme, while we used as the second combiner one of the fixed combination rules behaving as a data independent combiner or the fuzzy integral with the lambda-fuzzy measure as an implicit data dependent combiner.
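The abstract does not describe how the two combiners hand over to each other. The sketch below therefore rests on a stated assumption: the abstract-level majority vote is used when it reaches a strict majority, and otherwise the decision falls back to a measurement-level fixed rule (here the mean rule over per-class scores). All function names and sample values are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Abstract-level combiner: the label with a strict majority, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def mean_rule(scores):
    """Measurement-level combiner: average per-class scores, pick the best class."""
    classes = scores[0].keys()
    avg = {c: sum(s[c] for s in scores) / len(scores) for c in classes}
    return max(avg, key=avg.get)


def hybrid_fusion(labels, scores):
    """Assumed hand-over: keep a decisive vote, otherwise apply the mean rule."""
    decision = majority_vote(labels)
    return decision if decision is not None else mean_rule(scores)


# Three hypothetical base classifiers disagree on the label of one potential
labels = ["A", "B", "C"]
scores = [
    {"A": 0.5, "B": 0.3, "C": 0.2},
    {"A": 0.2, "B": 0.6, "C": 0.2},
    {"A": 0.3, "B": 0.3, "C": 0.4},
]
print(hybrid_fusion(labels, scores))
```

The fuzzy integral mentioned as the data-dependent alternative would replace `mean_rule` in this arrangement; it is not sketched here.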
Once the set of motor unit potential trains is generated by the classifier fusion system, the firing pattern consistency statistics for each train are calculated to detect classification errors in an adaptive fashion. This firing pattern analysis allows the algorithm to modify the threshold of assertion required for assignment of a motor unit potential classification individually for each train based on an expectation of erroneous assignments.
The classifier ensembles consist of a set of different versions of the Certainty classifier, a set of classifiers based on the nearest neighbour decision rule: the fuzzy k-NN and the adaptive fuzzy k-NN classifiers, and a set of classifiers that use a correlation measure as an estimation of the degree of similarity between a pattern and a class template: the matched template filter classifiers and its adaptive counterpart. The base classifiers, besides being of different kinds, utilize different types of features and their performances were investigated using both real and simulated EMG signals of different complexities. The feature sets extracted include time-domain data, first- and second-order discrete derivative data, and wavelet-domain data.
Following the so-called overproduce and choose strategy for classifier ensemble combination, the developed system allows the construction of a large set of candidate base classifiers and then chooses, from the base classifiers pool, subsets of a specified number of classifiers to form candidate classifier ensembles. The system then selects the classifier ensemble having the maximum degree of agreement by exploiting a diversity measure for designing classifier teams. The kappa statistic is used as the diversity measure to estimate the level of agreement between the base classifier outputs, i.e., to measure the degree of decision similarity between the base classifiers. This mechanism of choosing the team's classifiers based on assessing the classifier agreement throughout all the trains and the unassigned category is applied during the one level classifier fusion scheme and the first combiner in the hybrid classifier fusion approach. For the second combiner in the hybrid classifier fusion approach, we choose team classifiers also based on kappa statistics but by assessing the classifier agreement only across the unassigned category and choose those base classifiers having the minimum agreement.
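As an illustration of the agreement measure, the sketch below computes Cohen's kappa for one pair of base classifiers. This is an assumption: the abstract does not say whether a pairwise or a multi-rater variant of kappa was used. The function name `cohen_kappa` and the sample outputs (with 0 standing for the unassigned category) are hypothetical.

```python
def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length sequences of predicted labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of items given the same label by both
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two marginal label frequencies
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both classifiers always emit the same single label
    return (p_o - p_e) / (1 - p_e)


# Hypothetical train assignments from two base classifiers (0 = unassigned)
clf_1 = [1, 1, 2, 2, 0, 0]
clf_2 = [1, 1, 2, 0, 0, 0]
print(cohen_kappa(clf_1, clf_2))
```

A value near 1 indicates near-identical decisions, a value near 0 indicates agreement no better than chance, so a team can be chosen for maximum or minimum agreement by ranking candidate subsets on this value.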
Performance of the developed classifier fusion system, in both of its variants, i.e., the one level scheme and the hybrid approach, was evaluated using synthetic simulated signals of known properties and real signals, and then compared with the performance of the constituent base classifiers. Across the EMG signal data sets used, the hybrid approach had better average classification performance overall, especially in terms of reducing the number of classification errors.
15

Wilgenbus, Erich Feodor. "The file fragment classification problem : a combined neural network and linear programming discriminant model approach / Erich Feodor Wilgenbus." Thesis, North-West University, 2013. http://hdl.handle.net/10394/10215.

Abstract:
The increased use of digital media to store legal, as well as illegal data, has created the need for specialized tools that can monitor, control and even recover this data. An important task in computer forensics and security is to identify the true file type to which a computer file or computer file fragment belongs. File type identification is traditionally done by means of metadata, such as file extensions and file header and footer signatures. As a result, traditional metadata-based file object type identification techniques work well in cases where the required metadata is available and unaltered. However, traditional approaches are not reliable when the integrity of metadata is not guaranteed or metadata is unavailable. As an alternative, any pattern in the content of a file object can be used to determine the associated file type. This is called content-based file object type identification. Supervised learning techniques can be used to infer a file object type classifier by exploiting some unique pattern that underlies a file type's common file structure. This study builds on existing literature regarding the use of supervised learning techniques for content-based file object type identification, and explores the combined use of multilayer perceptron neural network classifiers and linear programming-based discriminant classifiers as a solution to the multiple class file fragment type identification problem. The purpose of this study was to investigate and compare the use of a single multilayer perceptron neural network classifier, a single linear programming-based discriminant classifier and a combined ensemble of these classifiers in the field of file type identification. The ability of each individual classifier and the ensemble of these classifiers to accurately predict the file type to which a file fragment belongs were tested empirically.
The study found that both a multilayer perceptron neural network and a linear programming-based discriminant classifier (used in a round robin) seemed to perform well in solving the multiple class file fragment type identification problem. The results of combining multilayer perceptron neural network classifiers and linear programming-based discriminant classifiers in an ensemble were not better than those of the single optimized classifiers.
MSc (Computer Science), North-West University, Potchefstroom Campus, 2013
16

Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.

Abstract:
Face à cette évolution technologique vertigineuse, l’utilisation des dispositifs de l'Internet des Objets (IdO), les capteurs, et les réseaux sociaux, d'énormes flux de données IdO sont générées quotidiennement de différentes applications pourront être transformées en connaissances à travers l’apprentissage automatique. En pratique, de multiples problèmes se posent afin d’extraire des connaissances utiles de ces flux qui doivent être gérés et traités efficacement. Dans ce contexte, cette thèse vise à améliorer les performances (en termes de mémoire et de temps) des algorithmes de l'apprentissage supervisé, principalement la classification à partir de flux de données en évolution. En plus de leur nature infinie, la dimensionnalité élevée et croissante de ces flux données dans certains domaines rendent la tâche de classification plus difficile. La première partie de la thèse étudie l’état de l’art des techniques de classification et de réduction de dimension pour les flux de données, tout en présentant les travaux les plus récents dans ce cadre.La deuxième partie de la thèse détaille nos contributions en classification pour les flux de données. Il s’agit de nouvelles approches basées sur les techniques de réduction de données visant à réduire les ressources de calcul des classificateurs actuels, presque sans perte en précision. Pour traiter les flux de données de haute dimension efficacement, nous incorporons une étape de prétraitement qui consiste à réduire la dimension de chaque donnée (dès son arrivée) de manière incrémentale avant de passer à l’apprentissage. Dans ce contexte, nous présentons plusieurs approches basées sur: Bayesien naïf amélioré par les résumés minimalistes et hashing trick, k-NN qui utilise compressed sensing et UMAP, et l’utilisation d’ensembles d’apprentissage également
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily from several applications, that can be transformed into valuable information through machine learning tasks. In practice, multiple critical issues arise in extracting useful knowledge from these evolving data streams, mainly that the stream needs to be efficiently handled and processed. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally due to the high -- and increasing -- data dimensionality, in addition to the potentially infinite amount of data. The two aspects make the classification task harder. The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent works in this vibrant area. In the second part, we detail our contributions to the field of classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources of existing classifiers with no -- or minor -- loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that consists in reducing the dimensionality of input data incrementally before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes enhanced with sketches and the hashing trick, k-NN using compressed sensing and UMAP, and the integration of these in ensemble methods.
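The hashing trick mentioned above can be sketched in a few lines. This is a generic illustration, not the thesis's implementation: it assumes instances arrive as sparse name-to-value mappings, uses CRC32 as the hash function and a signed variant of the trick; the function name `hash_features`, the bucket count and the sample instance are hypothetical.

```python
import zlib


def hash_features(features, n_buckets=8):
    """Project a sparse {name: value} instance onto a fixed-length vector."""
    vec = [0.0] * n_buckets
    for name, value in features.items():
        h = zlib.crc32(name.encode("utf-8"))  # deterministic 32-bit hash
        index = h % n_buckets                 # bucket that receives this feature
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0  # top bit chooses the sign
        vec[index] += sign * value
    return vec


# One hypothetical high-dimensional stream instance reduced to 8 dimensions
instance = {"sensor_1": 0.5, "sensor_42": 2.0, "sensor_9001": -1.5}
print(hash_features(instance))
```

Each instance is reduced on arrival, in one pass and without storing a feature dictionary, which is what makes the technique suitable for unbounded streams whose dimensionality keeps growing.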
17

Lin, Yu-Chih, and 林育智. "Content-based Image Classification Using Neural Network Ensemble." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/18350031976957526883.

Abstract:
Master's degree
Fu Jen Catholic University (輔仁大學)
Department of Electronic Engineering (電子工程學系)
93
Content-based image retrieval is the field of research that creates indices of images. Early studies usually extract low-level image features as indices. Image classification is one of several salient approaches to mining semantic information in images and video. This paper presents a classification approach that adopts classification results as indices for the retrieval of images and videos. After video segmentation and key frame extraction, a number of features are extracted for each shot. These features include color and texture features. A classification framework using backpropagation neural networks as multiple binary classifiers is applied to classify images. 100 of 2000 images and 1029 video shots are selected randomly and used to train the neural networks, and all of the images and video shots are then used to test the feasibility of our method. Images are classified into four semantic classes. The best experimental results achieve a recognition rate of 95.12%, which indicates that our approach can produce high-level indices with high reliability.
18

Tseng, Hung-Lin, and 曾鴻麟. "An Ensemble Based Classification Algorithm for Network Intrusion Detection System." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/16771777095571370354.

Abstract:
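Master's degree
Chung Cheng Institute of Technology, National Defense University (國防大學理工學院)
Master's Program in Information Science (資訊科學碩士班)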
99
In an environment of changing information security threats, an intrusion detection system (IDS) is an important line of defense. With the continuous progress of information technology, network speed and throughput are also increasing. There are hundreds of thousands of packets per second in the network. Taking both information security and network quality into account is a very important issue. In recent years, data mining technology has become very popular and has been applied successfully in various fields. Data mining can discover useful information in a large volume of data. Current research tends to apply data mining technology in constructing IDSs. However, many challenges remain to be overcome in the field of data mining-based IDSs, such as imbalanced data sets, a poor detection rate for the minority class, and a low accuracy rate. Therefore, by integrating data selection, sampling, and feature selection methods, this thesis proposes an “Enhanced Integrated Learning” algorithm and an “EIL-Algorithm Based Ensemble System” to strengthen the classification model and its performance. This thesis uses the KDD99 data set as the experimental data source. A series of experiments is conducted to show that the proposed algorithms can enhance the classification performance for the minority class. For the U2R attack class, Recall and F-measure are 57.01% and 38.98%, respectively, which shows that the classification performance for the U2R attack class is effectively improved. Meanwhile, the overall classification performance of the anomaly network-based IDS is enhanced.
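The two reported figures also determine the precision, provided the F-measure is the balanced F1 score; the abstract does not state this, so the result below is an inference under that assumption and not a reported result. Solving F1 = 2PR / (P + R) for P gives P = F1 · R / (2R − F1); the function name is hypothetical.

```python
def precision_from_recall_f1(recall, f1):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision exists for these values")
    return f1 * recall / denom


# Reported U2R figures: Recall 57.01%, F-measure 38.98% (assumed to be F1)
precision = precision_from_recall_f1(0.5701, 0.3898)
print(f"implied precision: {precision:.4f}")
```

Under this assumption the implied precision is a little under 30%, meaning that most instances flagged as U2R would be false alarms even though more than half of the true U2R attacks are detected.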
19

Hong, Je-Yi, and 洪哲儀. "Study of Stock Index Trend Using Tree-based Ensemble Classification." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/42855550952005440684.

Abstract:
Master's degree
Providence University (靜宜大學)
Department of Financial and Computational Mathematics (財務與計算數學系)
104
The stock price index and economic factors interact as both causes and effects. From the viewpoint of investment, predicting the trend of the stock price index can be used to reduce investment risk. Predicting trends in stock market prices has been an interesting topic for many years. However, due to various subjective and objective factors, forecasting the trend of a stock market price index is a very challenging task. In this study, we treated the prediction of the stock market price index as a classification problem. Many machine learning algorithms can be used for classification, including Support Vector Machines, Neural Networks and so on. However, with most of these models it is hard to understand how they work in practice. We applied tree methods to take advantage of their interpretability while still keeping acceptable predictive power. Compared with traditional tree methods, random forest makes model interpretation more difficult. Therefore, we studied the structure of multiple trees constructed from real data to find meaningful predictor variables and a procedure for finding an interpretable model with financial meaning. We created new variables based on the distribution of cut-off values constructed from multiple trees and adjusted by known financial facts. In predicting the 2013 Taiwan stock value index, we found that DPO is a highly influential factor. We also applied clustering methods to the multiple-trees model to identify a forest with a small number of trees whose prediction accuracy is competitive with that of random forest.
20

Zhong, Rui-Jia, and 鍾瑞嘉. "An ensemble-based sentiment classification framework for word of mouth." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/66823216198463228553.

Abstract:
Master's degree
Chung Yuan Christian University (中原大學)
Graduate Institute of Information Management (資訊管理研究所)
105
There is no absolute method or logic for choosing which machine learning approaches and sentiment lexicons are best for data analysis in data mining. Ensemble learning is generally thought to improve the accuracy of analysis and prediction by combining multiple different single classifiers. Thus there are more and more applications of it in prediction and classification, in order to give professionals a stronger basis when solving problems in medicine, sentiment analysis, weather forecasting, etc. In this study, we take word-of-mouth (WOM) data crawled from four websites (IMDB, Hotels.com, TripAdvisor and Amazon) as the datasets for our experiments. We focus on the Stacking, Bagging and Boosting methods and on how to improve the accuracy of the prediction results. The results show that the classification framework can help with datasets of the same type in later experiments. First, by using the framework, we can reduce experiment time and do not need to try all of the combinations. Second, we use five sentiment lexicons; the results show that a single sentiment lexicon cannot express the real sentiment of different datasets. Therefore, it is better to use multiple sentiment lexicons than a single one for all domains.
21

Liu, Chih-Kun, and 劉致坤. "Study on Classification Problem Using Ensemble Based on Feature Selection Approach." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/40872825341241493315.

Abstract:
Master's degree
Huafan University (華梵大學)
Master's Program, Department of Information Management (資訊管理學系碩士班)
99
During the last decade, two issues have significantly affected the generalization ability of machine learning-based classifiers: feature subset selection and ensembles of classifiers. This study concerns ensembles of classifiers built with the Bagging algorithm. An ensemble of classifiers is a collection of several classifiers whose individual decisions are combined by voting, or by weighted voting based on estimated prediction accuracy. Experiments are conducted on UCI data sets to evaluate classification accuracy. This study seeks to develop an ensemble of classifiers based on genetic algorithm wrapper feature selection for BPN, DT and SVM. In the experiments, first, the ensemble algorithm is better than a single classifier and than classification with feature selection; second, an ensemble with multiple base classifiers is better than an ensemble with a single base classifier.
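The Bagging scheme described above (bootstrap samples, one classifier per sample, a vote over their decisions) can be sketched as follows. To keep the example self-contained, a 1-nearest-neighbour rule stands in for the BPN, DT and SVM base classifiers used in the thesis, and the genetic-algorithm feature selection is omitted; all names and data points are hypothetical.

```python
import random
from collections import Counter


def nearest_neighbour(train, x):
    """Stand-in base classifier: label of the closest training point (1-NN)."""
    closest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return closest[1]


def bagging_predict(data, x, n_models=11, seed=0):
    """Fit n_models base classifiers on bootstrap samples and take a plurality vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap: draw with replacement
        votes.append(nearest_neighbour(sample, x))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical two-class toy data: (feature vector, label)
data = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
        ((1.0, 0.9), "b"), ((0.8, 1.1), "b"), ((1.2, 1.0), "b")]
print(bagging_predict(data, (0.1, 0.1)))
```

Weighted voting, the alternative mentioned in the abstract, would replace the plain count with a sum of per-classifier weights such as estimated accuracies.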
22

Tasi, Wei-Lan, and 蔡維倫. "Improve the Classification Performance for Decision Tree by Population-based Approaches with Ensemble." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/90102780678990829408.

Abstract:
Master's degree
Huafan University (華梵大學)
Master's Program, Department of Information Management (資訊管理學系碩士班)
97
Data mining techniques have been widely used in prediction and classification problems. The decision tree (DT) algorithm, which provides a rule-based tree structure, is one of the most popular among them and can be applied to various areas. Nevertheless, different problems may require different parameters when applying DT to build the model, and the parameter settings influence the classification result. In addition, a dataset may contain many features, but not all features are beneficial for the model. If feature selection is not performed, cost may increase and the learning ability of DT may be reduced. Therefore, scatter search (SS), genetic algorithm (GA) and particle swarm optimization (PSO) are proposed to select a beneficial subset of features and to obtain better parameters, which results in better classification. The three meta-heuristic algorithms mentioned above each have their own strengths and weaknesses. If these algorithms work together, better results can be expected. This is the so-called ensemble. This paper proposes the ensemble to further enhance the prediction or classification accuracy rate. Datasets from the UCI (University of California, Irvine) repository are used to evaluate the performance of the proposed approaches. The proposed DT algorithms based on the three meta-heuristic methods can find the best parameters and feature subset when facing various problems, and provide a higher classification accuracy rate.
23

Sheikh-Nia, Samaneh. "An Investigation of Standard and Ensemble Based Classification Techniques for the Prediction of Hospitalization Duration." Thesis, 2012. http://hdl.handle.net/10214/3902.

Abstract:
In any health-care system, early identification of individuals who are most at risk of developing an illness is vital, not only to ensure that a patient is provided with the appropriate treatment, but also to avoid the considerable costs associated with unnecessary hospitalization. To achieve this goal there is a need for a breakthrough prediction method that is capable of dealing with real-world medical data, which is inherently complex. In this study, we show how standard classification algorithms can be employed collectively to predict the length of a patient's stay in hospital in the upcoming year, based on their medical history. Multiple classifiers are used to perform the prediction task, since real-world medical data is highly complex, which makes the classification task very challenging. The data is voluminous, consists of a wide range of class values, some of which have only a few instances, and is highly unbalanced, making the classification of minority classes very difficult. We propose two Sequential Ensemble Classification (SEC) schemes, one based on an ensemble of homogeneous classifiers, and a second based on a heterogeneous ensemble of classifiers, in three hierarchical granularity levels. The goal of using this system is to provide increased performance over the standard classifiers. This method is highly beneficial when dealing with complex data which is multi-class and highly unbalanced.
24

Yang, Hui-Yu, and 楊蕙宇. "Random Rotboost: An Ensemble Classification Method Based on Rotation Forest and AdaBoost in Random Subsets." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/zdwm2h.

Abstract:
Master's degree
National Chiao Tung University (國立交通大學)
Institute of Management of Technology (科技管理研究所)
107
The rotation forest algorithm has been under development for more than ten years. Many scholars have successively proposed improved versions, and in some fields there have been application results. However, compared with other algorithms such as random forests, its disadvantage is that the calculation is complicated and time-consuming, and the improvement in precision is quite limited as well. This research proposes a classifier ensemble method combining rotation forest and AdaBoost, which is called Random Rotboost. In processing the training data, data diversity is increased by randomly extracting feature sets, which are then combined with rotation forest and AdaBoost to form an ensemble classifier. In the experimental section, experiments were conducted on ten data sets, with four other ensemble algorithms used as a control group for comparison with Random Rotboost. The results show that Random Rotboost can maintain high precision when its execution time is the same as or even less than that of the other algorithms.
25

Agostinho, Daniel Andrade Pinho. "Diferential diagnosis of alzheimer's disease based on multimodal imaging data (MRI, PIB and DTI)." Master's thesis, 2019. http://hdl.handle.net/10316/87824.

Abstract:
Integrated Master's dissertation in Biomedical Engineering, presented to the Faculdade de Ciências e Tecnologia
Alzheimer's disease (AD) is the most common form of dementia in humans and currently has no well-defined diagnostic criterion, although current practice emphasizes the use of neuroimaging biomarkers. Magnetic resonance imaging (MRI), positron emission tomography (PET), and diffusion tensor imaging (DTI) are used, either alone or in multimodal approaches, in the study and classification of AD. Today's multimodal approaches focus mostly on MRI-related biomarkers as a base, which are then combined with other types of imaging or biological data. Here we propose to analyse the effects of combining data from the three imaging modalities, and to study the complementary information provided by each combination of modalities. To achieve this goal, we start by creating base classifiers, one for each imaging modality, and then examine the effects of combining them using ensemble techniques. The results show that combining all three imaging modalities improves on the general performance of the base classifiers (accuracy 98%, sensitivity 99%, specificity 97%); however, it did not show a significant improvement over the combination of just MRI+PIB (accuracy 98%, sensitivity 99%, specificity 98%) or MRI+DTI (accuracy 97%, sensitivity 94%, specificity 99%). Furthermore, the combination of PIB+DTI (accuracy 91%, sensitivity 93%, specificity 90%) did not show any improvement over the base classifiers, suggesting a lack of complementary information between these two imaging modalities. These findings could represent clinical benefits not only for institutions, by reducing costs, but also for patients' wellbeing, by reducing the discomfort caused by the lengthy acquisition time of PET scans and by removing the need for exposure to ionizing radiation.
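The abstract does not say which ensemble technique was used to combine the per-modality base classifiers. As a rough illustration only, the sketch below assumes simple soft voting (averaging each base classifier's predicted probability of AD) and computes the three reported metrics. The function names, the 0.5 threshold, and the toy probabilities are all hypothetical and are not taken from the thesis.

```python
from statistics import mean


def soft_vote(prob_by_modality, threshold=0.5):
    """Average the per-modality P(AD) for each subject, then threshold the mean."""
    n_subjects = len(next(iter(prob_by_modality.values())))
    fused = [
        mean(probs[i] for probs in prob_by_modality.values())
        for i in range(n_subjects)
    ]
    return [1 if p >= threshold else 0 for p in fused]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity; assumes both classes are present."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy, made-up probabilities for six subjects (1 = AD, 0 = control).
probs = {
    "MRI": [0.9, 0.4, 0.2, 0.7, 0.1, 0.6],
    "PIB": [0.8, 0.7, 0.3, 0.4, 0.2, 0.3],
    "DTI": [0.7, 0.6, 0.1, 0.6, 0.4, 0.2],
}
labels = [1, 1, 0, 1, 0, 0]

fused_pred = soft_vote(probs)
mri_only_pred = soft_vote({"MRI": probs["MRI"]})
print(confusion_metrics(labels, fused_pred))
print(confusion_metrics(labels, mri_only_pred))
```

Passing a subset of the dictionary (for example only the MRI and PIB entries) gives the pairwise combinations the abstract compares.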
Other - This work was supported by grants funded by Fundação para a Ciência e Tecnologia, PAC –286 MEDPERSYST, POCI-01-0145-FEDER-016428, BIGDATIMAGE, CENTRO-01-0145-FEDER-000016 financed by Centro 2020 FEDER, COMPETE, FCT UID/4539/2013 – COMPETE, POCI-01-0145-FEDER-007440
26

Alves, Ana Sofia Tavares Jordão. "Time series classification for device fingerprinting: internship project at a telecommunications and technology company." Master's thesis, 2021. http://hdl.handle.net/10362/112035.

Abstract:
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Telecommunication service providers seek accurate insight into the devices connected within a home network in order to provide a better in-home experience. The goal of the internship was therefore to develop a machine learning model for fingerprinting Amazon devices. This can be framed as a binary time series classification problem, which relies on exploring how the bytes received by the router over time can serve as an indicator of internet usage for detecting Amazon devices. A feature-based analysis was conducted so that the most common and simple classifiers could be applied, which is relevant in a company context. The available data presented some challenges, namely high class imbalance and a high number of missing values. To address them, several combinations of techniques were studied to increase the importance of the minority class and to impute the unknown values. In addition, multiple models were trained, and their results were evaluated and compared. The performance of the best model was not considered satisfactory for correctly identifying Amazon devices, which led to the conclusion that other approaches, algorithms, and/or variables need to be considered in a future iteration. The project contributed to a better understanding of the path to take in identifying the devices and introduced new approaches and reasoning for dealing with data similar to the time series analysed.
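The abstract does not name the imputation or rebalancing techniques that were tried. Purely as a sketch of what a feature-based pipeline of this kind can look like, the example below assumes forward-fill imputation, a handful of summary statistics per series, and random oversampling of the minority class. All function names, feature names, and data values are hypothetical.

```python
import random
from statistics import mean, pstdev


def impute_forward(series):
    """Replace None with the last observed value (0.0 before the first one)."""
    filled, last = [], 0.0
    for value in series:
        if value is not None:
            last = float(value)
        filled.append(last)
    return filled


def summary_features(series):
    """Turn one bytes-received series into a fixed-length feature dict."""
    values = impute_forward(series)
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "max": max(values),
        "active_share": sum(v > 0 for v in values) / len(values),
    }


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until both classes have equal size.

    Assumes binary labels (0/1) and at least one row in each class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy, made-up series: one target device (label 1) and three others (label 0).
series_by_device = [
    [0, None, 120, 300],
    [0, 0, None, 0],
    [50, 60, None, 70],
    [0, 10, 0, 0],
]
device_labels = [1, 0, 0, 0]

features = [summary_features(s) for s in series_by_device]
X_bal, y_bal = random_oversample(features, device_labels)
print(len(X_bal), y_bal.count(1), y_bal.count(0))
```

The balanced feature rows can then be passed to any simple classifier. In practice, oversampling would be applied to the training split only, so that duplicated rows do not leak into evaluation.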
27

Chao, Wei-Chieh, and 趙偉傑. "Base on RFpS of Ensemble learning in Malware Family Classification." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/4pavv7.

Abstract:
Master's
Tamkang University
Executive Master's Program, Department of Information Management
105
Some fundamental issues in data mining applications become far more critical and severe when applied to malware analysis, and unfortunately they are still not well addressed. This thesis proposes a method that uses supervised feature projection for redundant feature reduction and noise filtering. It combines Random Forest with SVM, under the name RFPS (Random Forest Predicated SVM), as a method for feature reduction and fast classification. The reported results are a roughly 4.5-fold change in learning time relative to SVM, a prediction speed-up of about 2.5 times, and an accuracy gain of about 20%, reaching 98.4%.
28

Reichenbach, Jonas. "Credit scoring with advanced analytics: applying machine learning methods for credit risk assessment at the Frankfurter sparkasse." Master's thesis, 2018. http://hdl.handle.net/10362/49557.

Abstract:
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management
The need to control and manage credit risk obliges financial institutions to constantly reconsider their credit scoring methods. In recent years, machine learning has shown improvement over traditional methods for credit scoring. Even small improvements in prediction quality are of great interest to financial institutions. In this thesis, classification methods are applied to the credit data of the Frankfurter Sparkasse to score its credits. Since recent research has shown that ensemble methods deliver outstanding prediction quality for credit scoring, the model investigation and application focus on such methods. Additionally, the typically imbalanced class distribution of credit scoring datasets leads us to consider sampling techniques, which compensate for the imbalance in the training dataset. We evaluate and compare different types of models and techniques according to defined metrics. Besides delivering high prediction quality, the model's output should be interpretable as default probabilities. Hence, calibration techniques are considered to improve the interpretation of the model's scores. We find that ensemble methods deliver better results than the best single model. Specifically, the Random Forest delivers the best performance on the given dataset. Compared with the traditional credit scoring methods of the Frankfurter Sparkasse, the Random Forest shows a significant improvement in predicting a borrower's default within a 12-month period. Logistic Regression is used as a benchmark to validate the performance of the model.
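The abstract mentions calibration so that scores can be read as default probabilities, but it does not say which calibration technique was used. As one possible illustration, the sketch below assumes histogram binning: raw scores in [0, 1] are grouped into equal-width bins, and each bin is mapped to the default rate observed in it on held-out data. The function names, the bin count, and the toy scores are hypothetical.

```python
def fit_histogram_binning(scores, defaults, n_bins=5):
    """Return the observed default rate for each equal-width score bin.

    Assumes scores lie in [0, 1] and defaults are 0/1 outcomes.
    Empty bins fall back to the overall default rate.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for score, default in zip(scores, defaults):
        b = min(int(score * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += default
    overall = sum(defaults) / len(defaults)
    return [hits[b] / counts[b] if counts[b] else overall for b in range(n_bins)]


def calibrate(score, bin_rates):
    """Map a raw score to the default rate of the bin it falls into."""
    b = min(int(score * len(bin_rates)), len(bin_rates) - 1)
    return bin_rates[b]


# Toy, made-up held-out scores (e.g. from a Random Forest) and outcomes.
raw_scores = [0.05, 0.15, 0.30, 0.35, 0.45, 0.55, 0.65, 0.70, 0.85, 0.95]
observed_defaults = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

bin_rates = fit_histogram_binning(raw_scores, observed_defaults)
print(bin_rates)
print(calibrate(0.5, bin_rates))
```

The bins should be fitted on data not used to train the scoring model; otherwise the estimated default rates tend to be overconfident.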