Dissertations / Theses on the topic 'Ensemble Based Classification'
Consult the top 28 dissertations / theses for your research on the topic 'Ensemble Based Classification.'
WANDEKOKEN, E. D. "Support Vector Machine Ensemble Based on Feature and Hyperparameter Variation." Universidade Federal do Espírito Santo, 2011. http://repositorio.ufes.br/handle/10/4234.
Support vector machine (SVM) classifiers are currently considered one of the most powerful techniques for solving two-class classification problems. To improve on the performance achieved by individual SVM classifiers, a well-established approach is to use an ensemble of SVMs, that is, a set of SVM classifiers that are simultaneously individually accurate and collectively divergent in their decisions. This work proposes an approach to building SVM ensembles based on a three-stage process. First, complementary runs of a genetic-algorithm-based search (GEFS) are used to explore the feature space globally and define a set of feature subsets. Next, for each of these feature subsets, an SVM with optimized parameters is built. Finally, a local search is employed to select an optimized subset of these SVMs, which forms the ensemble that is finally produced. The experiments were carried out in the context of fault detection in industrial machines, using 2000 examples of vibration signals from motor pumps installed on oil platforms. The experiments show that the proposed method for building SVM ensembles outperformed other well-established classification approaches.
Al-Enezi, Jamal. "Artificial immune systems based committee machine for classification application." Thesis, Brunel University, 2012. http://bura.brunel.ac.uk/handle/2438/6826.
Börthas, Lovisa, and Sjölander Jessica Krange. "Machine Learning Based Prediction and Classification for Uplift Modeling." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-266379.
The need to model the true gain from targeted marketing has led to the now common method of uplift modeling (incremental response analysis). Performing this kind of analysis requires an existing treatment group and a control group, and the goal is to calculate the difference between the positive outcomes in the two groups. The probability of a positive outcome in each group can be estimated effectively with statistical machine learning methods. The uplift modeling approaches examined in this project are subtraction of two models, modeling the uplift directly, and a class variable transformation. The statistical machine learning methods applied are random forests and neural networks, along with the standard method logistic regression. The data were collected from a well-established retail company, and the goal is to investigate which uplift modeling approach and machine learning method perform best on the data in this project. The most decisive factors for a good result turned out to be the variable selection and the amount of control data in each data set. For a successful result, the machine learning method should be random forests used to model the uplift directly, or logistic regression together with a class variable transformation. Neural network methods are sensitive to uneven class distributions and therefore fail to yield stable models on the given data. Furthermore, subtraction of two models performed poorly because each model tended to focus too much on modeling the class in its own data set separately instead of modeling the difference between them. The conclusion is that a method that models the uplift directly, together with a relatively large control group, is preferable for a stable result.
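Two of the uplift approaches named in the abstract above, the class variable transformation and the subtraction of two models, can be sketched in a few lines. This is a minimal illustration and not the thesis's code: the function names are hypothetical, and the formula 2·P(Z=1|x) − 1 assumes that treatment and control assignment are equally likely, which is the usual textbook condition for this transformation.

```python
def transform_class_variable(treated, outcome):
    """Return Z = 1 for (treated and responded) or (control and did not respond), else 0."""
    return [1 if t == y else 0 for t, y in zip(treated, outcome)]


def uplift_from_transformed(p_z):
    """Uplift estimate 2 * P(Z=1 | x) - 1 (assumes balanced treatment/control assignment)."""
    return 2.0 * p_z - 1.0


def uplift_two_model(p_treated, p_control):
    """Subtraction of two models: P(Y=1 | x, treated) - P(Y=1 | x, control)."""
    return p_treated - p_control


# Hypothetical toy data: 1 = treated / responded, 0 = control / did not respond.
treated = [1, 1, 0, 0]
outcome = [1, 0, 1, 0]
print(transform_class_variable(treated, outcome))  # [1, 0, 0, 1]
```

A single classifier trained on Z then stands in for the two separate response models, which is one way to read the abstract's remark that the two-model approach focuses on each group separately instead of on the difference.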
Feng, Wei. "Investigation of training data issues in ensemble classification based on margin concept : application to land cover mapping." Thesis, Bordeaux 3, 2017. http://www.theses.fr/2017BOR30016/document.
Classification has been widely studied in machine learning. Ensemble methods, which build a classification model by integrating multiple component learners, achieve higher performance than a single classifier. The classification accuracy of an ensemble is directly influenced by the quality of the training data used. However, real-world data often suffer from class noise and class imbalance problems. Ensemble margin is a key concept in ensemble learning. It has been applied to both the theoretical analysis and the design of machine learning algorithms. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. This work focuses on exploiting the margin concept to improve the quality of the training set and therefore to increase the classification accuracy of noise-sensitive classifiers, and to design effective ensemble classifiers that can handle imbalanced datasets. A novel ensemble margin definition is proposed. It is an unsupervised version of a popular ensemble margin, in that it does not involve the class labels. Mislabeled training data are a challenge in building a robust classifier, whether it is an ensemble or not. To handle the mislabeling problem, we propose an ensemble margin-based class noise identification and elimination method based on an existing margin-based class noise ordering. This method can achieve a high mislabeled instance detection rate while keeping the false detection rate as low as possible. It relies on the margin values of misclassified data, considering four different ensemble margins, including the novel proposed margin. This method is extended to tackle class noise correction, which is a more challenging issue. The instances with low margins are more important than safe samples, which have high margins, for building a reliable classifier.
A novel bagging algorithm based on a data importance evaluation function, again relying on the ensemble margin, is proposed to deal with the class imbalance problem. In our algorithm, the emphasis is placed on the lowest margin samples. This method is again evaluated using four different ensemble margins in addressing the imbalance problem, especially on multi-class imbalanced data. In remote sensing, where training data are typically ground-based, mislabeled training data are inevitable. Imbalanced training data are another problem frequently encountered in remote sensing. Both proposed ensemble methods, using the best margin definition for handling these two major training data issues, are applied to land cover mapping.
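To make the margin concept in the abstract above concrete, here is a small sketch of a vote-based ensemble margin and a label-free variant. The exact definitions used in the thesis are not given in the abstract, so both formulas below are assumptions: the supervised margin is taken as (votes for the true class minus the most votes for any other class) divided by the number of votes, and the unsupervised one as (votes for the top class minus votes for the runner-up) divided by the number of votes. The class names are invented.

```python
from collections import Counter


def supervised_margin(votes, true_label):
    """(votes for the true class - most votes for any other class) / number of votes."""
    counts = Counter(votes)
    others = [c for label, c in counts.items() if label != true_label]
    return (counts[true_label] - max(others, default=0)) / len(votes)


def unsupervised_margin(votes):
    """(votes for the top class - votes for the runner-up) / number of votes; no label needed."""
    ranked = [c for _, c in Counter(votes).most_common(2)]
    runner_up = ranked[1] if len(ranked) > 1 else 0
    return (ranked[0] - runner_up) / len(votes)


# Hypothetical votes of five base classifiers for one training sample.
votes = ["water", "water", "forest", "water", "urban"]
print(supervised_margin(votes, "water"))   # 0.4
print(supervised_margin(votes, "forest"))  # -0.4
print(unsupervised_margin(votes))          # 0.4
```

Under these assumed definitions, a negative supervised margin marks a sample the ensemble misclassifies, while a low unsupervised margin marks a sample the ensemble is divided on, whatever its label.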
Alshahrani, Saeed Sultan. "Detection, classification and control of power quality disturbances based on complementary ensemble empirical mode decomposition and artificial neural networks." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/15872.
Wang, Xin. "Gaze based weakly supervised localization for image classification : application to visual recognition in a food dataset." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066577/document.
In this dissertation, we discuss how to use human gaze data to improve the performance of weakly supervised learning models in image classification. The background of this topic is the era of rapidly growing information technology, in which the amount of data to analyze is also growing dramatically. Since the amount of data that humans can annotate cannot keep up with the amount of data itself, current well-developed supervised learning approaches may face bottlenecks in the future. In this context, the use of weak annotations for high-performance learning methods is worthy of study. Specifically, we approach the problem from two aspects. One is to propose a more time-saving annotation, human eye-tracking gaze, as an alternative to traditional time-consuming annotations such as bounding boxes. The other is to integrate gaze annotation into a weakly supervised learning scheme for image classification. This scheme benefits from the gaze annotation for inferring the regions containing the target object. A useful property of our model is that it only exploits gaze for training, while the test phase is gaze free. This property further reduces the demand for annotations. The two aspects are connected in our models, which achieve competitive experimental results.
Xia, Junshi. "Multiple classifier systems for the classification of hyperspectral data." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENT047/document.
In this thesis, we propose several new techniques for the classification of hyperspectral remote sensing images based on multiple classifier systems (MCS). Our proposed framework introduces significant innovations with regard to previous approaches in the same field, many of which are mainly based on an individual algorithm. First, we propose to use Rotation Forests with several linear feature extraction methods and compare them with traditional ensemble approaches, such as Bagging, Boosting, Random subspace and Random Forest. Second, the integration of support vector machines (SVM) with the Rotation subspace framework for context classification is investigated. SVM and Rotation subspace are two powerful tools for high-dimensional data classification, so combining them can further improve classification performance. Third, we extend the work on Rotation Forests by incorporating a local feature extraction technique and spatial contextual information with a Markov random field (MRF) to design robust spatial-spectral methods. Finally, we present a new general framework, Random subspace ensemble, to train a series of effective classifiers, including decision trees and extreme learning machines (ELM), with extended multi-attribute profiles (EMAPs) for classifying hyperspectral data. Six RS ensemble methods, including Random subspace with DT (RSDT), Random Forest (RF), Rotation Forest (RoF), Rotation Random Forest (RoRF), RS with ELM (RSELM) and Rotation subspace with ELM (RoELM), are constructed from the multiple base learners. The effectiveness of the proposed techniques is illustrated by comparison with state-of-the-art methods on real hyperspectral data sets with different contexts.
Al-Mter, Yusur. "Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166139.
Thames, John Lane. "Advancing cyber security with a semantic path merger packet classification algorithm." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45872.
Ekelund, Måns. "Uncertainty Estimation for Deep Learning-based LPI Radar Classification : A Comparative Study of Bayesian Neural Networks and Deep Ensembles." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301653.
Previous studies have shown that deep neural networks (DNNs) can classify signal patterns for a particular type of radar (LPI) that is designed to be difficult to identify and intercept. Traditional neural networks, however, lack a natural way to estimate uncertainty, which harms their reliability and prevents their use in safety-critical environments. Uncertainty estimation for deep learning has therefore grown and recently become a large field with two clear categories: Bayesian approximation and ensemble methods. LPI radar classification is of great interest to the defense industry, and the technique will most likely be applied in safety-critical environments. In this study, we compare Bayesian neural networks and deep ensembles for LPI radar classification. The results indicate that a deep ensemble achieves higher accuracy than a Bayesian neural network and that both methods are more restrained in their predictions than a traditional deep neural network. We estimate uncertainty as entropy and show that the uncertainty in the methods' inferences increases both at high noise levels and on data slightly shifted from the known data distribution. However, the results show that the methods' uncertainty does not increase relative to an ordinary network when they are shown previously unseen signal patterns. We also show that the choice of method can be influenced by the available resources, since deep ensembles require a lot of memory compared with a traditional or Bayesian neural network.
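The abstract above estimates uncertainty as entropy. One common way to do this for a deep ensemble, shown here as an assumption and not as the thesis's exact procedure, is to average the class-probability vectors of the ensemble members and take the Shannon entropy of the average. The probability values below are invented for illustration.

```python
import math


def mean_prediction(member_probs):
    """Average the class-probability vectors produced by the ensemble members."""
    n = len(member_probs)
    return [sum(p[k] for p in member_probs) / n for k in range(len(member_probs[0]))]


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical softmax outputs of three ensemble members for one input.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(entropy(mean_prediction(agree)) < entropy(mean_prediction(disagree)))  # True
```

When the members disagree, the averaged distribution flattens and its entropy rises, which is the behaviour the study reports at high noise levels and on shifted data.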
Ala'raj, Maher A. "A credit scoring model based on classifiers consensus system approach." Thesis, Brunel University, 2016. http://bura.brunel.ac.uk/handle/2438/13669.
Dam, Hai Huong Information Technology & Electrical Engineering Australian Defence Force Academy UNSW. "A scalable evolutionary learning classifier system for knowledge discovery in stream data mining." Awarded by: University of New South Wales - Australian Defence Force Academy, 2008. http://handle.unsw.edu.au/1959.4/38865.
Liu, Yongwen. "Cloud services selection based on rough set theory." Thesis, Troyes, 2016. http://www.theses.fr/2016TROY0018/document.
With the development of cloud computing, users enjoy the various benefits that high-technology services bring. However, more and more cloud service programs are emerging, so it is important for users to choose the right cloud service. For cloud service providers, it is also important to improve the cloud services they provide in order to attract more customers and expand the scale of their services. Rough set theory is a good data processing tool for dealing with uncertain information. It can mine hidden knowledge or rules in data sets. The main purpose of this thesis is to apply rough set theory to help cloud users make decisions about cloud services. In this work, firstly, a framework using rough set theory in cloud service selection is proposed, and we give an example using rough sets in cloud service selection to illustrate and analyze the feasibility of our approach. Secondly, the proposed cloud service selection approach is used to evaluate the importance of parameters based on users' preferences. Finally, we perform experiments on a large-scale data set to verify the feasibility of our proposal. The performance results can help cloud service users make the right decision and help cloud service providers target improvements to their cloud services.
Rasheed, Sarbast. "A Multiclassifier Approach to Motor Unit Potential Classification for EMG Signal Decomposition." Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/934.
This thesis addresses the process of EMG signal decomposition by developing an interactive classification system, which uses multiple classifier fusion techniques in order to achieve improved classification performance. The developed system combines heterogeneous sets of base classifier ensembles of different kinds and employs either a one level classifier fusion scheme or a hybrid classifier fusion approach.
The hybrid classifier fusion approach is applied as a two-stage combination process that uses a new aggregator module which consists of two combiners: the first at the abstract level of classifier fusion and the other at the measurement level of classifier fusion such that it uses both combiners in a complementary manner. Both combiners may be either data independent or the first combiner data independent and the second data dependent. For the purpose of experimentation, we used as first combiner the majority voting scheme, while we used as the second combiner one of the fixed combination rules behaving as a data independent combiner or the fuzzy integral with the lambda-fuzzy measure as an implicit data dependent combiner.
Once the set of motor unit potential trains are generated by the classifier fusion system, the firing pattern consistency statistics for each train are calculated to detect classification errors in an adaptive fashion. This firing pattern analysis allows the algorithm to modify the threshold of assertion required for assignment of a motor unit potential classification individually for each train based on an expectation of erroneous assignments.
The classifier ensembles consist of a set of different versions of the Certainty classifier, a set of classifiers based on the nearest neighbour decision rule: the fuzzy k-NN and the adaptive fuzzy k-NN classifiers, and a set of classifiers that use a correlation measure as an estimation of the degree of similarity between a pattern and a class template: the matched template filter classifiers and its adaptive counterpart. The base classifiers, besides being of different kinds, utilize different types of features and their performances were investigated using both real and simulated EMG signals of different complexities. The feature sets extracted include time-domain data, first- and second-order discrete derivative data, and wavelet-domain data.
Following the so-called overproduce and choose strategy for classifier ensemble combination, the developed system allows the construction of a large set of candidate base classifiers and then chooses, from the pool of base classifiers, subsets of a specified number of classifiers to form candidate classifier ensembles. The system then selects the classifier ensemble having the maximum degree of agreement by exploiting a diversity measure for designing classifier teams. The kappa statistic is used as the diversity measure to estimate the level of agreement between the base classifier outputs, i.e., to measure the degree of decision similarity between the base classifiers. This mechanism of choosing the team's classifiers based on assessing the classifier agreement throughout all the trains and the unassigned category is applied during the one level classifier fusion scheme and the first combiner in the hybrid classifier fusion approach. For the second combiner in the hybrid classifier fusion approach, we also choose team classifiers based on kappa statistics, but by assessing the classifiers' agreement only across the unassigned category and choosing those base classifiers having the minimum agreement.
Performance of the developed classifier fusion system, in both of its variants, i.e., the one level scheme and the hybrid approach, was evaluated using synthetic simulated signals of known properties and real signals and then compared with the performance of the constituent base classifiers. Across the EMG signal data sets used, the hybrid approach had better average classification performance overall, especially in terms of reducing the number of classification errors.
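Two ingredients of the abstract above, the kappa statistic as an agreement measure between base classifiers and majority voting as the abstract-level combiner, can be sketched as follows. This is an illustrative sketch using Cohen's kappa for a pair of classifiers; the thesis may compute kappa differently (for example across a whole team), and the label sequences are invented, with label 0 standing in for the unassigned category.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # chance agreement
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def majority_vote(labels):
    """Abstract-level fusion: return the label chosen by most base classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three base classifiers on six motor unit potentials.
c1 = [1, 1, 2, 2, 0, 0]
c2 = [1, 1, 2, 0, 0, 0]
c3 = [1, 2, 2, 2, 0, 1]
print(round(cohen_kappa(c1, c2), 3))                 # 0.75
print([majority_vote(v) for v in zip(c1, c2, c3)])   # [1, 1, 2, 2, 0, 0]
```

Ties in `majority_vote` fall to the label seen first, which is an arbitrary choice made for this sketch.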
Wilgenbus, Erich Feodor. "The file fragment classification problem : a combined neural network and linear programming discriminant model approach / Erich Feodor Wilgenbus." Thesis, North-West University, 2013. http://hdl.handle.net/10394/10215.
MSc (Computer Science), North-West University, Potchefstroom Campus, 2013
Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily from several applications, that can be transformed into valuable information through machine learning tasks. In practice, multiple critical issues arise when extracting useful knowledge from these evolving data streams, mainly that the stream needs to be handled and processed efficiently. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally because of the high, and increasing, data dimensionality, in addition to the potentially infinite amount of data. These two aspects make the classification task harder. The first part of the thesis surveys the current state of the art in classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent works in this vibrant area. In the second part, we detail our contributions to the field of classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources required by existing classifiers with no, or minor, loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that incrementally reduces the dimensionality of input data before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes enhanced with sketches and the hashing trick, and k-NN using compressed sensing and UMAP. We also integrate them into ensemble methods.
Lin, Yu-Chih, and 林育智. "Content-based Image Classification Using Neural Network Ensemble." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/18350031976957526883.
Fu Jen Catholic University
Department of Electronic Engineering
93
Content-based image retrieval is the field of research concerned with creating indices of images. Early studies usually extracted low-level image features as indices. Image classification is one of several salient approaches to mining semantic information in images and video. This paper presents a classification approach that adopts classification results as indices for the retrieval of images and videos. After video segmentation and key frame extraction, many features, including color and texture features, are extracted for each shot. A classification framework using backpropagation neural networks as multiple binary classifiers is applied to classify images. 100 of 2000 images and 1029 video shots are selected randomly and used to train the neural networks, and all of the images and video shots are then used to test the feasibility of our method. Images are classified into four semantic classes. The best experimental results achieve a recognition rate of 95.12%, which indicates that our approach can produce high-level indices with high reliability.
Tseng, Hung-Lin, and 曾鴻麟. "An Ensemble Based Classification Algorithm for Network Intrusion Detection System." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/16771777095571370354.
Chung Cheng Institute of Technology, National Defense University
Master's Program in Information Science
99
In an environment of changing information security threats, an intrusion detection system (IDS) is an important line of defense. With the continuous progress of information technology, network speed and throughput are also increasing, and a network may carry hundreds of thousands of packets per second. Taking both information security and network quality into account is therefore a very important issue. In recent years, data mining technology has become very popular and has been applied successfully in various fields. Data mining can discover useful information in a large volume of data, and current research tends to apply it in constructing IDSs. However, many challenges remain to be overcome in the field of data mining-based IDSs, such as imbalanced data sets, a poor detection rate for the minority class, and a low accuracy rate. Therefore, by integrating data selection, sampling, and feature selection methods, this thesis proposes an “Enhanced Integrated Learning” algorithm and an “EIL-Algorithm Based Ensemble System” to strengthen the classification model and its performance. This thesis uses the KDD99 data set as the experimental data source. A series of experiments shows that the proposed algorithms can enhance the classification performance for the minority class. For the U2R attack class, Recall and F-measure are 57.01% and 38.98%, respectively, which shows that the classification performance for the U2R attack class is effectively improved. Meanwhile, the overall classification performance of the anomaly network-based IDS is enhanced.
Hong, Je-Yi, and 洪哲儀. "Study of Stock Index Trend Using Tree-based Ensemble Classification." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/42855550952005440684.
Providence University
Department of Financial and Computational Mathematics
104
Stock price indices and economic factors interact as both causes and effects. From an investment point of view, predicting the trend of a stock price index can be used to reduce investment risk. Predicting trends in stock market prices has been a topic of interest for many years. However, due to various subjective and objective factors, forecasting the trend of a stock market price index is a very challenging task. In this study, we treated the prediction of the stock market price index as a classification problem. Many machine learning algorithms can be used for classification, including support vector machines and neural networks. However, few of these models make it easy to understand how they work in practice. We applied tree methods to take advantage of their interpretability while keeping acceptable predictive power. Compared with traditional tree methods, random forest makes the model harder to interpret. Therefore, we studied multiple-tree structures constructed from real data to find meaningful predictor variables and a procedure for obtaining an interpretable model with financial meaning. We created new variables based on the distribution of cut-off values obtained from multiple trees and adjusted by known financial facts. For predicting the 2013 Taiwan stock value index, we found that DPO is a highly influential factor. We also applied clustering methods to the multiple-tree model to identify a forest with a small number of trees whose prediction accuracy is competitive with random forest.
Zhong, Rui-Jia, and 鍾瑞嘉. "An ensemble-based sentiment classification framework for word of mouth." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/66823216198463228553.
Chung Yuan Christian University
Graduate Institute of Information Management
105
There is no absolute method or logic for choosing which machine learning approaches and sentiment lexicons are best for data analysis in data mining. Ensemble learning is generally thought to improve the accuracy of analysis and prediction by combining multiple different single classifiers. It is therefore increasingly applied in prediction and classification, giving professionals a better basis for solving problems in medicine, sentiment analysis, weather forecasting, and other fields. In this study, we take word-of-mouth (WOM) data crawled from four websites (IMDB, Hotels.com, TripAdvisor and Amazon) as the experimental data sets. We focus on the Stacking, Bagging and Boosting methods and on how to improve prediction accuracy. The resulting classification framework can help with data sets of the same type in later experiments. First, by using the framework, we can reduce experiment time and do not need to try every combination. Second, we use five sentiment lexicons, and the results show that a single sentiment lexicon cannot express the real sentiment of different data sets. Therefore, using multiple sentiment lexicons is better than using a single one for all domains.
Liu, Chih-Kun, and 劉致坤. "Study on Classification Problem Using Ensemble Based on Feature Selection Approach." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/40872825341241493315.
Huafan University
Master's Program, Department of Information Management
99
Over the last decade, two issues have significantly affected the generalization ability of machine learning-based classifiers: feature subset selection and ensembles of classifiers. This study concerns ensembles of classifiers built with the Bagging algorithm. An ensemble of classifiers is a collection of several classifiers whose individual decisions are combined by voting or by weighted voting based on estimated prediction accuracy. Experiments on UCI data sets evaluate classification accuracy. This study seeks to develop an ensemble of classifiers based on genetic algorithm wrapper feature selection for BPN, DT and SVM. The experiments show, first, that the ensemble algorithm outperforms both the single classifier and classification with feature selection, and second, that an ensemble of multiple base classifiers outperforms an ensemble of a single base classifier.
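The combination rule described in the abstract above, voting weighted by each classifier's estimated prediction accuracy, can be sketched as below. This is a minimal illustration and not the study's implementation: the function name, the predictions and the accuracy values are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Add each classifier's weight to its predicted label and return the heaviest label."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical predictions of three base classifiers (say BPN, DT, SVM) for one sample,
# with made-up validation accuracies used as voting weights.
predictions = ["A", "B", "B"]
accuracies = [0.95, 0.40, 0.50]
print(weighted_vote(predictions, accuracies))       # A
print(weighted_vote(predictions, [1.0, 1.0, 1.0]))  # B
```

With equal weights the rule reduces to plain majority voting, so the two calls show how accuracy weighting can overturn a simple majority.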
Tasi, Wei-Lan, and 蔡維倫. "Improve the Classification Performance for Decision Tree by Population-based Approaches with Ensemble." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/90102780678990829408.
Huafan University
Master's Program, Department of Information Management
97
Data mining techniques have been widely used in prediction and classification problems. The decision tree (DT) algorithm, which provides a rule-based tree structure, is one of the most popular and can be applied in various areas. Nevertheless, different problems may require different parameters when DT is used to build the model, and the parameter settings influence the classification result. In addition, a data set may contain many features, not all of which are beneficial for the model. Omitting feature selection may increase cost and reduce the learning ability of DT. Therefore, scatter search (SS), genetic algorithm (GA) and particle swarm optimization (PSO) are proposed to select a beneficial subset of features and to obtain better parameters, which results in better classification. Each of these three meta-heuristic algorithms has its own strengths and weaknesses. If the algorithms work together, better results can be expected; this is the so-called ensemble. This paper proposes such an ensemble to further enhance prediction or classification accuracy. Data sets from the UCI (University of California, Irvine) repository are used to evaluate the performance of the proposed approaches. The three proposed meta-heuristic DT algorithms can find the best parameters and feature subset for various problems and provide a higher classification accuracy rate.
Sheikh-Nia, Samaneh. "An Investigation of Standard and Ensemble Based Classification Techniques for the Prediction of Hospitalization Duration." Thesis, 2012. http://hdl.handle.net/10214/3902.
Yang, Hui-Yu, and 楊蕙宇. "Random Rotboost: An Ensemble Classification Method Based on Rotation Forest and AdaBoost in Random Subsets." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/zdwm2h.
National Chiao Tung University
Institute of Management of Technology
107
The rotation forest algorithm has been under development for more than ten years. Many scholars have successively proposed improved versions, and it has been applied in some fields. However, compared with other algorithms such as random forests, its disadvantages are that the computation is complicated and time-consuming and that the improvement in precision is quite limited. This work proposes a classifier ensemble method combining rotation forest and AdaBoost, called Random Rotboost. When the training data are processed, data diversity is increased by randomly extracting feature sets, which are then combined with rotation forest and AdaBoost to form an ensemble classifier. In the experimental section, experiments are conducted on ten data sets, and four other ensemble algorithms serve as a control group for comparison with Random Rotboost. The results show that Random Rotboost maintains high precision with an execution time equal to or even shorter than that of the other algorithms.
Agostinho, Daniel Andrade Pinho. "Diferential diagnosis of alzheimer's disease based on multimodal imaging data (MRI, PIB and DTI)." Master's thesis, 2019. http://hdl.handle.net/10316/87824.
Alzheimer’s disease (AD) is the most common form of dementia in humans and currently has no well-defined diagnostic criterion, although current practice emphasizes the use of neuroimaging biomarkers. Magnetic resonance imaging (MRI), positron emission tomography (PET) and diffusion tensor imaging (DTI) are used, either alone or in multimodal approaches, in the study and classification of AD. Today’s multimodal approaches focus mainly on MRI-related biomarkers as a base, which are then combined with other types of imaging or biological data. Here we propose to analyse the effects of combining the data from the three imaging modalities, and to study the complementary information provided by each combination of modalities. To achieve this goal, we start by creating base classifiers, one for each imaging modality, and then examine the effects of combining them using ensemble techniques. The results show that combining all three imaging modalities improves the general performance of the base classifiers (accuracy 98%, sensitivity 99%, specificity 97%), but does not significantly improve on the combination of just MRI+PIB (accuracy 98%, sensitivity 99%, specificity 98%) or MRI+DTI (accuracy 97%, sensitivity 94%, specificity 99%). Furthermore, the combination of PIB+DTI (accuracy 91%, sensitivity 93%, specificity 90%) did not show any improvement over the base classifiers, suggesting a lack of complementary information between these two imaging modalities. These findings could represent clinical benefits not only for institutions, by reducing costs, but also for the patient’s wellbeing, by reducing the discomfort caused by the lengthy acquisition time of PET scans and by removing the need for exposure to ionizing radiation.
Other - This work was supported by grants funded by Fundação para a Ciência e Tecnologia, PAC –286 MEDPERSYST, POCI-01-0145-FEDER-016428, BIGDATIMAGE, CENTRO-01-0145-FEDER-000016 financed by Centro 2020 FEDER, COMPETE, FCT UID/4539/2013 – COMPETE, POCI-01-0145-FEDER-007440
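For the multimodal Alzheimer's study above, the abstract mentions one base classifier per imaging modality combined with "ensemble techniques" but does not name the technique. A minimal sketch, assuming soft voting (averaging the positive-class probabilities), together with the three reported metrics, could look like this. All probabilities and labels are invented for illustration and are not taken from the thesis.

```python
def soft_vote(prob_lists):
    """Average the positive-class probability of each sample over classifiers."""
    n = len(prob_lists)
    return [sum(probs) / n for probs in zip(*prob_lists)]

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on 1s) and specificity (recall on 0s)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Hypothetical outputs of three per-modality base classifiers (1 = AD).
mri = [0.9, 0.4, 0.2, 0.7]
pib = [0.8, 0.6, 0.1, 0.3]
dti = [0.6, 0.7, 0.4, 0.2]
y_true = [1, 1, 0, 0]

avg = soft_vote([mri, pib, dti])
y_pred = [1 if p >= 0.5 else 0 for p in avg]
metrics = binary_metrics(y_true, y_pred)
```

In this toy data the MRI classifier alone misjudges the second and fourth samples at a 0.5 threshold, while the averaged vote gets both right, which is the kind of complementary information the abstract discusses.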
Alves, Ana Sofia Tavares Jordão. "Time series classification for device fingerprinting: internship project at a telecommunications and technology company." Master's thesis, 2021. http://hdl.handle.net/10362/112035.
Telecommunication service providers seek accurate insight into the devices connected within a home network in order to provide a better in-home experience. The goal of the internship was therefore to develop a machine learning model for fingerprinting Amazon devices. This translates to a time series binary classification problem and involves exploring how the bytes received by the router over time can serve as an indicator of internet usage to detect Amazon devices. A feature-based analysis was conducted to make it possible to apply the most common and simple classifiers, which is relevant in a company context. The available data presented some challenges, namely high class imbalance and a large number of missing values. To address these, several combinations of techniques were studied to increase the importance of the minority class and to impute the unknown values. In addition, multiple models were trained, and their results were evaluated and compared. The performance of the best model was not considered satisfactory for correctly identifying the Amazon devices, which led to the conclusion that other approaches, algorithms and/or variables need to be considered in a future iteration. The project contributed to a better understanding of the path to take in identifying the devices and introduced new approaches and reasoning for dealing with data similar to the time series analysed.
Chao, Wei-Chieh, and 趙偉傑. "Base on RFpS of Ensemble learning in Malware Family Classification." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/4pavv7.
Tamkang University
Department of Information Management, In-service Master's Program
105
Some fundamental issues of data mining applications become much more critical and severe in malware analysis, and unfortunately they are still not well addressed. This paper proposes an approach that uses supervised feature projection for redundant feature reduction and noise filtering. It combines Random Forest with SVM under the name RFPS (Random Forest Predicated SVM) as a method for feature reduction and fast classification. The results show that, compared with SVM, learning is about 4.5 times faster, prediction speed increases by about 2.5 times, and accuracy improves by about 20% to 98.4%.
Reichenbach, Jonas. "Credit scoring with advanced analytics: applying machine learning methods for credit risk assessment at the Frankfurter sparkasse." Master's thesis, 2018. http://hdl.handle.net/10362/49557.
The need to control and manage credit risk obliges financial institutions to constantly reconsider their credit scoring methods. In recent years, machine learning has shown improvement over the traditional methods used for credit scoring. Even small improvements in prediction quality are of great interest to financial institutions. In this thesis, classification methods are applied to the credit data of the Frankfurter Sparkasse to score its credits. Since recent research has shown that ensemble methods deliver outstanding prediction quality for credit scoring, the model investigation and application focus on such methods. Additionally, the imbalanced class distribution typical of credit scoring datasets leads us to consider sampling techniques, which compensate for the imbalance in the training dataset. We evaluate and compare different types of models and techniques according to defined metrics. Besides delivering high prediction quality, the model’s outcome should be interpretable as default probabilities. Hence, calibration techniques are considered to improve the interpretation of the model’s scores. We find that ensemble methods deliver better results than the best single model. Specifically, Random Forest delivers the best performance on the given data set. Compared to the traditional credit scoring methods of the Frankfurter Sparkasse, the Random Forest shows significant improvement when predicting a borrower’s default within a 12-month period. Logistic Regression is used as a benchmark to validate the performance of the model.
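The abstract mentions sampling techniques that compensate for class imbalance in the training data without naming one. As a minimal sketch, assuming plain random oversampling of the minority class (one of several possible techniques, and not necessarily the one used in the thesis), the idea can be shown with the standard library alone. The function name and the toy data are hypothetical.

```python
import random

def random_oversample(rows, labels, seed=None):
    """Duplicate randomly chosen rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement to fill the gap up to the largest class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical training set: four non-defaults (0) and one default (1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
bal_rows, bal_labels = random_oversample(rows, labels, seed=42)
```

Resampling should be applied to the training split only, as the abstract indicates; resampled data also shifts the predicted scores away from true default rates, which is one reason calibration matters when scores are to be read as probabilities.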