Theses on the topic "Cost-sensitive classification"
Create an accurate reference in APA, MLA, Chicago, Harvard, and several other styles
Consult the 19 best theses for your research on the topic "Cost-sensitive classification".
Next to every source in the list of references there is an "Add to bibliography" button. Click this button, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the scholarly publication in PDF format and read its abstract online whenever this information is included in the metadata.
Browse theses on a wide variety of disciplines and organize your bibliography correctly.
Dachraoui, Asma. « Cost-Sensitive Early classification of Time Series ». Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLA002/document.
Early classification of time series is becoming an increasingly valuable task for assisting decision making in many application domains. In this setting, information can be gained by waiting for more evidence to arrive, thus helping to make better decisions that incur lower misclassification costs; meanwhile, however, the cost associated with delaying the decision generally increases, rendering the decision less attractive. Making early yet accurate predictions therefore requires solving an optimization problem that combines two competing types of cost. This thesis introduces a new general framework for the early classification of time series. Unlike classical approaches, which implicitly assume that misclassification errors cost the same and that the cost of delaying the decision is constant over time, we cast the problem as a cost-sensitive online decision-making problem in which delaying the decision is costly. We then propose a new formal criterion, along with two approaches that estimate the optimal decision time for a new incoming, still incomplete time series. In particular, they capture the evolution of typical complete time series in the training set thanks to a segmentation technique that forms meaningful groups, and leverage this complete information to estimate the costs for all future time steps where data points are still missing. These approaches are interesting in two ways: (i) they estimate, online, the earliest time in the future at which a minimization of the criterion can be expected, thus going beyond classical approaches that myopically decide at each time step whether to make a decision or to postpone the call one more time step; and (ii) they are adaptive, in that the properties of the incoming time series are taken into account to decide when is the optimal time to output a prediction.
Results of extensive experiments on synthetic and real data sets show that both approaches successfully meet the behaviors expected from early classification systems.
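The waiting-versus-deciding trade-off described in this abstract can be sketched as a small optimization: expected misclassification risk typically falls as more of the series is observed, while the delay cost grows. The sketch below is illustrative only; the function names, cost model, and numbers are assumptions, not the thesis's actual criterion.

```python
# Illustrative sketch of a non-myopic early-classification criterion:
# total cost of deciding at time t = error risk at t + delay penalty.
# All names and numbers are assumptions for illustration.

def expected_total_cost(t, misclassification_risk, delay_cost_per_step):
    """Expected cost of deciding at time t: error risk at t (typically
    decreasing as more of the series is observed) plus the waiting cost."""
    return misclassification_risk(t) + delay_cost_per_step * t

def best_decision_time(horizon, misclassification_risk, delay_cost_per_step):
    """Non-myopic rule: scan all future time steps and commit at the one
    that minimizes the combined criterion."""
    return min(range(1, horizon + 1),
               key=lambda t: expected_total_cost(t, misclassification_risk,
                                                 delay_cost_per_step))

# Toy risk curve: error risk decays as 1/t as evidence accumulates.
t_star = best_decision_time(20, lambda t: 10.0 / t, 0.4)
```

A myopic rule would stop at the first step where waiting looks locally unprofitable; scanning the whole horizon, as above, is what makes the rule non-myopic.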
MARQUES, DANIEL DOS SANTOS. « A DECISION TREE LEARNER FOR COST-SENSITIVE BINARY CLASSIFICATION ». PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28239@1.
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
Classification problems have been widely studied in the machine learning literature, generating applications in several areas. However, in a number of scenarios, misclassification costs can vary substantially, which motivates the study of Cost-Sensitive Learning techniques. In the present work, we discuss the use of decision trees for the more general Example-Dependent Cost-Sensitive Problem (EDCSP), where misclassification costs vary with each example. One of the main advantages of decision trees is that they are easy to interpret, which is a highly desirable property in a number of applications. We propose a new attribute selection method for constructing decision trees for the EDCSP and discuss how it can be efficiently implemented. Finally, we compare our new method with two other decision tree algorithms recently proposed in the literature, on 3 publicly available datasets.
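The example-dependent setting described above can be made concrete with a small sketch: each example carries its own false-positive and false-negative costs, a leaf is labeled with whichever class minimizes the total cost of the examples it holds, and a natural attribute-selection score is the cost reduction a split achieves. This is an illustrative stand-in, not the thesis's actual selection method.

```python
# Illustrative sketch of example-dependent cost-sensitive trees.
# Each example is (true_label, cost_fp, cost_fn): predicting 1 for a
# true 0 pays cost_fp; predicting 0 for a true 1 pays cost_fn.

def leaf_cost(examples):
    """Cost of the best single label for a set of examples."""
    cost_if_predict_0 = sum(c_fn for y, c_fp, c_fn in examples if y == 1)
    cost_if_predict_1 = sum(c_fp for y, c_fp, c_fn in examples if y == 0)
    return min(cost_if_predict_0, cost_if_predict_1)

def split_gain(examples, left, right):
    """Cost reduction achieved by a candidate split: an illustrative
    attribute-selection score in this setting."""
    return leaf_cost(examples) - (leaf_cost(left) + leaf_cost(right))
```

Because the costs are per-example, two leaves with identical class proportions can receive different labels, which is exactly what separates this setting from ordinary class-weighted learning.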
Bakshi, Arjun. « Methodology For Generating High-Confidence Cost-Sensitive Rules For Classification ». University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1377868085.
Kamath, Vidya P. « Enhancing Gene Expression Signatures in Cancer Prediction Models : Understanding and Managing Classification Complexity ». Scholar Commons, 2010. http://scholarcommons.usf.edu/etd/3653.
Julock, Gregory Alan. « The Effectiveness of a Random Forests Model in Detecting Network-Based Buffer Overflow Attacks ». NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/190.
Texte intégralMakki, Sara. « An Efficient Classification Model for Analyzing Skewed Data to Detect Frauds in the Financial Sector ». Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1339/document.
There are different types of risk in the financial domain, such as terrorist financing, money laundering, credit card fraud, and insurance fraud, that may result in catastrophic consequences for entities such as banks or insurance companies. These financial risks are usually detected using classification algorithms. In classification problems, the skewed distribution of classes, also known as class imbalance, is a very common challenge in financial fraud detection, where special data mining approaches are used alongside traditional classification algorithms to tackle the issue. The class imbalance problem occurs when one of the classes has more instances than the other, and it becomes even more pronounced in a big data context. The datasets used to build and train the models contain an extremely small portion of the minority group, also known as the positives, in comparison to the majority class, known as the negatives. In most cases it is more delicate and crucial to correctly classify the minority group than the majority group, as in fraud detection or disease diagnosis. In these examples the fraud and the disease are the minority groups, and it is more critical to detect a fraudulent record, because of its dangerous consequences, than a normal one. These class proportions make it very difficult for a machine learning classifier to learn the characteristics and patterns of the minority group: classifiers are biased towards the majority group, because of its many examples in the dataset, and learn to classify it much faster than the other group. After conducting a thorough study of the challenges faced in class imbalance cases, we found that we still cannot reach an acceptable sensitivity (i.e., good classification of the minority group) without a significant decrease in accuracy. This leads to another challenge, namely the choice of performance measures used to evaluate the models.
In these cases the choice is not straightforward, as accuracy or sensitivity alone are misleading. We use other measures, such as the precision-recall curve or the F1-score, to evaluate the trade-off between accuracy and sensitivity. Our objective is to build an imbalanced classification model that accounts for extreme class imbalance and false alarms in a big data framework. We developed two approaches: a Cost-Sensitive Cosine Similarity K-Nearest Neighbor (CoSKNN) as a single classifier, and a K-modes Imbalance Classification Hybrid Approach (K-MICHA) as an ensemble learning methodology. In CoSKNN, our aim is to tackle the imbalance problem by using cosine similarity as a distance metric and by introducing a cost-sensitive score for classification with the KNN algorithm. We conducted a comparative validation experiment in which we demonstrate the effectiveness of CoSKNN in terms of accuracy and fraud detection. The aim of K-MICHA, on the other hand, is to cluster data points that are similar in terms of the classifiers' outputs, and then to calculate fraud probabilities in the obtained clusters in order to detect frauds in new transactions. This approach can be used for the detection of any type of financial fraud for which labelled data are available. Finally, we applied K-MICHA to credit card, mobile payment, and auto insurance fraud data sets. In all three case studies, we compare K-MICHA with stacking using voting, weighted voting, logistic regression, and CART, as well as with AdaBoost and random forest, and we demonstrate the efficiency of K-MICHA in these experiments.
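The CoSKNN idea described above, combining cosine similarity with a cost-sensitive vote, can be sketched as follows. The specific weighting scheme (up-weighting minority-class neighbors by a cost factor) is an illustrative assumption, not the thesis's exact scoring rule.

```python
import math

# Illustrative sketch of a cosine-similarity KNN with a cost-sensitive
# vote: neighbors vote with their cosine similarity, and votes for the
# minority (fraud) class are up-weighted by a cost factor.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cosknn_predict(train, x, k=3, minority_cost=5.0):
    """train: list of (vector, label) with label 1 = minority/fraud.
    Predict 1 when the cost-weighted similarity mass of minority
    neighbors exceeds that of majority neighbors."""
    neighbors = sorted(train, key=lambda p: cosine(p[0], x), reverse=True)[:k]
    score_minority = sum(cosine(v, x) * minority_cost
                         for v, y in neighbors if y == 1)
    score_majority = sum(cosine(v, x) for v, y in neighbors if y == 0)
    return 1 if score_minority > score_majority else 0

# Toy imbalanced training set: two majority points, one minority point.
train = [((1.0, 0.0), 0), ((0.9, 0.1), 0), ((0.0, 1.0), 1)]
```

The cost factor lets a single similar fraud example outvote several normal neighbors, which is the behavior one wants under extreme imbalance.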
Charnay, Clément. « Enhancing supervised learning with complex aggregate features and context sensitivity ». Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD025/document.
In this thesis, we study model adaptation in supervised learning. Firstly, we adapt existing learning algorithms to the relational representation of data. Secondly, we adapt learned prediction models to context change. In the relational setting, data is modeled by multiple entities linked by relationships. We handle these relationships using complex aggregate features. We propose stochastic optimization heuristics to include complex aggregates in relational decision trees and Random Forests, and assess their predictive performance on real-world datasets. We adapt prediction models to two kinds of context change. Firstly, we propose an algorithm to tune thresholds on pairwise scoring models to adapt to a change of misclassification costs. Secondly, we reframe numerical attributes with affine transformations to adapt to a change of attribute distribution between a learning context and a deployment context. Finally, we extend these transformations to complex aggregates.
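The threshold-tuning idea mentioned above, adapting a fixed scoring model to new misclassification costs, can be illustrated with the standard Bayes decision threshold: for a model outputting P(y=1|x), the cost-minimizing cutoff follows directly from the false-positive and false-negative costs. This textbook formula is an illustrative stand-in for the thesis's pairwise tuning algorithm, and the function names are assumptions.

```python
# Illustrative sketch of threshold adaptation under changed costs:
# the model's scores stay fixed; only the decision threshold moves.

def bayes_threshold(cost_fp, cost_fn):
    """Cost-minimizing cutoff: predict positive when P(y=1|x) exceeds
    cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)

def adapt_predictions(scores, cost_fp, cost_fn):
    """Re-threshold calibrated scores for a new cost context."""
    t = bayes_threshold(cost_fp, cost_fn)
    return [1 if s > t else 0 for s in scores]
```

When false negatives become more expensive, the threshold drops and the same scores yield more positive predictions, with no retraining required.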
Lo, Hung-Yi, et 駱宏毅. « Cost-Sensitive Multi-Label Classification with Applications ». Thesis, 2013. http://ndltd.ncl.edu.tw/handle/61015886145358618517.
國立臺灣大學
資訊工程學研究所
101
We study a generalization of traditional multi-label classification, which we refer to as cost-sensitive multi-label classification (CSML). In this problem, the misclassification cost can be different for each instance-label pair. To solve the problem, we propose two novel and general strategies based on the problem transformation technique. The proposed strategies transform the CSML problem into several cost-sensitive single-label classification problems. In addition, we propose a basis expansion model for CSML, which we call the Generalized k-Labelsets Ensemble (GLE). In the basis expansion model, a basis function is a label powerset classifier trained on a random k-labelset. The expansion coefficients are learned by minimizing the cost-weighted global error between the prediction and the ground truth. GLE can also be used for traditional multi-label classification. Experimental results on both multi-label classification and cost-sensitive multi-label classification demonstrate that our method has better performance than other methods. Cost-sensitive classification is based on the assumption that the cost is given according to the application. "Where does cost come from?" is therefore an important practical issue. We study two real-world prediction tasks and link their data distribution to the cost information. The two tasks are medical image classification and social tag prediction. In medical image classification, we observe a patient-imbalanced phenomenon that seriously hurts the generalization ability of the image classifier. We design several patient-balanced learning algorithms based on cost-sensitive binary classification. The success of our patient-balanced learning methods was demonstrated by winning KDD Cup 2008. For social tag prediction, we propose to treat the tag counts as the misclassification costs and model the social tagging problem as a cost-sensitive multi-label classification problem.
The experimental results in audio tag annotation and retrieval demonstrate that the CSML approaches outperform our winning method in Music Information Retrieval Evaluation eXchange (MIREX) 2009 in terms of both cost-sensitive and cost-less evaluation metrics. The results on social bookmark prediction also demonstrate that our proposed method has better performance than other methods.
Sun, Yanmin. « Cost-Sensitive Boosting for Classification of Imbalanced Data ». Thesis, 2007. http://hdl.handle.net/10012/3000.
Tu, Han-Hsing, et 涂漢興. « Regression approaches for multi-class cost-sensitive classification ». Thesis, 2009. http://ndltd.ncl.edu.tw/handle/79841686006299558588.
國立臺灣大學
資訊工程學研究所
97
Cost-sensitive classification has been an important research problem in recent years. It allows machine learning algorithms to use additional cost information to make more strategic decisions. Studies on binary cost-sensitive classification have led to promising results in theories, algorithms, and applications. The multi-class counterpart is also needed in many real-world applications, but is more difficult to analyze. This thesis focuses on multi-class cost-sensitive classification. Existing methods for multi-class cost-sensitive classification usually transform the cost information into example importance (weight). This thesis offers a different viewpoint on the problem and proposes a novel method. We directly estimate the cost value corresponding to each prediction using regression, and output the label that comes with the smallest estimated cost. We improve the method by analyzing the errors made during the decision. We then propose a different regression loss function that is tightly connected to these errors. The new loss function leads to a solid theoretical guarantee of error transformation. We design a concrete algorithm for the loss function with support vector machines. The algorithm can be viewed as a theoretically justified extension of the popular one-versus-all support vector machine. Experiments using real-world data sets with arbitrary cost values demonstrate the usefulness of our proposed methods, and validate that the cost information should be appropriately used rather than dropped.
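The cost-estimation viewpoint described above, one regressor per class predicting the cost of predicting that class, then outputting the arg-min, can be sketched in a few lines. The tiny least-squares regressor and the data layout are illustrative assumptions; the thesis builds this on support vector machines with a specially designed loss.

```python
# Illustrative sketch: estimate per-class costs by regression and
# predict the class with the smallest estimated cost.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b on scalar features."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
         if var else 0.0)
    return a, my - a * mx

def cost_regression_classifier(xs, cost_rows):
    """cost_rows[i][k] = cost of predicting class k on example i.
    Returns a classifier x -> argmin_k (estimated cost of class k)."""
    n_classes = len(cost_rows[0])
    models = [fit_linear(xs, [row[k] for row in cost_rows])
              for k in range(n_classes)]
    def predict(x):
        est = [a * x + b for a, b in models]
        return min(range(n_classes), key=lambda k: est[k])
    return predict

# Toy data: class 0 gets cheaper as x shrinks, class 1 as x grows.
xs = [0.0, 1.0, 2.0, 3.0]
cost_rows = [[0.0, 3.0], [1.0, 2.0], [2.0, 1.0], [3.0, 0.0]]
clf = cost_regression_classifier(xs, cost_rows)
```

Note that, unlike weight-based reductions, the full cost vector of each example is used as a regression target rather than being collapsed into a single importance weight.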
Huang, Kuan-Hao, et 黃冠豪. « Cost-sensitive Label Embedding for Multi-label Classification ». Thesis, 2016. http://ndltd.ncl.edu.tw/handle/05626650270566576330.
國立臺灣大學
資訊工程學研究所
104
Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from bad performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances between the embedded vectors, using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding within the embedded vectors. Theoretical results justify that CLEMS achieves cost-sensitivity, and extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions.
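The nearest-neighbor decoding step mentioned above can be sketched simply: once candidate label vectors have embedded points whose mutual distances mirror the cost between them, prediction returns the candidate whose embedding is closest to the point produced by a learned regressor. The hand-picked embeddings below are assumptions for illustration, not the output of CLEMS's multidimensional scaling step.

```python
import math

# Illustrative sketch of cost-sensitive nearest-neighbor decoding in an
# embedded label space.

def nearest_neighbor_decode(z, candidates):
    """candidates: list of (embedded_point, label_vector); return the
    label vector whose embedded point is closest to the regressor
    output z."""
    def dist(p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, z)))
    return min(candidates, key=lambda c: dist(c[0]))[1]

# Two candidate label vectors with assumed embedded points.
candidates = [((0.0, 0.0), (0, 0, 1)), ((1.0, 1.0), (1, 1, 0))]
```

Because embedded distances approximate costs, picking the nearest candidate approximately picks the cheapest label vector, which is where the cost-sensitivity enters.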
Chu, Hong-Min, et 朱鴻敏. « Dynamic Principal Projection for Cost-sensitive Online Multi-label Classification ». Thesis, 2017. http://ndltd.ncl.edu.tw/handle/h8qfu5.
國立臺灣大學
資訊工程學研究所
105
We study multi-label classification (MLC) with three important real-world issues: online updating, label space dimension reduction (LSDR), and cost-sensitivity. Current MLC algorithms have not been designed to address these three issues simultaneously. In this paper, we propose a novel algorithm, cost-sensitive dynamic principal projection (CS-DPP), that resolves all three issues. The foundation of CS-DPP is a framework that extends a leading LSDR algorithm to online updating with online principal component analysis (PCA). In particular, CS-DPP investigates the use of matrix stochastic gradient as the online PCA solver, and establishes its theoretical backbone when coupled with a carefully designed online regression learner. In addition, CS-DPP embeds the cost information into label weights to achieve cost-sensitivity along with theoretical guarantees. Practical enhancements of CS-DPP are also studied to improve its efficiency. Experimental results verify that CS-DPP achieves better practical performance than current MLC algorithms across different evaluation criteria, and demonstrate the importance of resolving the three issues simultaneously.
Li, Chun-Liang, et 李俊良. « Condensed Filter Tree For Cost Sensitive Multi-Label Classification ». Thesis, 2013. http://ndltd.ncl.edu.tw/handle/42380891805580530943.
國立臺灣大學
資訊工程學研究所
101
Many real-world applications have called for better multi-label classification algorithms in recent years, and different applications often need to consider different evaluation criteria. We formalize this need with a general setup, cost-sensitive multi-label classification (CSMLC), which takes the evaluation criterion into account during the learning process. Nevertheless, most existing algorithms can only focus on optimizing a few specific evaluation criteria and cannot systematically deal with different ones. In this paper, we propose a novel algorithm, called condensed filter tree (CFT), for optimizing any criterion in CSMLC. CFT is derived by reducing CSMLC to the well-known filter tree algorithm for cost-sensitive multi-class classification via the simple label powerset approach. We successfully cope with the difficulty of having exponentially many extended classes within the powerset for representation, training, and prediction by carefully designing the tree structure and focusing on the key nodes. Experimental results across many real-world datasets validate that the proposed CFT algorithm results in better performance on many general evaluation criteria when compared with existing special-purpose algorithms.
Chen, Po-Lung, et 陳柏龍. « Active Learning for Multiclass Cost-sensitive Classification Using Probabilistic Models ». Thesis, 2012. http://ndltd.ncl.edu.tw/handle/65244803215661729379.
國立臺灣大學
資訊工程學研究所
100
Multiclass cost-sensitive active learning is a relatively new problem. In this thesis, we derive the maximum expected cost and cost-weighted minimum margin strategies for multiclass cost-sensitive active learning. These two strategies can be seen as extended versions of classical cost-insensitive active learning strategies. The experimental results demonstrate that the derived strategies are promising for cost-sensitive active learning. In particular, the cost-sensitive strategies outperform cost-insensitive ones on many benchmark data sets. The results also reveal how the hardness of the data affects the performance of active learning strategies. Thus, in practical active learning applications, data analysis before strategy selection can be important.
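The maximum expected cost strategy mentioned above can be sketched as follows: given a probabilistic model's class posteriors for each unlabeled example and a cost matrix C[y][k] (cost of predicting k when the truth is y), query the example whose cost-minimizing prediction still carries the largest expected cost. The function names are illustrative assumptions.

```python
# Illustrative sketch of a maximum-expected-cost query strategy for
# cost-sensitive active learning with a probabilistic model.

def expected_cost_of_best_prediction(posterior, cost_matrix):
    """Expected cost of the cost-minimizing prediction under `posterior`:
    min over predictions k of sum_y P(y) * C[y][k]."""
    n = len(posterior)
    return min(sum(posterior[y] * cost_matrix[y][k] for y in range(n))
               for k in range(n))

def query_index(posteriors, cost_matrix):
    """Pick the unlabeled example with the maximum expected cost: the one
    the current model is most expensively uncertain about."""
    scores = [expected_cost_of_best_prediction(p, cost_matrix)
              for p in posteriors]
    return max(range(len(scores)), key=lambda i: scores[i])
```

With a uniform 0/1 cost matrix this reduces to querying the most uncertain example, which is how the strategy extends its classical cost-insensitive counterpart.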
Chiu, Hsien-Chun, et 邱顯鈞. « Multi-label Classification with Feature-aware Cost-sensitive Label Embedding ». Thesis, 2017. http://ndltd.ncl.edu.tw/handle/fy6vw4.
國立臺灣大學
資訊工程學研究所
106
Multi-label classification (MLC) is an important learning problem where each instance is annotated with multiple labels. Label embedding (LE) is an important family of methods for MLC that extracts and utilizes the latent structure of labels towards better performance. Within the family, feature-aware LE methods, which jointly consider the feature and label information during extraction, have been shown to reach better performance than feature-unaware ones. Nevertheless, current feature-aware LE methods are not designed to flexibly adapt to different evaluation criteria. In this work, we propose a novel feature-aware LE method that takes the desired evaluation criterion into account during training. The method, named Feature-aware Cost-sensitive Label Embedding (FaCLE), encodes the criterion into the distance between embedded vectors with a deep Siamese network. The feature-aware characteristic of FaCLE is achieved with a loss function that jointly considers the embedding error and the feature-to-embedding error. Moreover, FaCLE is coupled with an additional-bit trick to deal with possibly asymmetric criteria. Experiment results across different datasets and evaluation criteria demonstrate that FaCLE is superior to other state-of-the-art feature-aware LE methods and cost-sensitive LE methods.
« Cost-Sensitive Selective Classification and its Applications to Online Fraud Management ». Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.53598.
Dissertation/Thesis
Doctoral Dissertation Computer Science 2019
Lo, Kuo-Hsuan, et 羅國宣. « Cost-sensitive Encoding for Label Space Dimension Reduction Algorithms on Multi-label Classification ». Thesis, 2017. http://ndltd.ncl.edu.tw/handle/52429303095910750546.
國立臺灣大學
資訊網路與多媒體研究所
105
In the multi-label classification (MLC) problem, the goal is to classify each instance into multiple classes simultaneously. Different real-world applications often demand different evaluation criteria, and hence algorithms capable of taking the criteria into account are preferable. Such algorithms are called cost-sensitive multi-label classification (CSMLC) algorithms. Existing algorithms such as label space dimension reduction (LSDR) are able to solve the MLC problem efficiently, but none of the LSDR algorithms are cost-sensitive. On the other hand, most existing CSMLC algorithms suffer from high computational complexity during training or prediction when used with general criteria. In this work, we propose a novel algorithm, called Cost-Sensitive Encoding for label space Dimension Reduction (CSEDR), that makes existing LSDR algorithms cost-sensitive while keeping their efficiency. Our algorithm embeds cost information into the encoded space and reduces the computational burden of learning within the encoded space via LSDR. Extensive experiments justify that our algorithm both improves existing LSDR algorithms and yields better performance or lower label space dimension than state-of-the-art CSMLC algorithms across different evaluation criteria.
Webster, Jennifer B. « Cost-Sensitive Classification Methods for the Detection of Smuggled Nuclear Material in Cargo Containers ». Thesis, 2013. http://hdl.handle.net/1969.1/151104.
Parameswaran, Kamalaruban. « Transitions, Losses, and Re-parameterizations : Elements of Prediction Games ». Phd thesis, 2017. http://hdl.handle.net/1885/131341.