To see the other types of publications on this topic, follow the link: SVM classification.

Dissertations / Theses on the topic 'SVM classification'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'SVM classification.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

MELONI, RAPHAEL BELO DA SILVA. "REMOTE SENSING IMAGE CLASSIFICATION USING SVM." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2009. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=31439@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
Image classification is the process of extracting information from digital images to recognize patterns and homogeneous objects. In remote sensing, it aims to find correspondences between the pixels of a digital image and areas of the Earth's surface for subsequent analysis by a specialist. In this dissertation, we apply the Support Vector Machine (SVM) machine-learning methodology to the image classification problem because of its ability to handle large numbers of features. We built classifiers using distinct images containing RGB and HSB color-space information, altimetric (elevation) values, and the infrared channel of a region. The altimetric values contributed decisively to the results, since elevation is a fundamental characteristic of a region and had not previously been considered in remote sensing image classification. We highlight the final result for the swimming-pool identification problem with a neighborhood of two: 99 percent accuracy, 100 percent precision, 93.75 percent recall, 96.77 percent F-score, and a 96.18 percent Kappa index.
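The five figures reported above can all be derived from a single binary confusion matrix. As a sketch, the counts below (TP=15, FP=0, FN=1, TN=84 — chosen only because they are consistent with the published percentages, not taken from the dissertation) reproduce every metric:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # a.k.a. sensitivity
    f_score = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement p_e
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return accuracy, precision, recall, f_score, kappa

# Hypothetical counts consistent with the figures reported in the abstract
acc, prec, rec, f1, kap = confusion_metrics(tp=15, fp=0, fn=1, tn=84)
print(f"{acc:.2%} {prec:.2%} {rec:.2%} {f1:.2%} {kap:.2%}")
# → 99.00% 100.00% 93.75% 96.77% 96.18%
```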
APA, Harvard, Vancouver, ISO, and other styles
2

Jiang, Fuhua. "SVM-Based Negative Data Mining to Binary Classification." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/8.

Full text
Abstract:
The properties of a training data set, such as its size, distribution, and number of attributes, contribute significantly to the generalization error of a learning machine. A poorly distributed data set is prone to produce a partially overfitted model. The two approaches proposed in this dissertation for binary classification enhance useful data information by mining negative data. First, an error-driven compensating hypothesis approach is based on Support Vector Machines (SVMs) with (1+k)-iteration learning, where the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on a new data set in which each label is a transformation of the label from the negative data set, further producing positive and negative child data subsets in subsequent iterations. This procedure refines the base hypothesis with the k child hypotheses created over the k iterations. A prediction method is also proposed that traces the relationship between the negative subsets and the testing data set with a vector-similarity technique. Second, a statistical negative-example learning approach, based on theoretical analysis, improves the performance of the base learning algorithm (the learner) by creating one or two additional hypotheses (the audit and the booster) to mine the negative examples output by the learner. The learner employs a regular Support Vector Machine to classify the main examples and to recognize which examples are negative. The audit works on the negative training data created by the learner to predict whether an instance is negative; the booster is applied when the audit is not accurate enough to judge the learner correctly, and works on the training data subsets on which the learner and the audit disagree. The classifier used for testing is the combination of learner, audit, and booster: for a specific instance, it returns the learner's result if the audit acknowledges that result (or the learner agrees with the audit's judgment), and otherwise returns the booster's result. The error of the combined classifier decreases to O(e^2), compared with the error O(e) of the base learning algorithm.
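The stated combination rule can be sketched abstractly. The three toy predictors below are invented placeholders (in the dissertation the learner, audit and booster are trained SVMs); the sketch only illustrates the decision logic: return the learner's answer when the audit endorses it, otherwise fall back to the booster:

```python
def combined_classifier(learner, audit, booster, x):
    """Combine learner/audit/booster: the audit vets the learner's
    output, and the booster arbitrates when the two disagree."""
    y_learner = learner(x)
    if audit(x) == y_learner:      # audit acknowledges the learner's result
        return y_learner
    return booster(x)              # learner and audit disagree

# Toy stand-in hypotheses on 1-D inputs (hypothetical, for illustration only)
learner = lambda x: 1 if x > 0 else -1
audit   = lambda x: 1 if x > -0.5 else -1     # disagrees with learner on (-0.5, 0]
booster = lambda x: 1 if x > -0.25 else -1    # arbitrates the disputed region

print(combined_classifier(learner, audit, booster, 2.0))    # agreement: learner wins
print(combined_classifier(learner, audit, booster, -0.3))   # disagreement: booster decides
```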
APA, Harvard, Vancouver, ISO, and other styles
3

Severini, Jérôme. "Estimation et Classification de Signaux Altimétriques." Thesis, Toulouse, INPT, 2010. http://www.theses.fr/2010INPT0125/document.

Full text
Abstract:
Measuring ocean height, surface winds (strongly linked to ocean temperatures), and wave height yields a set of parameters needed to study the oceans and to monitor their evolution; satellite altimetry is one of the disciplines that makes this possible. An altimetric waveform is the result of emitting a high-frequency radar pulse toward a given surface (classically oceanic) and measuring the reflection of that pulse. There currently exists a non-optimal estimation method for altimetric waveforms, as well as classification tools for identifying the different types of observed surfaces. In this study we propose applying Bayesian estimation to altimetric waveforms, together with new classification approaches. Finally, we propose a dedicated algorithm for studying topography in coastal areas, a topic that is currently very little developed in altimetry.
After having scanned ocean levels for thirteen years, the French/American satellite Topex-Poséidon was retired in 2005. Topex-Poséidon was replaced by Jason-1 in December 2001, and a new satellite, Jason-2, was expected in 2008. Several estimation methods have been developed for the signals produced by these satellites. In particular, estimators of sea height and wave height have shown very good performance when applied to waveforms backscattered from ocean surfaces. However, it is a more challenging problem to extract relevant information from signals backscattered from non-oceanic surfaces such as inland waters, deserts or ice. This PhD thesis is divided into two parts. The first consists of developing classification methods for altimetric signals in order to recognize the type of surface reflecting the radar waveform; particular attention is devoted to support vector machines (SVMs) and functional data analysis for this problem. The second part consists of developing estimation algorithms appropriate to altimetric signals obtained after reflection on non-oceanic surfaces; Bayesian algorithms are investigated for this estimation problem. This PhD is co-supervised by the French company CLS (Collecte Localisation Satellites; see http://www.cls.fr/ for more details), which in particular provides the real altimetric data necessary for this study.
APA, Harvard, Vancouver, ISO, and other styles
4

Almasiri, Osamah A. "SKIN CANCER DETECTION USING SVM-BASED CLASSIFICATION AND PSO FOR SEGMENTATION." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5489.

Full text
Abstract:
Various techniques have been developed for detecting skin cancer; however, identifying the type of malignant skin cancer remains an open problem. The objective of this study is to diagnose melanoma through the design and implementation of a computerized image-analysis system. The dataset used with the proposed system is from Hospital Pedro Hispano (PH²). The proposed system begins with preprocessing of the skin-lesion images. Then, particle swarm optimization (PSO) is used to detect the region of interest (ROI). Next, geometric, color, and texture features are extracted from the ROI. Lastly, feature selection and classification are performed using a support vector machine (SVM). With a data set of 200 images, the sensitivity (SE) and specificity (SP) reached 100%, with a maximum processing time of 0.03 s.
APA, Harvard, Vancouver, ISO, and other styles
5

Tarasova, Natalya. "Classification of Hate Tweets and Their Reasons using SVM." Thesis, Uppsala universitet, Avdelningen för datalogi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-275782.

Full text
Abstract:
This study focused on finding hate tweets posted by customers of the three mobile operators Verizon, AT&T, and Sprint, and on identifying the reasons for their dissatisfaction. Timelines containing a hate tweet were collected and examined for the presence of an explanation. A machine learning approach was employed with four categories: Hate, Reason, Explanatory, and Other, classified with a one-versus-all approach using the Support Vector Machines (SVM) algorithm implemented in the LIBSVM tool. The study resulted in two methodologies: the Naive Method (NM) and the Partial Timeline Method (PTM). The Naive Method relied only on a feature space consisting of the most representative words, chosen with the Akaike Information Criterion. PTM exploited the fact that the majority of explanations were posted within a one-hour window (±30 minutes) around the posting of the hate tweet. We found that PTM is more accurate than NM, and that it also saves time and memory by analysing fewer tweets. At the same time, because PTM does not consider all tweets on a user's timeline, the choice of method implies a trade-off: PTM offers a more accurate classification, while NM offers a more complete one.
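A one-versus-all SVM text classifier of the kind described can be sketched with scikit-learn (standing in here for the LIBSVM tool used in the thesis); the tiny four-category corpus is invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Invented miniature corpus mirroring the study's four categories
docs = [
    "i hate this operator worst service ever", "absolutely hate you",
    "my calls keep dropping every day",        "billed twice for nothing",
    "dropped call is why i am angry",          "the overcharge explains my rage",
    "nice weather today",                      "watching the game tonight",
]
labels = ["Hate", "Hate", "Reason", "Reason",
          "Explanatory", "Explanatory", "Other", "Other"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
# One binary SVM per class, as in the one-versus-all scheme described above
clf = OneVsRestClassifier(LinearSVC()).fit(X, labels)

print(clf.predict(vec.transform(["i hate this operator"]))[0])
```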

Opponent: Kristina Wettainen

APA, Harvard, Vancouver, ISO, and other styles
6

Lekic, Sasa, and Kasper Liu. "Intent classification through conversational interfaces : Classification within a small domain." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-257863.

Full text
Abstract:
Natural language processing (NLP) and machine learning (ML) are subjects undergoing intense study, and they are more interrelated than ever before. A case in point is text classification, an application of ML within NLP. Although these fields have evolved in recent years, open problems remain: some concern the computing power their techniques require, others how much training data they need. The research problem addressed in this thesis is the lack of knowledge on whether machine-learning techniques such as Word2Vec, Bidirectional Encoder Representations from Transformers (BERT), and a Support Vector Machine (SVM) classifier can be used for text classification given only a small training set, and whether these techniques can be run on regular laptops. To address this, the main purpose of the thesis was to develop two separate conversational interfaces utilizing text-classification techniques. Given user input, each interface recognises the intent behind it, i.e. classifies the input sentence into one of a small set of pre-defined categories. The first interface combines Word2Vec with an SVM classifier; the second combines BERT with an SVM classifier. The research followed a standard applied-research method, and both interfaces were developed. The interface using a pre-trained Word2Vec model and an SVM classifier achieved an intent-classification accuracy of 60% and can be run on a regular computer. The interface using BERT and an SVM classifier could not be trained and run on regular laptops: training ran for over 24 hours and then crashed. The results show that it is possible to build a conversational interface that classifies intents from only a small training set. However, owing to the small training set and the consequently low accuracy, such an interface is not a suitable option for important tasks, though it can be used for non-critical classification tasks.
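The first interface's pipeline, embedding a sentence as the average of its word vectors and feeding that vector to an SVM, can be sketched as follows. The tiny 4-dimensional "word vectors" and intent labels are made up for illustration; a real pre-trained Word2Vec model would supply vectors with hundreds of dimensions:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical miniature "Word2Vec" table; real embeddings come pre-trained.
word_vecs = {
    "turn":  np.array([1.0, 0.1, 0.0, 0.0]), "on":    np.array([0.9, 0.0, 0.1, 0.0]),
    "light": np.array([0.8, 0.2, 0.0, 0.1]), "play":  np.array([0.0, 1.0, 0.1, 0.0]),
    "some":  np.array([0.1, 0.8, 0.0, 0.1]), "music": np.array([0.0, 0.9, 0.2, 0.0]),
}

def embed(sentence):
    """Sentence vector = mean of known word vectors (zeros if none known)."""
    vecs = [word_vecs[w] for w in sentence.split() if w in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(4)

train = [("turn on light", "lights_on"), ("turn light on", "lights_on"),
         ("play some music", "play_music"), ("play music", "play_music")]
X = np.array([embed(s) for s, _ in train])
y = [intent for _, intent in train]

clf = SVC(kernel="linear").fit(X, y)   # SVM on averaged embeddings
print(clf.predict([embed("turn on the light")])[0])
```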
APA, Harvard, Vancouver, ISO, and other styles
7

LI, YUANXUN. "SVM Object Based Classification Using Dense Satellite Imagery Time Series." Thesis, KTH, Geoinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233340.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Liu, Wen. "Incremental Learning and Online-Style SVM for Traffic Light Classification." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-theses/1216.

Full text
Abstract:
Training on a large dataset has become a serious issue for researchers because it requires large amounts of memory and long computing times. Researchers are attempting to process large-scale datasets not only by changing the programming model, for example using MapReduce and Hadoop, but also by designing new algorithms that retain performance with less complexity and runtime. In this thesis, we present implementations of incremental learning and online learning methods to classify a large traffic-light dataset for traffic-light recognition. The introduction covers the concepts of, and related work on, incremental learning and online learning. The main algorithm is a modification of the IMORL incremental learning model that enhances its performance over the learning process of our application. We then briefly discuss how the traffic-light recognition algorithm works and the problems we encountered during training. Beyond incremental learning, which consumes data batch by batch during training, we introduce Pegasos, an online-style, primal, gradient-based support vector machine method. Pegasos achieves excellent classification performance while using relatively few training instances, and is therefore our recommended solution to the large-dataset training problem.
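Pegasos is simple enough to sketch in a few lines: at step t it samples one training example, uses step size 1/(λt), and applies a shrink-and-(optionally)-add sub-gradient update on the primal SVM objective. A minimal numpy version, with no bias term and invented toy data:

```python
import numpy as np

def pegasos(X, y, lam=0.01, T=2000, seed=0):
    """Pegasos: primal stochastic sub-gradient solver for the linear SVM."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(X))          # sample one training example
        eta = 1.0 / (lam * t)             # decreasing step size 1/(lambda*t)
        if y[i] * (X[i] @ w) < 1:         # hinge loss active: shrink and add
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                             # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# Toy linearly separable data (symmetric about the origin, so no bias needed)
X = np.array([[2., 2.], [3., 1.], [2., 3.], [-2., -2.], [-3., -1.], [-2., -3.]])
y = np.array([1, 1, 1, -1, -1, -1])
w = pegasos(X, y)
print(np.sign(X @ w))   # signs match y on this toy set
```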
APA, Harvard, Vancouver, ISO, and other styles
9

Nordström, Jesper. "Automated classification of bibliographic data using SVM and Naive Bayes." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-75167.

Full text
Abstract:
Classification of scientific bibliographic data is an important and increasingly time-consuming task in a "publish or perish" paradigm where the number of scientific publications is steadily growing. Apart from being a resource-intensive endeavor, manual classification has also been shown to often be performed with a high degree of inconsistency. Since many bibliographic databases contain a large number of already classified records, supervised machine learning for automated classification might be a solution for handling the increasing volumes of published scientific articles. In this study, automated classification of bibliographic data based on two machine learning methods, Naive Bayes and Support Vector Machines (SVM), was evaluated. The data were collected from the Swedish research database SwePub, and the features used for training the classifiers were based on the abstracts and titles in the bibliographic records. The accuracy achieved ranged from a lowest score of 0.54 to a highest score of 0.84. The classifiers based on Support Vector Machines consistently received higher scores than those based on Naive Bayes. Classification at the second level of the hierarchical classification system clearly resulted in lower scores than classification at the first level. Using abstracts as the basis for feature extraction yielded overall better results than using titles, though the differences were very small.
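The head-to-head comparison can be sketched in scikit-learn; the handful of "abstracts" below are invented stand-ins for SwePub records, and the study's actual features and two-level hierarchy are not reproduced:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Invented abstracts for two top-level subject classes
abstracts = [
    "proof of a theorem on prime numbers and algebraic structures",
    "numerical analysis of differential equations and convergence",
    "clinical trial of a new drug for treating patients with diabetes",
    "epidemiology of infectious disease in hospital patients",
]
classes = ["mathematics", "mathematics", "medicine", "medicine"]

X = TfidfVectorizer().fit_transform(abstracts)
for name, model in [("Naive Bayes", MultinomialNB()), ("SVM", LinearSVC())]:
    acc = model.fit(X, classes).score(X, classes)   # training accuracy only
    print(f"{name}: {acc:.2f}")
```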
APA, Harvard, Vancouver, ISO, and other styles
10

Shaik, Abdul Ameer Basha. "SVM Classification and Analysis of Margin Distance on Microarray Data." University of Akron / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=akron1302618924.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Terrones, Michael. "A precise robotic arm positioning using an SVM classification algorithm." Diss., Online access via UMI:, 2007.

Find full text
Abstract:
Thesis (M.S.)--State University of New York at Binghamton, Department of Systems Science and Industrial Engineering, Thomas J. Watson School of Engineering and Applied Science, 2007.
Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
12

Wang, Wenjuan. "Optimization algorithms for SVM classification : Applications to geometrical chromosome analysis." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30111/document.

Full text
Abstract:
The genome is highly organized within the cell nucleus. This organization, in particular the localization and dynamics of genes and chromosomes, is known to contribute to gene expression and cell differentiation in both normal and pathological contexts. Exploring this organization may help to diagnose disease and to identify new therapeutic targets. The conformation of chromosomes can be analyzed through distance measurements between distinct fluorescently labeled DNA sites. In this context, the spatial organization of yeast chromosome III was shown to differ between two cell types, MATa and MATalpha. However, imaging data are subject to noise, owing to microscope resolution and the living state of the yeast cells. The aim of this thesis is to develop new classification methods that discriminate the two mating types of yeast cells based on distance measurements between three loci on chromosome III, aided by an estimate of the bound on the perturbations. We first address the issue of solving large-scale SVM binary classification problems and review state-of-the-art first-order stochastic optimization algorithms. To deal with uncertainty, we propose a learning model that adjusts its robustness to the noise. The method avoids the overly conservative behaviour that can be encountered with worst-case robust support vector machine formulations. The magnitude of the noise perturbations incorporated in the model is controlled by optimizing a generalization error; no assumption is made on the probability distribution of the noise, and only rough estimates of the perturbation bounds are required. The resulting problem is a large-scale bi-level program. To solve it, we propose a bi-level algorithm that performs very cheap stochastic gradient moves and is therefore well suited to large datasets. Convergence is proven for a general class of problems. We present encouraging experimental results confirming that the technique outperforms robust second-order cone programming (SOCP) formulations on public datasets. The experiments also show that the extra nonlinearity generated by the uncertainty in the data penalizes the classification of the chromosome data, motivating further research on nonlinear robust models. Additionally, we report results of the bi-level stochastic algorithm used to automatically select the penalty hyperparameter in linear and nonlinear support vector machines; this approach avoids the expensive computations that usually arise in k-fold cross-validation on large problems.
APA, Harvard, Vancouver, ISO, and other styles
13

Yao, Xiaojun. "Méthodes Non-linéaires (ANNs, SVMs) : applications à la Classification et à la Corrélation des Propriétés Physicochimiques et Biologiques." Paris 7, 2004. http://www.theses.fr/2004PA077182.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Hess, Eric. "Ramp Loss SVM with L1-Norm Regularizaion." VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3538.

Full text
Abstract:
The Support Vector Machine (SVM) classification method has recently gained popularity due to the ease of implementing non-linear separating surfaces. SVM is an optimization problem with two competing goals: minimizing misclassification on the training data and maximizing a margin defined by the normal vector of the learned separating surface. We develop and implement new SVM models based on a previously conceived SVM with L1-norm regularization and ramp-loss error terms. The goal is a new SVM model that is robust to outliers thanks to the ramp loss, easy to implement in open-source and off-the-shelf mathematical programming solvers, and relatively efficient in finding solutions owing to its mixed-integer linear form. To show the effectiveness of the models, we compare results of ramp-loss SVM with L1-norm and L2-norm regularization on human organ microbial data and on simulated data sets with outliers.
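The robustness to outliers comes from the shape of the ramp loss: it is the hinge loss truncated at a ceiling (commonly 2), so a single grossly misclassified outlier contributes a bounded penalty instead of an arbitrarily large one. A quick numeric sketch:

```python
def hinge(margin):
    """Standard SVM hinge loss on the margin y * f(x)."""
    return max(0.0, 1.0 - margin)

def ramp(margin, ceiling=2.0):
    """Ramp loss: hinge loss truncated at `ceiling`, bounding outlier influence."""
    return min(ceiling, hinge(margin))

for m in [2.0, 0.5, -1.0, -10.0]:           # the last one is a gross outlier
    print(f"margin {m:6.1f}: hinge {hinge(m):5.1f}  ramp {ramp(m):4.1f}")
# The outlier at margin -10 costs 11.0 under hinge loss but only 2.0 under ramp loss.
```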
APA, Harvard, Vancouver, ISO, and other styles
15

LEITE, VANESSA RODRIGUES COELHO. "AN ANALYSIS OF LITHOLOGY CLASSIFICATION USING SVM, MLP AND ENSEMBLE METHODS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=21205@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Lithology classification is an important task in oil-reservoir characterization; one of its major purposes is to support well planning and drilling activities. Faster and more effective classification algorithms therefore increase the speed and reliability of the decisions made by geologists and geophysicists. This dissertation analyses ensemble methods applied to automatic lithology classification. We compare single classifiers (Support Vector Machine and Multilayer Perceptron) against the same classifiers combined with ensemble methods (Bagging and AdaBoost). We conclude with a comparative evaluation of the techniques and present the trade-off involved in using ensemble methods in place of single classifiers.
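In scikit-learn terms (standing in for whatever toolkit the dissertation used), wrapping an SVM base classifier in Bagging looks like the sketch below; the two-feature "well log" measurements are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Invented 2-feature stand-ins for well-log measurements of two lithologies
X = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(2.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

single = SVC(kernel="rbf").fit(X, y)
# Bagging: each of the 10 SVMs is trained on a bootstrap resample; majority vote
bagged = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10,
                           random_state=0).fit(X, y)

print(single.score(X, y), bagged.score(X, y))
```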
APA, Harvard, Vancouver, ISO, and other styles
16

Gidudu, Anthony. "Land cover mapping through optimizing remote sensing data for SVM classification." Doctoral thesis, University of Cape Town, 2006. http://hdl.handle.net/11427/5599.

Full text
Abstract:
Includes bibliographical references (leaves 123-129)
Support Vector Machines (SVMs) are a relatively new supervised classification technique with roots in statistical learning theory. They have gained popularity in fields such as machine vision, artificial intelligence, digital image processing and, more recently, remote sensing. The three commonly used SVM variants are the linear, polynomial and radial basis function (i.e. Gaussian) kernel classifiers.
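The three classifiers mentioned differ only in the kernel passed to the SVM. In scikit-learn notation (a hedged stand-in for whatever software the thesis used, with invented toy "pixel" features):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Toy two-class "pixels": two clusters of 3-band reflectance-like features
X = np.vstack([rng.normal(0.2, 0.05, (25, 3)), rng.normal(0.7, 0.05, (25, 3))])
y = np.array([0] * 25 + [1] * 25)

for kernel in ["linear", "poly", "rbf"]:     # "rbf" is the Gaussian kernel
    acc = SVC(kernel=kernel).fit(X, y).score(X, y)
    print(f"{kernel}: training accuracy {acc:.2f}")
```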
APA, Harvard, Vancouver, ISO, and other styles
17

Johnson, Kurt Eugene. "A NEW CENTROID BASED ALGORITHM FOR HIGH SPEED BINARY CLASSIFICATION." Miami University / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=miami1102089037.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

CHAVES, ADRIANA DA COSTA FERREIRA. "FUZZY RULES EXTRACTION FROM SUPPORT VECTOR MACHINES (SVM) FOR MULTI-CLASS CLASSIFICATION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2006. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=9191@1.

Full text
Abstract:
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
This work proposes a new method for extracting fuzzy rules from support vector machines (SVMs) trained to solve classification problems. SVMs are learning systems based on statistical learning theory and show good generalization ability on real data sets; they have been successfully applied to a wide variety of problems. However, SVMs, like neural networks, generate a black-box model, i.e., a model that does not explain the process by which its output is obtained. Some methods proposed to reduce or eliminate this limitation have already been developed for the binary classification case, although they are restricted to the extraction of symbolic rules, which contain functions or intervals in their antecedents. The interpretability of symbolic rules, however, remains limited. To increase the interpretability of the generated knowledge, this work therefore proposes a technique for extracting fuzzy rules from trained SVMs. Moreover, the proposed model was developed for classification into multiple classes, which had not been addressed before. The fuzzy rules obtained are of the form: if x1 belongs to the fuzzy set C1, x2 belongs to the fuzzy set C2, ..., xn belongs to the fuzzy set Cn, then the point x = (x1, ..., xn) belongs to class A. To test the model, detailed case studies were carried out on four databases: Iris, Wine, Bupa Liver Disorders and Wisconsin Breast Cancer. The coverage of the resulting rules proved very good, reaching 100% in the Iris case. After rule generation, the rules were evaluated using two criteria: coverage and fuzzy accuracy. In addition, the performance of the multi-class classification methods used in this work was compared.
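Rules of the form described above can be evaluated with standard fuzzy machinery. A toy sketch, assuming triangular membership functions and the min t-norm for rule activation (both are common illustrative choices, not necessarily those of the thesis):

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy set: support [a, c], peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

def fire(rule, x):
    """Rule activation: min (t-norm) over the antecedent memberships.

    rule = ([mu_C1, ..., mu_Cn], class_label)
    """
    sets, _ = rule
    return min(mu(xi) for mu, xi in zip(sets, x))

def classify(rules, x):
    """Assign x the class of the rule with the highest activation."""
    return max(rules, key=lambda r: fire(r, x))[1]
```

Coverage, one of the evaluation criteria above, would then be the fraction of test points for which at least one rule fires with nonzero activation.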
APA, Harvard, Vancouver, ISO, and other styles
19

Synek, Radovan. "Klasifikace textu pomocí metody SVM." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237229.

Full text
Abstract:
This thesis deals with text mining. It focuses on problems of document classification and related techniques, mainly data preprocessing. Project also introduces the SVM method, which has been chosen for classification, design and testing of implemented application.
APA, Harvard, Vancouver, ISO, and other styles
20

Viau, Claude. "Multispectral Image Analysis for Object Recognition and Classification." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34532.

Full text
Abstract:
Computer and machine vision applications are used in numerous fields to analyze static and dynamic imagery in order to assist or automate some form of decision-making process. Advancements in sensor technologies now make it possible to capture and visualize imagery at various wavelengths (or bands) of the electromagnetic spectrum. Multispectral imaging has countless applications in various fields including (but not limited to) security, defense, space, medical, manufacturing and archeology. The development of advanced algorithms to process and extract salient information from the imagery is a critical component of the overall system performance. The fundamental objectives of this research project were to investigate the benefits of combining imagery from the visual and thermal bands of the electromagnetic spectrum to improve the recognition rates and accuracy of commonly found objects in an office setting. The goal was not to find a new way to “fuse” the visual and thermal images together but rather to establish a methodology to extract multispectral descriptors in order to improve a machine vision system’s ability to recognize specific classes of objects. A multispectral dataset (visual and thermal) was captured and features from the visual and thermal images were extracted and used to train support vector machine (SVM) classifiers. The SVM’s class prediction ability was evaluated separately on the visual, thermal and multispectral testing datasets. Commonly used performance metrics were applied to assess the sensitivity, specificity and accuracy of each classifier. The research demonstrated that the highest recognition rate was achieved by an expert system (multiple classifiers) that combined the expertise of the visual-only classifier, the thermal-only classifier and the combined visual-thermal classifier.
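The performance metrics named in the abstract, and the combination of several classifiers into an expert system, can be sketched minimally. The majority-vote combiner below is an illustrative stand-in; the thesis's expert system is more elaborate:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def majority_vote(classifiers, x):
    """Combine several classifiers by taking the most frequent prediction."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)
```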
APA, Harvard, Vancouver, ISO, and other styles
21

Vieux, Rémi. "Extraction de Descripteurs Pertinents et Classification pour le Problème de Recherche des Images par le Contenu." Thesis, Bordeaux 1, 2011. http://www.theses.fr/2011BOR14244/document.

Full text
Abstract:
The explosive development of affordable, high-quality image acquisition devices has made available a tremendous amount of digital content. Large industrial companies need efficient methods to exploit this content and transform it into valuable knowledge. This PhD was accomplished in the context of X-MEDIA, a large European project with two major industrial partners, FIAT for the automotive industry and Rolls-Royce plc. for the aircraft industry. The project has been the trigger for research linked with strong industrial requirements. Although these user requirements can be very specific, they covered more generic research topics. Hence, we bring several contributions to the general context of Content-Based Image Retrieval (CBIR), indexing and classification. In the first part of the manuscript we propose contributions based on the extraction of global image descriptors. We rely on well-known descriptors from the literature to propose models for the indexing of image databases and the approximation of a user-defined categorisation. Additionally, we propose a new descriptor for a CBIR system which has to process a very specific image modality, for which traditional descriptors are irrelevant. In the second part of the manuscript, we focus on the task of image classification. Industrial requirements on this topic go beyond global image classification, so we developed two methods to localize and classify the local content of images, i.e. image regions, using supervised machine learning algorithms (Support Vector Machines). In the last part of the manuscript, we propose a model for Content-Based Image Retrieval based on the construction of a visual dictionary of image regions. We experiment extensively with the model in order to identify the most influential parameters in retrieval efficiency.
APA, Harvard, Vancouver, ISO, and other styles
22

Severini, Jerome, Corinne Mailhes, and Jean-Yves Tourneret. "Estimation et Classification des Signaux Altimétriques." Phd thesis, Institut National Polytechnique de Toulouse - INPT, 2010. http://tel.archives-ouvertes.fr/tel-00526100.

Full text
Abstract:
Measuring ocean height, surface winds (strongly linked to ocean temperatures) and wave height yields a set of parameters needed to study the oceans and to monitor their evolution: spaceborne altimetry is one of the disciplines that makes this possible. An altimetric waveform is the result of emitting a high-frequency radar wave toward a given surface (classically the ocean) and measuring the reflection of that wave. There currently exists a non-optimal estimation method for altimetric waveforms, as well as classification tools for identifying the different types of observed surfaces. In this study we propose to apply Bayesian estimation to altimetric waveforms, together with new classification approaches. Finally, we propose a dedicated algorithm for studying the topography of coastal areas, a subject that is currently very little developed in altimetry.
APA, Harvard, Vancouver, ISO, and other styles
23

Lecomte, Sébastien. "Classification partiellement supervisée par SVM : application à la détection d’événements en surveillance audio." Thesis, Troyes, 2013. http://www.theses.fr/2013TROY0031/document.

Full text
Abstract:
This thesis addresses partially supervised Support Vector Machines for novelty detection (One-Class SVM). These were studied in order to design abnormal audio event detection for the surveillance of public infrastructures, in particular public transportation systems. In this context, the null hypothesis (“normal” audio signals) is relatively well known (even though the corresponding signals can be notably non-stationary). Conversely, every “abnormal” signal should be detected and, if possible, clustered with similar signals. Thus, a reference system based on a single model of normal signals is presented, and we then propose to use several concurrent One-Class SVMs to cluster new data. Given the amount of data to process, dedicated solvers were studied. The proposed algorithms must run in real time, which is why we also investigated algorithms with warm-start capabilities. Through the study of these solvers, we propose a unified formulation of one-class and binary SVM problems, with and without bias. The proposed approach was validated on a database of real signals. In addition, a demonstrator integrating real-time abnormal event detection for the monitoring of a subway station was presented during the final review of the European project VANAHEIM.
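A full One-Class SVM solver is beyond a short example, but the core idea of scoring new data against a model of "normal" training signals can be illustrated with a simplified kernel-similarity detector. This is a toy stand-in, not the One-Class SVM solvers developed in the thesis:

```python
import math

def rbf(x, z, gamma=0.1):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

class KernelNoveltyDetector:
    """Toy stand-in for One-Class SVM: score a point by its mean RBF
    similarity to the training set, and flag points scoring below a
    low quantile of the training scores as 'novel' (abnormal)."""

    def __init__(self, gamma=0.1, quantile=0.05):
        self.gamma = gamma
        self.quantile = quantile

    def fit(self, data):
        self.data = data
        scores = sorted(self.score(x) for x in data)
        k = max(0, int(self.quantile * len(scores)) - 1)
        self.threshold = scores[k]
        return self

    def score(self, x):
        return sum(rbf(x, z, self.gamma) for z in self.data) / len(self.data)

    def is_novel(self, x):
        return self.score(x) < self.threshold
```

A real One-Class SVM instead learns a sparse set of support vectors and a margin offset by solving a quadratic program, which is what makes dedicated solvers and warm starts worthwhile at scale.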
APA, Harvard, Vancouver, ISO, and other styles
24

Wang, Rui. "Comparisons of Classification Methods in Efficiency and Robustness." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1345564802.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Alvarez, Manuela. "Mapping forest habitats in protected areas by integrating LiDAR and SPOT Multispectral Data." Thesis, KTH, Geoinformatik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189199.

Full text
Abstract:
KNAS (Continuous Habitat Mapping of Protected Areas) is a Metria AB project that produces vegetation and habitat maps of protected areas in Sweden. Vegetation and habitat mapping is challenging due to heterogeneity, spatial variability and complex vertical and horizontal structure. Traditionally, multispectral data is used for its ability to give information about the horizontal structure of vegetation. LiDAR data contains information about the vertical structure of vegetation, and therefore contributes to improved classification accuracy when used together with spectral data. The objectives of this study are to integrate LiDAR and multispectral data for KNAS and to determine the contribution of LiDAR data to the classification accuracy. To achieve these goals, two object-based classification schemes are proposed and compared: a spectral classification scheme and a spectral-LiDAR classification scheme. The spectral data consists of four SPOT-5 bands acquired in 2005 and 2006. The spectral-LiDAR data includes the same four SPOT-5 bands plus nine LiDAR-derived layers produced from NH point cloud data from airborne laser scanning acquired in 2011 and 2012 from the Swedish Mapping, Cadastral and Land Registration Authority. Processing of the point cloud data includes filtering, buffer and tile creation, height normalization and rasterization. Due to the complexity of KNAS production, the classification schemes are based on a simplified KNAS workflow and a selection of KNAS forest classes. The classification schemes include segmentation, database creation, collection of training and validation areas, SVM classification and accuracy assessment. Spectral-LiDAR data fusion is performed during segmentation in eCognition. The segmentation results are used to build a database of segmented objects with the mean values of the spectral or spectral-LiDAR data. The databases are used in MATLAB to perform SVM classification with cross-validation.
Cross-validation accuracy, overall accuracy, kappa coefficient, and producer's and user's accuracies are computed. Training and validation areas are common to both classification schemes. The results show an improvement in overall classification accuracy for the spectral-LiDAR classification scheme compared to the spectral classification scheme: improvements of 21.9%, 11.0% and 21.1% are obtained for the study areas of Linköping, Örnsköldsvik and Vilhelmina, respectively.
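The overall accuracy and kappa coefficient reported above are both computed from a confusion matrix. A minimal sketch:

```python
def overall_accuracy(cm):
    """Fraction of correctly classified samples (diagonal / total)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def kappa(cm):
    """Cohen's kappa: agreement corrected for chance, from a confusion
    matrix whose rows are reference classes and columns are predictions."""
    n = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(len(cm))) / n            # observed agreement
    pe = sum(sum(cm[i]) * sum(row[i] for row in cm)
             for i in range(len(cm))) / n ** 2                # chance agreement
    return (po - pe) / (1 - pe)
```

Producer's and user's accuracies are the per-class analogues: the diagonal entry divided by its row sum and by its column sum, respectively.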
APA, Harvard, Vancouver, ISO, and other styles
26

Mathieu, Bérangère. "Segmentation interactive multiclasse d'images par classification de superpixels et optimisation dans un graphe de facteurs." Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30290/document.

Full text
Abstract:
Image segmentation is one of the main research topics in image analysis. It is the task of finding a partition into regions, i.e., into sets of connected pixels meeting a given uniformity criterion. The goal of image segmentation is to find regions corresponding to the objects, or object parts, appearing in the image; which objects are relevant depends on the application context. Manually locating these objects is a tedious but quite simple task. Designing an automatic algorithm able to achieve the same result is, on the contrary, a difficult problem. Interactive segmentation methods are semi-automatic approaches in which a user guides the search for a specific segmentation of an image by giving indications. There are two kinds of methods: boundary-based and region-based interactive segmentation. Boundary-based methods extract a single object corresponding to a unique region without holes. The user guides the method by selecting some boundary points of the object, and the algorithm searches for a curve linking all the points given by the user, following the boundary of the object (pixels on either side of the curve are as dissimilar as possible) and satisfying some intrinsic properties (regular curves are encouraged). Region-based methods group the pixels of an image into sets by maximizing the similarity of pixels inside each set and the dissimilarity between pixels belonging to different sets. Each set can be composed of one or several connected components and can contain holes. The user guides the method by drawing colored strokes, giving, for each set, some pixels belonging to it. While the majority of region-based methods extract a single object from the background, some algorithms proposed during the last decade are able to solve multi-class interactive segmentation problems, i.e., to extract more than two sets of pixels. The main contribution of this work is the design of a new multi-class interactive segmentation method.
This algorithm is based on the minimization of a cost function that can be represented by a factor graph. It integrates a supervised learning classification method ensuring that the produced segmentation is consistent with the indications given by the user, a new regularization term, and a preprocessing step grouping pixels into small homogeneous regions called superpixels. The use of an over-segmentation method to produce these superpixels is a key step in the proposed interactive segmentation method: it significantly reduces the computational complexity and handles the segmentation of images containing several million pixels while keeping the execution time small enough to ensure comfortable use of the method. The second contribution of our work is an evaluation of over-segmentation algorithms. We provide a new reference dataset whose particularity is to contain images of different sizes, from a few thousand to several million pixels. This review also allowed us to design and evaluate a new over-segmentation algorithm.
APA, Harvard, Vancouver, ISO, and other styles
27

Shantilal. "SUPPORT VECTOR MACHINE FOR HIGH THROUGHPUT RODENT SLEEP BEHAVIOR CLASSIFICATION." UKnowledge, 2008. http://uknowledge.uky.edu/gradschool_theses/506.

Full text
Abstract:
This thesis examines the application of a Support Vector Machine (SVM) classifier to automatically detect sleep and quiet wake (rest) behavior in mice from pressure signals on their cage floor. Previous work employed Neural Networks (NN) and Linear Discriminant Analysis (LDA) to successfully detect sleep and wake behaviors in mice. Although the LDA was successful in distinguishing between sleep and wake behaviors, it has several limitations, which include the need to select a threshold and difficulty separating additional behaviors with subtle differences, such as sleep and rest. The SVM has advantages in that it offers greater degrees of freedom than the LDA for working with complex data sets. In addition, the SVM has direct methods to limit overfitting on the training sets (unlike the NN method). This thesis develops an SVM classifier to characterize the linearly non-separable sleep and rest behaviors using a variety of features extracted from the power spectrum, autocorrelation function, and generalized spectrum (autocorrelation of the complex spectrum). A genetic algorithm (GA) optimizes the SVM parameters and determines a combination of the 5 best features. Experimental results from over 9 hours of data scored by human observation indicate 75% classification accuracy for the SVM compared to 68% accuracy for the LDA.
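A genetic algorithm that selects a fixed-size feature subset, as used here to pick the 5 best features, can be sketched as follows. The fitness function below is a stand-in for illustration; the thesis instead scores a subset by SVM classification performance:

```python
import random

def ga_select_features(n_features, fitness, n_select=5,
                       pop_size=20, generations=30, seed=0):
    """Evolve fixed-size feature masks (tuples of 0/1) toward higher fitness."""
    rng = random.Random(seed)

    def random_mask():
        idx = rng.sample(range(n_features), n_select)
        return tuple(1 if i in idx else 0 for i in range(n_features))

    def mutate(mask):
        # Swap one selected feature for one unselected feature.
        on = [i for i, b in enumerate(mask) if b]
        off = [i for i, b in enumerate(mask) if not b]
        i, j = rng.choice(on), rng.choice(off)
        m = list(mask)
        m[i], m[j] = 0, 1
        return tuple(m)

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # rank by fitness
        survivors = pop[: pop_size // 2]          # keep the better half
        pop = survivors + [mutate(rng.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)
```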
APA, Harvard, Vancouver, ISO, and other styles
28

Hersén, Nicklas, and Axel Kennedal. "The Effect of Audio Snippet Locations and Durations on Genre Classification Accuracy Using SVM." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-228841.

Full text
Abstract:
Real-world scenarios where machine-learning-based music genre classification could be applied include streaming services, music distribution platforms and automatic tagging of music libraries. Music genre classification is inherently a subjective task; there are no exact boundaries that separate different genres. Machine-learning-based audio classification attempts to classify audio by comparing feature vectors. Which features are extracted, and from which parts of the audio, greatly impacts the classification accuracy. This paper investigates whether different audio snippet locations and durations impact the classification accuracy. A number of experiments were run across six genres, four kinds of snippet locations and eight durations. The results show that these parameters do in fact have a significant impact on the accuracy.
APA, Harvard, Vancouver, ISO, and other styles
29

Rogers, Spencer David. "Support Vector Machines for Classification and Imputation." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3215.

Full text
Abstract:
Support vector machines (SVMs) are a powerful tool for classification problems. SVMs have only been developed in the last 20 years, with the availability of cheap and abundant computing power. SVMs are a non-statistical approach and make no assumptions about the distribution of the data. Here, support vector machines are applied to a classic data set from the machine learning literature, and the out-of-sample misclassification rates are compared to those of other classification methods. Finally, an algorithm for using support vector machines to address the difficulty of imputing missing categorical data is proposed, and its performance is demonstrated under three different scenarios using data from the 1997 National Labor Survey.
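The imputation idea above, predicting a missing categorical value from the rows where it is observed, can be illustrated with a 1-nearest-neighbour stand-in for the SVM classifier. The row layout (numeric columns plus one categorical column, `None` marking a missing value) is hypothetical:

```python
def impute_categorical(rows, target_col):
    """Fill missing values (None) in target_col by 1-nearest-neighbour over
    the complete rows, using squared distance on the remaining columns."""
    complete = [r for r in rows if r[target_col] is not None]

    def dist(a, b):
        return sum((a[i] - b[i]) ** 2
                   for i in range(len(a)) if i != target_col)

    filled = []
    for r in rows:
        if r[target_col] is None:
            nearest = min(complete, key=lambda c: dist(r, c))
            r = r[:target_col] + (nearest[target_col],) + r[target_col + 1:]
        filled.append(r)
    return filled
```

In the SVM version, the complete rows become a training set with the categorical column as the label, and the fitted classifier predicts the label for each incomplete row.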
APA, Harvard, Vancouver, ISO, and other styles
30

Jabali, Aghyad, and Husein Abdelkadir Mohammedbrhan. "Tyre sound classification with machine learning." Thesis, Högskolan i Gävle, Datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-36209.

Full text
Abstract:
Having enough data about the usage of tyre types on the road can lead to a better understanding of the consequences of studded tyres for the environment. This paper is focused on training and testing a machine learning model which can be further integrated into a larger system for automation of the data collection process. Different machine learning algorithms, namely CNN, SVM, and Random Forest, were compared in this experiment. The method used in this paper is empirical. First, sound data for studded and non-studded tyres was collected from three different locations in the city of Gävle, Sweden. A total of 760 Mel spectrograms from both classes were generated to train and test a well-known CNN model (AlexNet) in MATLAB. Sound features for both classes were extracted using JAudio to train and test models that use SVM and Random Forest classifiers in Weka. Unnecessary features were removed one by one from the list of features to improve the performance of the classifiers. The results show that the CNN achieved an accuracy of 84%, the SVM has the best performance both with and without removing some audio features (94% and 92%, respectively), while Random Forest has 89% accuracy. The test data comprises 51% of the studded class and 49% of the non-studded class, and the SVM model achieved more than 94%. Therefore, this can be considered an acceptable result that can be used in practice.
APA, Harvard, Vancouver, ISO, and other styles
31

Lopez, Marcano Juan L. "Classification of ADHD and non-ADHD Using AR Models and Machine Learning Algorithms." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/73688.

Full text
Abstract:
As of 2016, diagnosis of ADHD in the US is controversial. Diagnosis of ADHD is based on subjective observations, and treatment is usually done through stimulants, which can have negative side effects in the long term. Evidence shows that the probability of diagnosing a child with ADHD depends not only on the observations of parents, teachers, and behavioral scientists, but also on state-level special education policies. In light of these facts, unbiased, quantitative methods are needed for the diagnosis of ADHD. This problem has been tackled since the 1990s, and has resulted in methods that have not made it past the research stage and methods whose claimed performance could not be reproduced. This work proposes a combination of machine learning algorithms and signal processing techniques applied to EEG data in order to classify subjects with and without ADHD with high accuracy and confidence. More specifically, the K-nearest Neighbor algorithm and Gaussian-Mixture-Model-based Universal Background Models (GMM-UBM), along with autoregressive (AR) model features, are investigated and evaluated for the classification problem at hand. In this effort, classical KNN and GMM-UBM were also modified in order to account for uncertainty in diagnoses. Among the major findings reported in this work is classification performance as high as, if not higher than, that of the highest performing algorithms found in the literature. Another major finding is that activities that require attention help the discrimination of ADHD and non-ADHD subjects. Mixing in EEG data from periods of rest or with eyes closed leads to a loss of classification performance, to the point of approximating guessing when only resting EEG data is used.
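Using AR model coefficients as classification features can be illustrated in its simplest form: estimate an AR(1) coefficient per signal and classify with a nearest-centroid rule. This is a toy stand-in for the higher-order AR features and the KNN/GMM-UBM classifiers studied in the thesis:

```python
def ar1_coefficient(signal):
    """Least-squares estimate of a in the AR(1) model x[t] ≈ a * x[t-1]."""
    num = sum(signal[t] * signal[t - 1] for t in range(1, len(signal)))
    den = sum(signal[t - 1] ** 2 for t in range(1, len(signal)))
    return num / den

def nearest_centroid_predict(train, x):
    """train: {label: [feature values]}; assign x to the closest class mean."""
    centroids = {lab: sum(v) / len(v) for lab, v in train.items()}
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))
```

Higher-order AR(p) models extend this to p coefficients per signal (fitted from p lagged regressors), giving a compact fixed-length feature vector regardless of the EEG segment length.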
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
32

Huss, Jakob. "Cross Site Product Page Classification with Supervised Machine Learning." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189555.

Full text
Abstract:
This work outlines a possible technique for identifying web pages that contain product specifications. Using support vector machines, a product web page classifier was constructed and tested with various settings. The final classifier reached 0.958 precision and 0.796 recall for product pages. These scores imply that the method could be a valid technique for real-world web classification tasks if additional features and more data were made available.
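A minimal sketch of such a linear SVM classifier and its precision/recall evaluation, using the Pegasos sub-gradient method on hypothetical bag-of-words page vectors (not the thesis's actual features or implementation):

```python
import numpy as np

def pegasos_train(X, y, lam=0.01, epochs=200):
    """Primal sub-gradient training of a linear SVM (Pegasos variant with an
    unregularised bias term); labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in range(n):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (X[i] @ w + b)
            w *= (1 - eta * lam)          # shrink from the regulariser
            if margin < 1:                # hinge-loss sub-gradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(y_true, y_pred) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == positive and p != positive)
    return tp / (tp + fp), tp / (tp + fn)
```

On a small linearly separable toy set (term counts for "product-like" vs "other" pages), the learned hyperplane separates the classes perfectly.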
APA, Harvard, Vancouver, ISO, and other styles
33

Moulis, Armand. "Automatic Detection and Classification of Permanent and Non-Permanent Skin Marks." Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-138132.

Full text
Abstract:
When forensic examiners try to identify the perpetrator of a felony, they use individual facial marks when comparing the suspect with the perpetrator. Facial marks are often used for identification and they are nowadays found manually. To speed up this process, it is desired to detect interesting facial marks automatically. This master thesis describes a method to automatically detect and separate permanent and non-permanent marks. It uses a fast radial symmetry algorithm as a core element in the mark detector. After candidate skin mark extraction, the false detections are removed depending on their size, shape and number of hair pixels. The classification of the skin marks is done with a support vector machine and the different features are examined. The results show that the facial mark detector has a good recall while the precision is poor. The elimination methods of false detection were analysed as well as the different features for the classifier. One can conclude that the color of facial marks is more relevant than the structure when classifying them into permanent and non-permanent marks.
APA, Harvard, Vancouver, ISO, and other styles
34

Al-Insaif, Sadiq. "Shearlet-Based Descriptors and Deep Learning Approaches for Medical Image Classification." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42258.

Full text
Abstract:
In this Ph.D. thesis, we develop effective techniques for medical image classification, particularly for histopathological and magnetic resonance images (MRI). Our techniques are capable of handling the high variability in the content of such images. Handcrafted techniques based on texture analysis are used for the classification task. We also use deep learning models, but training such models from scratch can be a challenging process; instead, we employ deep features and transfer learning. First, we propose a combined texture-based feature representation that is computed in the complex shearlet domain for histopathological image classification. With complex coefficients, we examine both the magnitude and relative phase of shearlets to form the feature space. Our proposed techniques are successful for histopathological image classification. Furthermore, we investigate their ability to generalize to MRI datasets that present an additional challenge, namely high dimensionality. An MRI sample consists of a large number of slices. Our proposed shearlet-based feature representation for histopathological images cannot be used without adjustment. Therefore, we consider the 3D shearlet transform given the volumetric nature of MRI data. An advantage of the 3D shearlet transform is that it takes into consideration adjacent slices of MRI data. Secondly, we study the classification of histopathological images using pre-trained deep learning models. A pre-trained deep learning model can act as a starting point for datasets with a limited number of samples. Therefore, we used various models either as unsupervised feature extractors or as weight initializers to classify histopathological images. When it comes to MRI samples, fine-tuning a deep learning model is not straightforward. Pre-trained models are trained on RGB images, which have a channel size of 3, but an MRI sample has a larger number of slices.
Fine-tuning a convolutional neural network (CNN) requires adjusting a model to work with MRI data. We fine-tune pre-trained models and then use them as feature extractors. Thereafter, we demonstrate the effectiveness of fine-tuned deep features with classical machine learning (ML) classifiers, namely a support vector machine and a decision tree bagger. Furthermore, instead of using a classical ML classifier for the MRI sample, we built a custom CNN that takes both the 3D shearlet descriptors and deep features as input. This custom network processes our feature representation end-to-end and then classifies an MRI sample. Our custom CNN is more effective than a classical ML classifier on a hidden MRI dataset, an indication that our CNN model is less susceptible to over-fitting.
APA, Harvard, Vancouver, ISO, and other styles
35

Štechr, Vladislav. "Využití SVM v prostředí finančních trhů." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2016. http://www.nusl.cz/ntk/nusl-241651.

Full text
Abstract:
This thesis deals with the use of regression and classification based on support vector machines, from the field of machine learning. The SVMs predict values that are used for the decisions of an automatic trading system. Regression and classification are evaluated for their usability in decision making. The strategy is then optimized, tested and evaluated on a historic data set from the Forex foreign exchange market. The results are promising: the strategy could be used in combination with another strategy that confirms the decisions for entering and exiting trades.
APA, Harvard, Vancouver, ISO, and other styles
36

Guernine, Taoufik. "Classification hiérarchique floue basée sur le SVM et son application pour la catégorisation des documents." Mémoire, Université de Sherbrooke, 2010. http://savoirs.usherbrooke.ca/handle/11143/4838.

Full text
Abstract:
The exponential growth of communication media in recent years, and of the Internet in particular, has contributed to the increase in the volume of data processed over computer networks. This growth has pushed researchers to consider the best way to structure these data in order to facilitate their access and classification. Several techniques have been proposed for this classification problem. In practice, we observe two broad families of classification problems: binary problems and multi-class problems. The first observation that drew our attention is the problem of class confusion during classification, a phenomenon that makes results ambiguous and hard to interpret. The second is the difficulty of solving these problems with existing methods, especially when the data are not linearly separable. In addition, existing methods suffer from high computational and memory complexity. To remedy these problems, we propose a new classification method built around three main concepts: hierarchical classification, fuzzy logic theory, and the support vector machine (SVM). In this regard, and given the importance accorded to the field of text classification, we adapt our method to address the problem of text categorization. We test the proposed method on numerical and textual data respectively. Experimental results demonstrate considerable performance compared with certain existing classification methods.
APA, Harvard, Vancouver, ISO, and other styles
37

Lin, Tsu-Hui Angel. "Detection of mental task related EEG for brain computer interface implementation (using SVM classification approach)." Master's thesis, University of Cape Town, 2007. http://hdl.handle.net/11427/5182.

Full text
Abstract:
Includes bibliographical references (p. 97-103).
Brain computer interface (BCI) technology provides a method of communication and control for people with severe motor disabilities. This thesis explores the application of a Fast Fourier transform and support vector machine (FFT-SVM) to the problem of mental task detection in EEG-based brain computer interface implementation.
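The FFT half of an FFT-SVM pipeline can be sketched as band-power feature extraction; the function below (hypothetical band limits, NumPy only) computes the mean spectral power in standard EEG bands, which an SVM would then consume as a feature vector.

```python
import numpy as np

def band_power_features(signal, fs, bands=((4, 8), (8, 13), (13, 30))):
    """Mean spectral power of `signal` in each frequency band (theta, alpha
    and beta by default), a common FFT-based feature vector for EEG data."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    return np.array([power[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in bands])
```

A 10 Hz sinusoid, for example, concentrates almost all of its power in the 8-13 Hz (alpha) band.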
APA, Harvard, Vancouver, ISO, and other styles
38

TU, SHANSHAN. "Case Influence and Model Complexity in Regression and Classification." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563324139376977.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Danielsson, Benjamin. "A Study on Text Classification Methods and Text Features." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159992.

Full text
Abstract:
When it comes to the task of classification, the data used for training is the most crucial part. It follows that how this data is processed and presented to the classifier plays an equally important role. This thesis investigates the performance of multiple classifiers depending on the features used, the type of classes to classify and the optimization of said classifiers. The classifiers of interest are support-vector machines (SMO) and multilayer perceptrons (MLP); the features tested are word vector spaces and text complexity measures, along with principal component analysis on the complexity measures. The features are created based on the Stockholm-Umeå-Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset the classifiers attempted to classify texts into nine different text categories, while for the DigInclude dataset the sentences were classified as either standard or simplified. The classification tasks on the DigInclude dataset showed poor performance in all trials. The SUC dataset showed the best performance when using SMO in combination with word vector spaces. Comparing the SMO classifier on the text complexity measures with and without PCA showed that performance was largely unchanged between the two, although not using PCA gave slightly better performance.
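The PCA step applied to the text-complexity measures can be sketched with a plain SVD over the centred feature matrix (a generic illustration, not the thesis's actual toolchain):

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components via SVD of the
    column-centred matrix; returns the scores and the fraction of total
    variance those components explain."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (len(X) - 1)              # per-component variance
    scores = Xc @ Vt[:n_components].T        # projected feature matrix
    return scores, var[:n_components].sum() / var.sum()
```

On hypothetical data dominated by a single direction, one component already captures nearly all of the variance, which is exactly the situation in which PCA barely changes downstream classifier performance.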
APA, Harvard, Vancouver, ISO, and other styles
40

Pouteau, Robin Sylvain. "Étude de la phytogéographie des îles hautes de Polynésie française par classification SVM d'images multi-sources." Polynésie française, 2011. http://www.theses.fr/2011POLF0005.

Full text
Abstract:
The floristic composition of French Polynesian high volcanic islands is characterized by a great spatial heterogeneity. The existing remote sensing-based mapping methods are hardly suitable for such a level of complexity. This study aims to adapt these methods in order to yield maps with maximum accuracy. First, the classification accuracy of SVM (Support Vector Machines, a promising machine learning algorithm) is compared to that of a range of other algorithms to complement the literature. Then, a ground data collection methodology that takes account of the SVM paradigm is described. We distinguish two study models requiring the same tools but dissimilar methodologies to be mapped: (i) dominant species with a characteristic spectral response, for which all available source images (multispectral, synthetic aperture radar, environmental proxies) can be merged. For this purpose, we define a selective classification scheme that considers the discriminative properties of each species; and (ii) species found in the forest subcanopy, or rare species, which cannot be sensed remotely. In this case, remote sensing data are used a priori to produce a canopy map that is subsequently stacked with a set of environmental proxies to be integrated by an SVM in order to model the ecological niche of the species. These methods can lead to a more accurate knowledge of plant distribution across montane tropical forest landscapes.
APA, Harvard, Vancouver, ISO, and other styles
41

Diddikadi, Abhishek. "Multi Criteria Mapping Based on SVM and Clustering Methods." Master's thesis, Universitätsbibliothek Chemnitz, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-187132.

Full text
Abstract:
There are many ways to automate the application process, for example using the commercial software employed in large organizations to scan bills and forms, but such applications handle only static frames or formats. In our application, we try to automate non-static frames, since the study certificates we receive come from different countries and different universities. Every university has its own certificate format, so we develop a new application that can work across all of these frames or formats. Since many applicants come from the same university, and therefore share a common certificate format, a tool of this kind can analyze such certificates simply and in much less time. To make this process more accurate, we apply SVM and clustering methods. With these methods we can accurately map the courses in a certificate either to the ASE study path or to an exclude list. A grade calculation is performed for the courses mapped to the ASE list, separating the data for labs and courses. Finally, points are awarded, including points for ASE-related courses, work experience, specialization certificates and German language skills. These points are provided to the chair to select applicants for the ASE master course.
APA, Harvard, Vancouver, ISO, and other styles
42

Westlinder, Simon. "Video Traffic Classification : A Machine Learning approach with Packet Based Features using Support Vector Machine." Thesis, Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-43011.

Full text
Abstract:
Internet traffic classification is an important field which several stakeholders depend on for a number of different reasons. Internet Service Providers (ISPs) and network operators benefit from knowing what type of traffic propagates over their network in order to treat different applications correctly. Today, Deep Packet Inspection (DPI) and port-based classification are two of the more commonly used methods to classify Internet traffic. However, both of these techniques fail when the traffic is encrypted. This study explores a third method: classifying Internet traffic by machine learning, in which the classification is realized by looking at Internet traffic flow characteristics instead of actual payloads. Machine learning can overcome the inherent limitations that DPI and port-based classification suffer from. In this study the Internet traffic is divided into two classes of interest: Video and Other. There exist several machine learning methods for classification, and this study focuses on Support Vector Machine (SVM) to classify traffic. Several traffic characteristics are extracted, such as individual payload sizes and the longest consecutive run of payload packets in the downward direction. Several experiments using different approaches are conducted, and the results show that overall accuracies above 90% are achievable.
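Flow characteristics like those mentioned, individual payload sizes and the longest consecutive run of downward payload packets, can be sketched as a small feature extractor (the packet representation below is hypothetical):

```python
def flow_features(packets):
    """Per-flow features from a list of (direction, payload_size) tuples,
    where direction is 'down' or 'up': basic payload statistics plus the
    longest consecutive run of downward payload-carrying packets."""
    sizes = [s for _, s in packets if s > 0]
    longest = run = 0
    for direction, size in packets:
        if direction == "down" and size > 0:
            run += 1
            longest = max(longest, run)
        else:
            run = 0   # any upward or empty packet breaks the run
    return {
        "mean_payload": sum(sizes) / len(sizes) if sizes else 0.0,
        "max_payload": max(sizes) if sizes else 0,
        "longest_down_run": longest,
    }
```

A feature vector like this, computed per flow, is what the SVM would be trained on in place of packet payloads.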
APA, Harvard, Vancouver, ISO, and other styles
43

Morin, Eugène. "Étude de précision et de performance du processus de classification d'images de phytoplancton à l'aide de machines à vecteurs de support." Mémoire, Université de Sherbrooke, 2014. http://hdl.handle.net/11143/5405.

Full text
Abstract:
This research project targets the study and improvement of the accuracy of phytoplankton image classification and the reduction of the average processing time required per image. Two classification solutions are proposed to reach these objectives. The first performs image classification through preprocessing, discrimination and classification phases; the second uses only the preprocessing and classification phases. In summary, the preprocessing phase manipulates an image in order to characterize its main element (the phytoplankton), the discrimination phase uses interval decision trees to eliminate categories with little or no similarity to the processed image, and the classification phase uses support vector machines (SVM) to predict the category of each processed image. At the source, an automated image-capture device transmits images to a classifier. Depending on the classification speed, a portion or all of the generated images are classified. Thus, the larger the number of samples classified, the better the approximation of the population of each phytoplankton group at a given time, the goal being a more precise qualitative, quantitative and temporal analysis of these micro-organisms. To classify this type of image, a software application named Biotaxis was developed. It offers the user the choice between the two classification solutions proposed above. Both begin with the training of a classification group, composed of several image categories, followed by classification tests performed on this group to verify the classification accuracy of the image categories composing it.
To train and test the Biotaxis classifier, two sets of images were used: one serves only to train classification groups and the second to test them. The results obtained in this research project confirmed the validity of the two proposed solutions. A mean classification accuracy of 87% or more was reached with classification groups of 13 categories or fewer. In addition, a mean processing time under 200 ms per image was achieved with these same classification groups. The Biotaxis software is proposed as a new solution for rapidly classifying phytoplankton images.
APA, Harvard, Vancouver, ISO, and other styles
44

Li, Sichu. "Application of Machine Learning Techniques for Real-time Classification of Sensor Array Data." ScholarWorks@UNO, 2009. http://scholarworks.uno.edu/td/913.

Full text
Abstract:
There is a significant need to identify approaches for classifying chemical sensor array data with high success rates that would enhance sensor detection capabilities. The present study attempts to fill this need by investigating six machine learning methods to classify a dataset collected using a chemical sensor array: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Classification and Regression Trees (CART), Random Forest (RF), Naïve Bayes Classifier (NB), and Principal Component Regression (PCR). A total of 10 predictors that are associated with the response from 10 sensor channels are used to train and test the classifiers. A training dataset of 4 classes containing 136 samples is used to build the classifiers, and a dataset of 4 classes with 56 samples is used for testing. The results generated with the six different methods are compared and discussed. The RF, CART, and KNN are found to have success rates greater than 90%, and to outperform the other methods.
APA, Harvard, Vancouver, ISO, and other styles
45

Zhao, Haitao. "Analyzing TCGA Genomic and Expression Data Using SVM with Embedded Parameter Tuning." University of Akron / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=akron1415629295.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Plis, Kevin A. "The Effects of Novel Feature Vectors on Metagenomic Classification." Ohio University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1399578867.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Anne, Chaitanya. "Advanced Text Analytics and Machine Learning Approach for Document Classification." ScholarWorks@UNO, 2017. http://scholarworks.uno.edu/td/2292.

Full text
Abstract:
Text classification is used in information extraction and retrieval, and is considered an important step in managing the vast and expanding number of records available in digital form. This thesis addresses the problem of classifying patent documents into fifteen different categories or classes, where some classes overlap with others for practical reasons. For the development of the classification model using machine learning techniques, useful features have been extracted from the given documents. The features are used to classify patent documents as well as to generate useful tag-words. The overall objective of this work is to systematize NASA's patent management by developing a set of automated tools that can assist NASA in managing and marketing its portfolio of intellectual properties (IP), and in enabling easier discovery of relevant IP by users. We have identified an array of methods that can be applied, such as k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms: Random Forest and J48. The major research steps in this work consist of filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing potential models using effective classifiers. Further, the obstacles associated with the imbalanced data were mitigated by adding synthetic data wherever appropriate, which resulted in a superior SVM-based model.
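The information-gain filtering step mentioned above can be sketched for categorical features as follows (a generic illustration of the formula IG(class; feature) = H(class) − Σᵥ p(v)·H(class | feature = v), not the thesis's toolchain):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """IG = H(class) minus the feature-value-weighted conditional entropy."""
    total = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / total * entropy(subset)
    return entropy(labels) - cond
```

A feature that perfectly predicts the class has gain equal to the class entropy; a feature independent of the class has gain zero, which is the basis for ranking and filtering variables.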
APA, Harvard, Vancouver, ISO, and other styles
48

Do, Cao Tri. "Apprentissage de métrique temporelle multi-modale et multi-échelle pour la classification robuste de séries temporelles par plus proches voisins." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM028/document.

Full text
Abstract:
The definition of a metric between time series is inherent to several data analysis and mining tasks, including clustering, classification and forecasting. Time series data naturally present several characteristics, called modalities, covering their amplitude, behavior or frequential spectrum, that may be expressed with varying delays and at different temporal granularities and localizations, exhibited globally or locally. Combining several modalities at multiple temporal scales to learn a holistic metric is a key challenge for many real temporal data applications. This PhD proposes a Multi-modal and Multi-scale Temporal Metric Learning (M2TML) approach for robust time series nearest neighbors classification. The solution is based on the embedding of pairs of time series into a pairwise dissimilarity space, in which a large-margin optimization process is performed to learn the metric. The M2TML solution is proposed for both linear and non-linear contexts, and is studied for different regularizers. A sparse and interpretable variant of the solution shows the ability of the learned temporal metric to accurately localize discriminative modalities as well as their temporal scales. A wide range of 30 public and challenging datasets, encompassing images, traces and ECG data, that are linearly or non-linearly separable, is used to show the efficiency and the potential of M2TML for time series nearest neighbors classification.
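The pairwise embedding idea, describing a pair of time series by per-modality dissimilarities that a learned weight vector then combines, can be sketched as below (the modalities and weights are hypothetical illustrations; the actual M2TML large-margin optimization is not shown):

```python
import numpy as np

def multimodal_dissimilarity(x, y):
    """Dissimilarity vector between two equal-length series over three
    example modalities: amplitude (raw values), behavior (first
    differences) and frequential content (magnitude spectra)."""
    d_value = np.linalg.norm(x - y)
    d_shape = np.linalg.norm(np.diff(x) - np.diff(y))
    d_freq = np.linalg.norm(np.abs(np.fft.rfft(x)) - np.abs(np.fft.rfft(y)))
    return np.array([d_value, d_shape, d_freq])

def combined_metric(x, y, w):
    """Weighted combination of the modal dissimilarities; in M2TML the
    weights w would be learned by a large-margin optimization."""
    return float(w @ multimodal_dissimilarity(x, y))
```

The embedding makes the modality structure explicit: a pure amplitude offset, for instance, registers only on the value modality, not on the shape modality.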
APA, Harvard, Vancouver, ISO, and other styles
49

Lardeux, Cédric. "Apport des données radar polarimétriques pour la cartographie en milieu tropical." Phd thesis, Université Paris-Est, 2008. http://tel.archives-ouvertes.fr/tel-00481850.

Full text
Abstract:
SAR (Synthetic Aperture Radar) sensors have provided continuous observations of the Earth's surface since the launch of the ERS-1 satellite in 1991. Until recently, the acquired data, mainly based on the intensity of the signal acquired in one particular polarization configuration, have been the subject of numerous studies, notably on deforestation monitoring. Since 2007, new polarimetric SAR sensors (PALSAR, RADARSAT-2, TerraSAR-X...) have allowed the polarimetric characterization of the observed surfaces. These data require suitable processing in order to extract the information most relevant to the application considered. The aim of this work was to evaluate their potential for mapping natural surfaces in tropical environments. The contribution of multiple polarimetric indices was evaluated using the SVM (Support Vector Machine) classification algorithm, which is especially suited to handling a large number of indices that are not necessarily homogeneous. The data used were acquired by the airborne AIRSAR sensor over an island in French Polynesia, and numerous in situ measurements allowed the validation of the obtained results. The results show that the sensitivity of these data to the geometric structure of the observed surfaces enables good discrimination between the different vegetation covers studied, in particular forest types. Moreover, the SVM classification clearly outperforms the usual classification based on the Wishart distribution assumed a priori for the radar data. These results suggest a significant contribution of future polarimetric radar data to the monitoring of natural surfaces.
APA, Harvard, Vancouver, ISO, and other styles
50

Fernquist, Johan. "Detection of deceptive reviews : using classification and natural language processing features." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-306956.

Full text
Abstract:
With the great growth of open forums online where anyone can give their opinion on everything, the Internet has become a place where people try to mislead others. Assuming there is a correlation between a deceptive text's purpose and the way the text is written, our goal in this thesis was to develop a model for detecting these fake texts by taking advantage of this correlation. Our approach was to use classification together with three different feature types: term frequency-inverse document frequency, word2vec and probabilistic context-free grammar. We have managed to develop a model which improves all results known to us for two different datasets. With machine translation, we found that it is possible to hide the stylometric footprint and characteristics of deceptive texts, making it possible to slightly decrease the accuracy of a classifier and still convey a message. Finally, we investigated whether it was possible to train and test our model on data from different sources, and achieved an accuracy hardly better than chance. This indicates that the resulting model is not versatile enough to be used on kinds of deceptive texts other than those it was trained on.
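The first of the three feature types, term frequency-inverse document frequency, can be sketched from scratch as follows (one common smoothing variant, tf = raw count and idf = log(N/df) + 1; the thesis's exact weighting may differ):

```python
import math
from collections import Counter

def tfidf(docs):
    """tf-idf vectors for a list of tokenised documents over a shared,
    sorted vocabulary; returns (vocabulary, list of dense vectors)."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)                # raw term counts
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors
```

Terms that appear in every document get the minimum idf of 1, so rarer, more discriminative terms dominate the vectors, which is what makes the representation useful for a downstream classifier.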
APA, Harvard, Vancouver, ISO, and other styles
