Dissertations / Theses on the topic 'Support vector data description'

Consult the top 50 dissertations / theses for your research on the topic 'Support vector data description.'

1

Chu, Shun-Kwong. "Scaling up support vector data description by using core-sets." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?COMP%202004%20CHU.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004. Includes bibliographical references (leaves 60-64). Also available in electronic version. Access restricted to campus users.
2

Sfikas, Giorgos. "Modèles statistiques non linéaires pour l'analyse de formes : application à l'imagerie cérébrale." PhD thesis, Université de Strasbourg, 2012. http://tel.archives-ouvertes.fr/tel-00789793.

Abstract:
This thesis is concerned with the statistical analysis of shapes in the context of medical imaging. In the field of medical imaging, shape analysis is used to describe the morphological variability of various organs and tissues. In this thesis we focus on the construction of a compact, non-linear, generative and discriminative model suited to shape representation. The model is evaluated in the context of a study of a population of patients with Alzheimer's disease and a population of healthy control subjects. Our main interest here is in using the discriminative model to discover the morphological differences that most discriminate between a given class of shapes and shapes not belonging to that class. The theoretical innovation brought by our model lies in two main points: first, we propose a tool for extracting the discriminative difference within the Support Vector Data Description (SVDD) framework; second, all the generated reconstructions are anatomically correct. This last point stems from the non-linear and compact character of the model, tied to the assumption that the data (the shapes) lie on a low-dimensional non-linear manifold. An application of our model to real medical data shows results consistent with medical knowledge.
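Since Support Vector Data Description recurs throughout this list, a minimal sketch may help orient readers: SVDD fits the smallest ball in kernel feature space that encloses the training data, and its dual is a small quadratic program. The code below is purely illustrative (synthetic data; every function name and parameter value is invented for the example) and is not taken from any of the cited theses.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svdd_fit(X, C=0.1, gamma=0.5):
    """Solve the SVDD dual: max sum_i a_i K_ii - sum_ij a_i a_j K_ij
    subject to 0 <= a_i <= C and sum_i a_i = 1 (soft minimum enclosing ball)."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    obj = lambda a: a @ K @ a - a @ np.diag(K)      # negated dual, to minimise
    cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)
    res = minimize(obj, np.full(n, 1.0 / n), bounds=[(0.0, C)] * n, constraints=cons)
    alpha = res.x
    # squared radius = distance from the centre to any boundary support vector
    sv = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]
    i = sv[0] if len(sv) else int(alpha.argmax())
    r2 = K[i, i] - 2 * alpha @ K[:, i] + alpha @ K @ alpha
    return alpha, r2

def svdd_dist2(X_train, alpha, x, gamma=0.5):
    """Squared feature-space distance of a test point from the ball centre
    (uses K(x, x) = 1 for the RBF kernel)."""
    k = rbf_kernel(x[None, :], X_train, gamma)[0]
    return 1.0 - 2 * alpha @ k + alpha @ rbf_kernel(X_train, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
alpha, r2 = svdd_fit(X)
print("origin inside ball:", svdd_dist2(X, alpha, np.zeros(2)) <= r2)
```

Points whose feature-space distance from the centre exceeds the radius are flagged as outliers; with an RBF kernel, a one-class SVM yields an equivalent decision rule.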
3

El, Azami Meriem. "Computer aided diagnosis of epilepsy lesions based on multivariate and multimodality data analysis." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI087/document.

Abstract:
One third of patients suffering from epilepsy are resistant to medication. For these patients, surgical removal of the epileptogenic zone offers the possibility of a cure. Surgery success relies heavily on the accurate localization of the epileptogenic zone. The analysis of neuroimaging data such as magnetic resonance imaging (MRI) and positron emission tomography (PET) is increasingly used in the pre-surgical work-up of patients and may offer an alternative to the invasive reference of stereo-electroencephalography (SEEG) monitoring. To assist clinicians in screening these lesions, we developed a computer aided diagnosis system (CAD) based on a multivariate data analysis approach. Our first contribution was to formulate the problem of epileptogenic lesion detection as an outlier detection problem. The main motivation for this formulation was to avoid the dependence on labelled data and the class imbalance inherent to this detection task. The proposed system builds upon the one class support vector machines (OC-SVM) classifier. OC-SVM was trained using features extracted from MRI scans of healthy control subjects, allowing a voxelwise assessment of the deviation of a test subject pattern from the learned patterns. System performance was evaluated using realistic simulations of challenging detection tasks as well as clinical data of patients with intractable epilepsy. The outlier detection framework was further extended to take into account the specificities of neuroimaging data and the detection task at hand. We first proposed a reformulation of the support vector data description (SVDD) method to deal with the presence of uncertain observations in the training data. Second, to handle the multi-parametric nature of neuroimaging data, we proposed an optimal fusion approach for combining multiple base one-class classifiers. Finally, to help with score interpretation, threshold selection and score combination, we proposed to transform the score outputs of the outlier detection algorithm into well calibrated probabilities.
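A rough sketch of the one-class idea used above, with scikit-learn's OneClassSVM standing in for the thesis's tailored models: train on "normal" samples only, then rank test samples by how far they fall from the learned region. The data, parameter values and the logistic squashing step are invented placeholders; proper probability calibration, as proposed in the thesis, would be learned from data.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 1.0, size=(200, 5))              # stand-in for control features
patient = np.vstack([rng.normal(0.0, 1.0, size=(95, 5)),   # mostly normal samples
                     rng.normal(3.0, 1.0, size=(5, 5))])   # a few deviating samples

oc = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(healthy)
scores = -oc.decision_function(patient)        # larger means further from normality
probs = 1.0 / (1.0 + np.exp(-scores))          # crude logistic squash, not calibration
print("most suspect samples:", np.argsort(probs)[-5:])
```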
4

Díaz, Jorge Luis Guevara. "Modelos de aprendizado supervisionado usando métodos kernel, conjuntos fuzzy e medidas de probabilidade." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-03122015-155546/.

Abstract:
This thesis proposes a methodology based on kernel methods, probability measures and fuzzy sets to analyze datasets whose individual observations are themselves sets of points, instead of individual points. Fuzzy sets and probability measures are used to model the observations, and kernel methods to analyze the data. Fuzzy sets are used when the observations contain imprecise, vague or linguistic values, whereas probability measures are used when an observation is given as a set of multidimensional points in a D-dimensional Euclidean space. Using this methodology, it is possible to address a wide range of machine learning problems for such datasets. In particular, this work presents data description models for observations modeled by probability measures; this is achieved by embedding the probability measures into reproducing kernel Hilbert spaces and constructing minimum enclosing balls there. Those description models are used as one-class classifiers and applied to the group anomaly detection task. This work also proposes a new class of kernels, the kernels on fuzzy sets, which are reproducing kernels able to map fuzzy sets to geometric feature spaces and which act as similarity measures between fuzzy sets. We cover these kernels from basic definitions to applications in machine learning problems such as classification, regression and the definition of distances between fuzzy sets; in particular, they are applied to supervised classification of interval data and to a kernel two-sample test for data containing imprecise attributes. Potential applications of those kernels include machine learning and pattern recognition tasks over fuzzy data, and computational tasks requiring an estimate of the similarity between fuzzy sets.
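To make the embedding idea concrete: when each observation is a set of points, one simple reproducing kernel between two sets is the mean of the pairwise RBF values, i.e. the inner product of their kernel mean embeddings. The sketch below, an invented toy rather than the dissertation's code, feeds such a set-level kernel to a one-class SVM as a precomputed kernel, mimicking group anomaly detection.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

def mean_map_kernel(A, B, gamma=1.0):
    """Kernel between two point sets: the mean of pairwise RBF values."""
    return rbf_kernel(A, B, gamma=gamma).mean()

rng = np.random.default_rng(2)
groups = [rng.normal(0, 1, size=(30, 2)) for _ in range(20)]   # "normal" groups
groups.append(rng.normal(0, 3, size=(30, 2)))                  # one anomalous group
G = np.array([[mean_map_kernel(a, b) for b in groups] for a in groups])

oc = OneClassSVM(kernel="precomputed", nu=0.1).fit(G[:20, :20])
print(oc.decision_function(G[:, :20]))   # the last group typically scores lowest
```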
5

Mao, Jin, Lisa R. Moore, Carrine E. Blank, et al. "Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources." BioMed Central Ltd, 2016. http://hdl.handle.net/10150/622562.

Abstract:
Background: The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages. Results: We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix, with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students. Conclusion: MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.
6

D'Orangeville, Vincent. "Analyse automatique de données par Support Vector Machines non supervisés." Thèse, Université de Sherbrooke, 2012. http://hdl.handle.net/11143/6678.

Abstract:
This dissertation presents a set of algorithms aimed at enabling fast, robust and automatic use of unsupervised Support Vector Machines (SVM) in a data analysis context. Unsupervised SVMs come in two promising algorithmic forms, Support Vector Clustering (SVC) and Support Vector Domain Description (SVDD), which respectively offer solutions to two important problems in data analysis: the search for homogeneous groupings (clustering) and the recognition of atypical elements (novelty/anomaly detection) in a dataset. This research proposes concrete solutions to three fundamental limitations inherent in these two algorithms, namely 1) the absence of an efficient optimization algorithm for carrying out the training phase of SVDD and SVC on large datasets within an acceptable time, 2) the lack of efficiency and robustness of existing data partitioning algorithms for SVC, and 3) the absence of automatic hyperparameter selection strategies for SVDD and SVC controlling the complexity and noise tolerance of the generated models. Resolving each of these three limitations constitutes the three main axes of this doctoral thesis, each being the subject of a scientific article proposing strategies and algorithms that enable fast, robust and input-parameter-free use of SVDD and SVC on arbitrary datasets.
7

Perez, Daniel Antonio. "Performance comparison of support vector machine and relevance vector machine classifiers for functional MRI data." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34858.

Abstract:
Multivariate pattern analysis (MVPA) of fMRI data has been growing in popularity due to its sensitivity to networks of brain activation. It is performed in a predictive modeling framework which is natural for implementing brain state prediction and real-time fMRI applications such as brain computer interfaces. Support vector machines (SVM) have been particularly popular for MVPA owing to their high prediction accuracy even with noisy datasets. Recent work has proposed the use of relevance vector machines (RVM) as an alternative to SVM. RVMs are particularly attractive in time sensitive applications such as real-time fMRI since they tend to perform classification faster than SVMs. Despite the use of both methods in fMRI research, little has been done to compare the performance of these two techniques. This study compares RVM to SVM in terms of time and accuracy to determine which is better suited to real-time applications.
8

Devine, Jon. "Support Vector Methods for Higher-Level Event Extraction in Point Data." Fogler Library, University of Maine, 2009. http://www.library.umaine.edu/theses/pdf/DevineJ2009.pdf.

9

Crampton, Andrew. "Radial basis and support vector machine algorithms for approximating discrete data." Thesis, University of Huddersfield, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.273723.

Abstract:
The aim of this thesis is to demonstrate how the versatility of radial basis functions can be used to construct algorithms for approximating discrete sets of scattered data. In many cases, these algorithms have been constructed by blending together existing methods or by extending algorithms that exploit certain properties of a particular basis function to include certain radial functions. In the later chapters, we shall see that methods which currently use radial basis functions can be made more efficient by considering a change to the existing methods of solution.

In chapter one we introduce radial basis functions (RBFs) and show how they can be used to construct interpolation and approximation models. We examine the uniqueness properties of the interpolation scheme for two specific functions and review some of the methods currently being used to determine the type of function to use and how to choose the number and location of centres. We describe three methods for choosing centres based on data clustering techniques and compare the accuracy of an approximation using two of these schemes. We show through a numerical example how greater accuracy can be achieved by combining these two schemes intelligently to construct a new, hybrid method. Problems that currently exist, for a particular clustering algorithm, when dealing with domain boundaries and which are not covered in great detail in the literature are highlighted and a new method is proposed. We conclude the chapter with an investigation into point distributions on the sphere. Radial basis functions are increasingly being used as a tool for approximating both discrete data and known functions on the sphere. Much of the current research focuses on constructing optimum point distributions for approximations using spherical harmonics. In this section we compare and evaluate these point distributions for RBF approximations and contrast the accuracy of the spherical harmonics with results obtained using the multiquadric function.

In chapter two we develop an algorithm for surface approximation by combining the works of Mason & Bennell [40], and Clenshaw & Hayes [18]. Here, the well known method for constructing tensor products on rectangular grids is combined with an algorithm for approximating data collected along curved paths. The method developed in the literature for separable Chebyshev polynomials is extended to include the Gaussian radial function. Since the centres of the Gaussians can be distinct from the data points, we suggest a method for constructing a suitable set of centres to enable the efficiency of the two methods to be preserved. Possibilities for further efficiency using parallel processing are also discussed. We conclude the chapter by reviewing the Gram-Schmidt method and show how the use of orthogonal functions results in a numerically stable computation for evaluating the model parameters. The local support of the Gaussian function is investigated and the method of Mason & Crampton [41] is explained for constructing orthogonalised Gaussian functions.

Chapter three introduces a relatively new topic in data approximation called support vector machines (SVMs). The motivation behind using SVMs for constructing regression models for corrupted data is addressed and the use of RBF kernels to map data into feature space is explained. We show how the regression model is formulated and discuss currently used methods of solution. The flexibility of SVMs to adapt to different types and levels of noise is demonstrated through some numerical examples. We make use of the techniques developed in SVM regression to show how the algorithm described in chapter two can be extended. Here we make use of SVMs in the early stages of the algorithm to remove the need for further consideration of noise. We complete the discussion of SVMs by explaining their use in the field of data classification through a simple pattern recognition example.

Chapter four focuses on a new approach to the solution of an SVM. The new approach taken is one of constructing an entirely linear objective function. This is achieved by changing the regularisation term. We show, in detail, how the changes made to the existing framework affect the construction of the model. We describe the solution method and explain how advantage can be taken of the new linear structure. To determine the model parameters, we show how the solution, in the form of a simplex tableau, can be found extremely efficiently by recognising certain relationships between variables that allow us to employ Lei's algorithm. Examples that show SVM approximants to noisy data for both curves and surfaces are given together with a comparison between Lei's algorithm and a standard simplex solution method. We finish the section by highlighting the link between support vectors and radial basis function centres. The sparsity produced by the method in the coefficient vector is also discussed. The new linearised approach to constructing SVM regression models is used in a new algorithm developed to construct planar curves that model the path of fault lines in a surface. Part of a detection algorithm proposed by Gutzmer & Iske [33] is used to determine points that lie close to a fault line. The new approach is then to model the fault line by constructing an SVM regression curve. The chapter concludes with some examples and remarks.

The thesis concludes with chapter five, in which we summarise the main points discussed and point to possibilities for extending the work presented.
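For orientation, the core RBF computation this thesis builds on is short: interpolating scattered data with Gaussian basis functions centred at the data points reduces to a single linear solve with the kernel matrix. The following toy sketch (invented test function and shape parameter, not material from the thesis) illustrates it.

```python
import numpy as np

def gaussian_rbf_weights(centres, values, eps=2.0):
    """Fit w so that s(x) = sum_j w_j exp(-eps^2 ||x - c_j||^2) interpolates the data."""
    d2 = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.linalg.solve(np.exp(-eps**2 * d2), values)

def rbf_eval(x, centres, w, eps=2.0):
    d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-eps**2 * d2) @ w

rng = np.random.default_rng(3)
pts = rng.uniform(-1, 1, size=(50, 2))
vals = np.sin(np.pi * pts[:, 0]) * pts[:, 1]     # scattered samples of a toy surface
w = gaussian_rbf_weights(pts, vals)
print("max residual at the data:", np.abs(rbf_eval(pts, pts, w) - vals).max())
```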
10

Andreola, Rafaela. "Support Vector Machines na classificação de imagens hiperespectrais." Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/17894.

Abstract:
This dissertation deals with the application of Support Vector Machines (SVM) to the classification of remote sensing high-dimensional image data. It is well known that in many cases classes that are spectrally very similar, and thus not separable when using the more conventional low-dimensional data, can nevertheless be separated with a high degree of accuracy in high dimensional spaces. Classification of high-dimensional image data can, however, become a challenging problem for parametric classifiers such as the well-known Gaussian Maximum Likelihood. A large number of variables produce an equally large number of parameters to be estimated from a generally limited number of training samples. This condition causes the Hughes phenomenon, which consists in a gradual degradation of the accuracy as the data dimensionality increases beyond a certain value. Non-parametric classifiers present the advantage of being less sensitive to this dimensionality problem. SVM has been receiving a great deal of attention from the international community as an efficient classifier. In this dissertation the performance of SVM applied to remote sensing hyperspectral image data is analyzed. Initially the more theoretical concepts related to SVM are reviewed and discussed. Next, a series of experiments using AVIRIS image data are performed, using different configurations for the classifier. The data cover a test area established by Purdue University and present a number of classes (agricultural fields) which are spectrally very similar to each other. The classification accuracy produced by different kernels is investigated as a function of the data dimensionality and compared with the one yielded by the well-known Gaussian Maximum Likelihood classifier. As SVM applies to a pair of classes at a time, a multi-stage classifier structured as a binary tree was developed to deal with the multi-class problem. The tree classifier is initially defined by selecting at each node the most separable pair of classes, using the Bhattacharyya distance as a criterion. These two classes will then be used to define the two descending nodes and the corresponding SVM decision function. This operation is performed at every node across the tree, until the terminal nodes are reached. The required software was developed in the MATLAB environment and is also presented in this dissertation.
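The node-splitting criterion mentioned above has a closed form when each class is modelled as a Gaussian. A generic sketch of the Bhattacharyya distance used to pick the most separable class pair (synthetic classes, not the dissertation's code):

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian class models."""
    cov = 0.5 * (cov1 + cov2)
    dmu = mu1 - mu2
    term1 = 0.125 * dmu @ np.linalg.solve(cov, dmu)
    ld = np.linalg.slogdet(cov)[1]
    ld1 = np.linalg.slogdet(cov1)[1]
    ld2 = np.linalg.slogdet(cov2)[1]
    return term1 + 0.5 * (ld - 0.5 * (ld1 + ld2))

rng = np.random.default_rng(4)
classes = [rng.normal(m, 1.0, size=(100, 4)) for m in (0.0, 0.5, 3.0)]
stats = [(c.mean(axis=0), np.cov(c.T)) for c in classes]
for i in range(3):
    for j in range(i + 1, 3):   # the largest distance marks the pair to split first
        print(i, j, round(bhattacharyya(*stats[i], *stats[j]), 2))
```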
11

Park, Yongwon. "Dynamic task scheduling onto heterogeneous machines using Support Vector Machine." Auburn, Ala, 2008. http://repo.lib.auburn.edu/EtdRoot/2008/SPRING/Computer_Science_and_Software_Engineering/Thesis/Park_Yong_50.pdf.

12

Hayes, Timothy. "Novel vector architectures for data management." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/397645.

Abstract:
As the rate of annual data generation grows exponentially, there is a demand to manage, query and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased, and further performance must come from exploiting parallelism. Vector architectures offer a highly efficient and scalable way of exploiting data-level parallelism (DLP) through sophisticated single instruction-multiple data (SIMD) instruction sets. Traditionally, vector machines were used to accelerate scientific workloads rather than business-domain applications. In this thesis, we design innovative vector extensions for a modern superscalar microarchitecture that are optimised for data management workloads. Based on extensive analysis of these workloads, we propose new algorithms, novel instructions and microarchitectural optimisations. We first profile a leading commercial decision support system to better understand where the execution time is spent. We find that the hash join operator is responsible for a significant portion of the time. Based on our profiling, we develop lightweight integer-based pipelined vector extensions to capture the DLP in the operator. We then proceed to implement and evaluate these extensions using a custom simulation framework based on PTLsim and DRAMSim2. We motivate key design decisions based on the structure of the algorithm and compare these choices against alternatives experimentally. We discover that relaxing the base architecture's memory model is very beneficial when executing a vectorised implementation of the algorithm. This relaxed model serves as a powerful mechanism to execute indexed vector memory instructions out of order without requiring complex associative hardware. We find that our vectorised implementation shows good speedups. Furthermore, the vectorised version exhibits better scalability compared to the original scalar version run on a microarchitecture with larger superscalar and out-of-order structures. We then make a detailed study of SIMD sorting algorithms. Using our simulation framework, we evaluate the strengths, weaknesses and scalability of three diverse vectorised sorting algorithms: quicksort, bitonic mergesort and radix sort. We find that each of these algorithms has its unique set of bottlenecks. Based on these findings, we propose VSR sort: a novel vectorised non-comparative sorting algorithm that is based on radix sort but without its drawbacks. VSR sort, however, cannot be implemented directly with typical vector instructions due to the irregularity of its DLP. To facilitate the implementation of this algorithm, we define two new vector instructions and propose a complementary hardware structure for their execution. We find that VSR sort significantly outperforms each of the other vectorised algorithms. Next, we propose and evaluate five different ways of vectorising GROUP BY data aggregations. We find that although data aggregation algorithms are abundant in DLP, it is often too irregular to be expressed efficiently using typical vector instructions. By extending the hardware used for VSR sort, we propose a set of vector instructions and novel algorithms to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Finally, we evaluate the area, energy and power of these extensions using McPAT. Our results show that our proposed vector extensions come with a modest area overhead, even when using a large maximum vector length with lockstepped parallel lanes. Using sorting as a case study, we find that all of the vectorised algorithms consume much less energy than their scalar counterparts. In particular, our novel VSR sort requires an order of magnitude less energy than the scalar baseline. With respect to power, we discover that our vector extensions present a very reasonable increase in wattage.
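For readers outside computer architecture, a plain Python LSD radix sort shows the non-comparative structure that VSR sort builds on; the scatter step marked in the loop is precisely the irregular, data-dependent access pattern the thesis's new vector instructions target. This is a generic illustration, not the thesis's implementation.

```python
import numpy as np

def lsd_radix_sort(keys, bits=8):
    """Least-significant-digit radix sort of 32-bit unsigned integers."""
    keys = keys.copy()
    radix = 1 << bits
    for shift in range(0, 32, bits):
        digit = (keys >> shift) & (radix - 1)
        counts = np.bincount(digit, minlength=radix)
        offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
        out = np.empty_like(keys)
        for k in keys:                      # scatter pass: irregular, data-dependent
            d = int(k >> shift) & (radix - 1)
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys

a = np.random.default_rng(5).integers(0, 2**32, size=1000, dtype=np.uint32)
assert (lsd_radix_sort(a) == np.sort(a)).all()
```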
13

Öberg, Daniel. "Clinical Assessment for Deep Vein Thrombosis using Support Vector Machines : A description of a clinical assessment and compression ultrasonography journaling system for deep vein thrombosis using support vector machines." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-178419.

Abstract:
This master thesis describes a journaling system for compression ultrasonography and a clinical assessment system for deep vein thrombosis (DVT). We evaluate Support Vector Machine (SVM) models with linear and radial basis function kernels for predicting deep vein thrombosis, and for facilitating the creation of new clinical DVT assessments. Data from 159 patients were analysed. On our dataset, a Wells Score with high clinical probability has an accuracy of 58%, a sensitivity of 60% and a specificity of 57%; these figures should be compared to those of our base model: an accuracy of 81%, a sensitivity of 66% and a specificity of 84%, a 23 percentage point increase in accuracy. The diagnostic odds ratio went from 2.12 to 11.26. However, a larger dataset is required to report anything conclusive. As our system is both a journaling and a prediction system, every patient examined helps the accuracy of the assessment.
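The diagnostic odds ratio quoted above is simple arithmetic on the 2x2 confusion table. A tiny sketch, using hypothetical counts invented to roughly match the reported percentages (the thesis gives only the derived figures):

```python
def diagnostics(tp, fn, fp, tn):
    """Sensitivity, specificity and diagnostic odds ratio from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)   # equals (sens/(1-sens)) / ((1-spec)/spec)
    return sens, spec, dor

# hypothetical counts for 159 patients; not the study's actual table
print(diagnostics(tp=33, fn=17, fp=17, tn=92))
```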
14

Yi, Long. "KernTune: self-tuning Linux kernel performance using support vector machines." Thesis, University of the Western Cape, 2006. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_6921_1249280496.

Abstract:
Self-tuning has been an elusive goal for operating systems and is becoming a pressing issue for modern operating systems. Well-trained system administrators are able to tune an operating system to achieve better system performance for a specific system class. Unfortunately, the system class can change when the running applications change. The model for a self-tuning operating system is based on a monitor-classify-adjust loop. The idea of this loop is to continuously monitor certain performance metrics, and whenever these change, the system determines the new system class and dynamically adjusts tuning parameters for this new class. This thesis describes KernTune, a prototype tool that identifies the system class and improves system performance automatically. A key aspect of KernTune is its notion of Artificial Intelligence oriented performance tuning: it uses a support vector machine to identify the system class, and tunes the operating system for that specific system class. This thesis presents design and implementation details for KernTune, and shows how KernTune identifies a system class and tunes the operating system for improved performance.
15

Shakeel, Mohammad Danish. "Land Cover Classification Using Linear Support Vector Machines." Connect to resource online, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1231812653.

16

Melki, Gabriella A. "Novel Support Vector Machines for Diverse Learning Paradigms." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5630.

Abstract:
This dissertation introduces novel support vector machines (SVM) for the following traditional and non-traditional learning paradigms: online classification, multi-target regression, multiple-instance classification, and data stream classification. Three multi-target support vector regression (SVR) models are first presented. The first involves building independent, single-target SVR models for each target. The second builds an ensemble of randomly chained models using the first single-target method as a base model. The third calculates the targets' correlations and forms a maximum correlation chain, which is used to build a single chained SVR model, improving the model's prediction performance while reducing computational complexity. Under the multi-instance paradigm, a novel SVM multiple-instance formulation and an algorithm with a bag-representative selector, named Multi-Instance Representative SVM (MIRSVM), are presented. The contribution trains the SVM based on bag-level information and is able to identify instances that highly impact classification, i.e. bag-representatives, for both positive and negative bags, while finding the optimal class separation hyperplane. Unlike other multi-instance SVM methods, this approach eliminates possible class imbalance issues by allowing both positive and negative bags to have at most one representative, which constitutes the most contributing instance to the model. Due to the shortcomings of current popular SVM solvers, especially in the context of large-scale learning, the third contribution presents a novel stochastic, i.e. online, learning algorithm for solving the L1-SVM problem in the primal domain, dubbed OnLine Learning Algorithm using Worst-Violators (OLLAWV). This algorithm, unlike other stochastic methods, provides a novel stopping criterion and eliminates the need for a regularization term by using early stopping instead. Because of these characteristics, OLLAWV was proven to efficiently produce sparse models while maintaining competitive accuracy. OLLAWV's online nature and success in traditional classification inspired its implementation, as well as that of its predecessor, OnLine Learning Algorithm - List 2 (OLLA-L2), in the batch data stream classification setting. Unlike other existing methods, these two algorithms were chosen because their properties are a natural remedy for the time and memory constraints that arise from the data stream problem. OLLA-L2's low spatial complexity deals with the memory constraints imposed by the data stream setting, while OLLAWV's fast run time, early self-stopping capability, and ability to produce sparse models address both memory and time constraints. The preliminary results for OLLAWV showed performance superior to its predecessor, and it was chosen for the final set of experiments against current popular data stream methods. Rigorous experimental studies and statistical analyses over various metrics and datasets were conducted in order to comprehensively compare the proposed solutions against modern, widely-used methods from all paradigms. The experimental studies and analyses confirm that the proposals achieve better performance and more scalable solutions than the methods compared, making them competitive in their respective fields.
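A heavily simplified sketch of the third multi-target idea, chaining SVR models in a correlation-derived target order so that later models see earlier predictions as extra features; the greedy ordering below is an illustrative stand-in and the details differ from the dissertation's formulation.

```python
import numpy as np
from sklearn.svm import SVR

def fit_correlation_chain(X, Y):
    """Order targets greedily by absolute correlation, then fit SVRs in a chain."""
    corr = np.abs(np.corrcoef(Y.T))
    np.fill_diagonal(corr, 0.0)
    order = [int(corr.sum(axis=0).argmax())]
    while len(order) < Y.shape[1]:
        rest = [j for j in range(Y.shape[1]) if j not in order]
        order.append(max(rest, key=lambda j: corr[order[-1], j]))
    models, Xa = [], X.copy()
    for t in order:
        m = SVR(kernel="rbf").fit(Xa, Y[:, t])
        Xa = np.column_stack([Xa, m.predict(Xa)])   # feed prediction to later targets
        models.append((t, m))
    return order, models

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 5))
Y = np.column_stack([X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
                     for _ in range(3)])
print("chain order:", fit_correlation_chain(X, Y)[0])
```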
17

Guan, Wei. "New support vector machine formulations and algorithms with application to biomedical data analysis." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41126.

Abstract:
The Support Vector Machine (SVM) classifier seeks to find the separating hyperplane w·x = r that maximizes the margin distance 1/||w||₂. It can be formalized as an optimization problem that minimizes the hinge loss Σᵢ(1 − yᵢ f(xᵢ))₊ plus the L₂-norm of the weight vector. SVM is now a mainstay method of machine learning. The goal of this dissertation work is to solve different biomedical data analysis problems efficiently using extensions of SVM, in which we augment the standard SVM formulation based on the application requirements. The biomedical applications we explore in this thesis include: cancer diagnosis, biomarker discovery, and energy function learning for protein structure prediction. Ovarian cancer diagnosis is problematic because the disease is typically asymptomatic, especially at early stages of progression and/or recurrence. We investigate a sample set consisting of 44 women diagnosed with serous papillary ovarian cancer and 50 healthy women or women with benign conditions. We profile the relative metabolite levels in the patient sera using a high throughput ambient ionization mass spectrometry technique, Direct Analysis in Real Time (DART). We then reduce the diagnostic classification on these metabolic profiles to a functional classification problem and solve it with the functional Support Vector Machine (fSVM) method. The assay distinguished between the cancer and control groups with an unprecedented 99% accuracy (100% sensitivity, 98% specificity) under leave-one-out cross-validation. This approach has significant clinical potential as a cancer diagnostic tool. High throughput technologies provide simultaneous evaluation of thousands of potential biomarkers to distinguish different patient groups. In order to assist biomarker discovery from these low-sample-size, high-dimensional cancer data, we first explore a convex relaxation of the L₀-SVM problem and solve it using mixed-integer programming techniques. We further propose a more efficient L₀-SVM approximation, the fractional norm SVM, by replacing the L₂-penalty with an L_q-penalty (q in (0,1)) in the optimization formulation. We solve it through the Difference of Convex functions (DC) programming technique. Empirical studies on synthetic data sets as well as real-world biomedical data sets support the effectiveness of our proposed L₀-SVM approximation methods over other commonly-used sparse SVM methods such as the L₁-SVM method. A critical open problem in ab initio protein folding is protein energy function design. We reduce the problem of learning an energy function for ab initio folding to a standard machine learning problem, learning-to-rank. Based on the application requirements, we constrain the reduced ranking problem with non-negative weights and develop two efficient algorithms for non-negativity constrained SVM optimization. We conduct an empirical study on an energy data set for random conformations of 171 proteins that fall into the ab initio folding class. We compare our approach with the optimization approach used in the protein structure prediction tool TASSER. Numerical results indicate that our approach was able to learn energy functions with improved rank statistics (evaluated by pairwise agreement) as well as improved correlation between the total energy and structural dissimilarity.
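The soft-margin objective quoted at the start of this abstract can be minimised directly by subgradient descent in the linear case. A minimal sketch with synthetic data and invented hyperparameters (none of the dissertation's specialised solvers):

```python
import numpy as np

def linear_svm_sgd(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean_i (1 - y_i (w.x_i - r))_+ by subgradient steps."""
    w, r = np.zeros(X.shape[1]), 0.0
    n = len(X)
    for _ in range(epochs):
        viol = y * (X @ w - r) < 1              # margin violators drive the update
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gr = y[viol].sum() / n
        w, r = w - lr * gw, r - lr * gr
    return w, r

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, r = linear_svm_sgd(X, y)
print("training accuracy:", (np.sign(X @ w - r) == y).mean())
```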
18

Pang, Bo. "Handwriting Chinese character recognition based on quantum particle swarm optimization support vector machine." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3950620.

19

Chen, Li. "Integrative Modeling and Analysis of High-throughput Biological Data." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30192.

Abstract:
Computational biology is an interdisciplinary field that focuses on developing mathematical models and algorithms to interpret biological data so as to understand biological problems. With current high-throughput technology development, different types of biological data can be measured at a large scale, which calls for more sophisticated computational methods to analyze and interpret the data. In this dissertation research work, we propose novel methods to integrate, model and analyze multiple types of biological data, including microarray gene expression data, protein-DNA interaction data and protein-protein interaction data. These methods will help improve our understanding of biological systems. First, we propose a knowledge-guided multi-scale independent component analysis (ICA) method for biomarker identification on time course microarray data. Guided by a knowledge gene pool related to a specific disease under study, the method can determine disease-relevant biological components from ICA modes and then identify biologically meaningful markers related to the specific disease. We have applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Second, we propose a novel method for transcriptional regulatory network identification by integrating gene expression data and protein-DNA binding data. The approach is built upon a multi-level analysis strategy designed for suppressing false positive predictions. With this strategy, a regulatory module becomes increasingly significant as more relevant gene sets are formed at finer levels. At each level, a two-stage support vector regression (SVR) method is utilized to reduce false positive predictions by integrating binding motif information and gene expression data; a significance analysis procedure is followed to assess the significance of each regulatory module. The resulting performance on simulation data and yeast cell cycle data shows that the multi-level SVR approach outperforms other existing methods in the identification of both regulators and their target genes. We have further applied the proposed method to breast cancer cell line data to identify condition-specific regulatory modules associated with estrogen treatment. Experimental results show that our method can identify biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Third, we propose a bootstrapping Markov Random Field (MRF)-based method for subnetwork identification on microarray data by incorporating protein-protein interaction data. Methodologically, an MRF-based network score is first derived by considering the dependency among genes to increase the chance of selecting hub genes. A modified simulated annealing search algorithm is then utilized to find the optimal/suboptimal subnetworks with maximal network score. A bootstrapping scheme is finally implemented to generate confident subnetworks. Experimentally, we have compared the proposed method with other existing methods, and the resulting performance on simulation data shows that the bootstrapping MRF-based method outperforms other methods in identifying ground truth subnetworks and hub genes. We have then applied our method to breast cancer data to identify significant subnetworks associated with drug resistance. The identified subnetworks not only show good reproducibility across different data sets, but also indicate several pathways and biological functions potentially associated with the development of breast cancer and drug resistance. In addition, we propose to develop network-constrained support vector machines (SVM) for cancer classification and prediction, taking into account the network structure to construct classification hyperplanes. The simulation study demonstrates the effectiveness of our proposed method. The study on the real microarray data sets shows that our network-constrained SVM, together with the bootstrapping MRF-based subnetwork identification approach, can achieve better classification performance compared with conventional biomarker selection approaches and SVMs. We believe that the research presented in this dissertation not only provides novel and effective methods to model and analyze different types of biological data; the extensive experiments on several real microarray data sets also show the potential to improve the understanding of biological mechanisms related to cancers by generating novel hypotheses for further study.
20

Luo, Tong. "Scaling up support vector machines with application to plankton recognition." [Tampa, Fla.] : University of South Florida, 2005. http://purl.fcla.edu/fcla/etd/SFE0001154.

21

Sharma, Jason P. "Classification performance of support vector machines on genomic data utilizing feature space selection techniques." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87830.

22

Huang, Norman Jason. "Graph-based Support Vector Machines for Patient Response Prediction Using Pathway and Gene Expression Data." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11072.

Abstract:
Over the past decade, multiple functional genomic datasets studying chromosomal aberrations and their downstream implications on gene expression have accumulated across a variety of cancer types. With the majority being paired copy number/gene expression profiles originating from the same patient groups, this period has also seen a wealth of integrative attempts, in the hope that concurrent analysis of both genomic structures will improve downstream results. Borrowing this concept, this dissertation presents a novel contribution to the development of statistical methodology for integrating copy number and gene expression data for purposes of predicting treatment response in multiple myeloma patients.
23

Premanode, Bhusana. "Prediction of nonlinear nonstationary time series data using a digital filter and support vector regression." Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/23954.

Abstract:
Volatility is a key parameter when measuring the size of the errors made in modelling returns and other nonlinear nonstationary time series data. The Autoregressive Integrated Moving-Average (ARIMA) model is a linear process for time series, whilst in the nonlinear setting the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) and Markov Switching GARCH (MS-GARCH) models have been widely applied. In statistical learning theory, Support Vector Regression (SVR) plays an important role in predicting nonlinear and nonstationary time series data. We propose a new class of model comprising a combination of a novel derivative of Empirical Mode Decomposition (EMD), an averaging intrinsic mode function (aIMF), and a novel multiclass SVR using mean reversion and the coefficient of variation (CV) to predict financial data, i.e. EUR-USD exchange rates. The proposed aIMF is capable of smoothing and reducing noise, whereas the novel multiclass SVR model can predict exchange rates. Our simulation results show that our model significantly outperforms simulations by state-of-the-art ARIMA, GARCH, Markov Switching GARCH (MS-GARCH), Markov Switching Regression (MSR) and Markov chain Monte Carlo (MCMC) regression models.
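A bare-bones version of the SVR forecasting step, with lagged values as features on a toy signal, is sketched below; the EMD/aIMF smoothing stage and all of the thesis's modelling details are omitted, and every parameter is an invented placeholder.

```python
import numpy as np
from sklearn.svm import SVR

def make_lagged(series, p=5):
    """Sliding-window design matrix: predict x_t from (x_{t-p}, ..., x_{t-1})."""
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    return X, series[p:]

rng = np.random.default_rng(8)
t = np.arange(500)
series = np.sin(0.1 * t) + 0.1 * rng.normal(size=500)   # toy noisy signal
X, y = make_lagged(series)
split = 400                                             # train on the past only
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:split], y[:split])
rmse = np.sqrt(((model.predict(X[split:]) - y[split:]) ** 2).mean())
print("one-step-ahead test RMSE:", round(rmse, 4))
```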
24

Dalvi, Aditi. "Performance of One-class Support Vector Machine (SVM) in Detection of Anomalies in the Bridge Data." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin150478019017791.

25

Persson, Karl. "Predicting movie ratings : A comparative study on random forests and support vector machines." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11119.

Abstract:
The aim of this work is to evaluate the prediction performance of random forests in comparison to support vector machines for predicting the numerical user ratings of a movie using pre-release attributes such as its cast, directors, budget and movie genres. In order to answer this question, an experiment was conducted on predicting the overall user rating of 3376 Hollywood movies, using data from the well-established movie database IMDb. The prediction performance of the two algorithms was assessed and compared over three commonly used performance and error metrics, and evaluated by means of significance testing in order to further investigate whether any significant differences could be identified. The results indicate some differences between the two algorithms, with consistently better performance from random forests in comparison to support vector machines over all of the performance metrics, as well as significantly better results for two out of three metrics. Although a slight difference is indicated by the results, one should also note that both algorithms show great similarities in terms of their prediction performance, making it hard to draw any general conclusions on which algorithm yields the most accurate movie predictions.
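The comparison protocol described here, cross-validated scores for both learners plus a significance test on the per-fold results, takes only a few lines with scikit-learn and SciPy. The dataset, metric and settings below are illustrative, not those of the study:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=400, n_features=20, noise=10.0, random_state=0)
rf = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                     cv=10, scoring="neg_mean_absolute_error")
svm = cross_val_score(SVR(kernel="rbf", C=100.0), X, y,
                      cv=10, scoring="neg_mean_absolute_error")
print("RF MAE %.2f, SVR MAE %.2f" % (-rf.mean(), -svm.mean()))
print("Wilcoxon p-value:", wilcoxon(rf, svm).pvalue)   # paired test over folds
```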
APA, Harvard, Vancouver, ISO, and other styles
26

Höglind, Sanna, and Emelie Sundström. "Klassificering av transkriberade telefonsamtal med Support Vector Machines för ökad effektivitet inom vården." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-262043.

Full text
Abstract:
Every year, Patientnämndens förvaltning in Stockholm receives thousands of phone calls from patients wishing to make complaints about the health care in Region Stockholm. The aim of this work is to investigate how an NLP robot for classification of received phone calls could contribute to increased operational efficiency. The classification of the complaints was performed using a method based on Support Vector Machines. To optimize the accuracy of the model, the impact of the word-vector length on accuracy was investigated. The final result was an accuracy of 53.10%. This result was analyzed with the goal of identifying potential improvements to the model; for future work it would therefore be interesting to investigate how the number of calls, the number of people recording the calls, and the class distribution in the dataset affect the accuracy. A SWOT analysis was performed to investigate how the efficiency of Patientnämndens förvaltning would be affected by the implementation of an NLP robot. The analysis showed clear benefits of automating complaint management, but also that such an implementation must be done with caution, ensuring that the available competence is sufficient to prevent potential threats.
APA, Harvard, Vancouver, ISO, and other styles
27

Lee, Ho-Jin. "Functional data analysis: classification and regression." Texas A&M University, 2004. http://hdl.handle.net/1969.1/2805.

Full text
Abstract:
Functional data refer to data which consist of observed functions or curves evaluated at a finite subset of some interval. In this dissertation, we discuss statistical analysis, especially classification and regression, when data are available in functional form. Due to the nature of functional data, one considers function spaces in representing such data, and each functional observation is viewed as a realization generated by a random mechanism in those spaces. The classification procedure in this dissertation is based on dimension reduction techniques for these spaces. One commonly used method is Functional Principal Component Analysis (Functional PCA), in which an eigendecomposition of the covariance function is employed to find the directions of highest variability of the data in the function space. The reduced space spanned by a few eigenfunctions is thought of as a space containing most of the features of the functional data. We also propose a functional regression model for scalar responses. The infinite dimensionality of the predictor space causes many problems, one being that there are infinitely many solutions. The space of the parameter function is therefore restricted to Sobolev-Hilbert spaces, and the so-called ε-insensitive loss function is utilized. As a robust function estimation technique, we present a way to find a function that deviates by at most ε from the observed values while being as smooth as possible.
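The two ingredients described above, dimension reduction of curves via principal component scores and regression under the ε-insensitive loss, can be sketched as follows. The curves, the response, and the number of components are illustrative assumptions, and ordinary PCA on densely sampled curves stands in for a full functional PCA.

```python
# PCA scores of sampled curves feed an epsilon-insensitive SVR.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVR

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)
curves = np.array([np.sin(2 * np.pi * t * rng.uniform(1, 3))
                   + rng.normal(0, 0.1, t.size) for _ in range(200)])
y = curves.mean(axis=1) + rng.normal(0, 0.05, 200)   # scalar response (stand-in)

scores = PCA(n_components=4).fit_transform(curves)   # leading eigenfunction scores
model = SVR(kernel="rbf", epsilon=0.05).fit(scores, y)  # epsilon-insensitive loss
print("in-sample R^2:", model.score(scores, y))
```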
APA, Harvard, Vancouver, ISO, and other styles
28

Tang, Yuchun. "Granular Support Vector Machines Based on Granular Computing, Soft Computing and Statistical Learning." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/5.

Full text
Abstract:
With the emergence of biomedical informatics, Web intelligence, and E-business, new challenges are arising for knowledge discovery and data mining modeling problems. In this dissertation, a framework named Granular Support Vector Machines (GSVM) is proposed to systematically and formally combine statistical learning theory, granular computing theory and soft computing theory to address challenging predictive data modeling problems effectively and/or efficiently, with a specific focus on binary classification problems. In general, GSVM works in three steps. Step 1 is granulation, which builds a sequence of information granules from the original dataset or from the original feature space. Step 2 is modeling Support Vector Machines (SVM) in some of these information granules where necessary. Finally, step 3 is aggregation, which consolidates the information in these granules at a suitable level of abstraction. A good granulation method for finding suitable granules is crucial for modeling a good GSVM. Under this framework, many different granulation algorithms, including the GSVM-CMW (cumulative margin width) algorithm, the GSVM-AR (association rule mining) algorithm, a family of GSVM-RFE (recursive feature elimination) algorithms, the GSVM-DC (data cleaning) algorithm and the GSVM-RU (repetitive undersampling) algorithm, are designed for binary classification problems with different characteristics. Empirical studies in the biomedical domain and many other application domains demonstrate that the framework is promising. As a preliminary step, this dissertation work will be extended in the future to build a Granular Computing based Predictive Data Modeling framework (GrC-PDM) with which hybrid adaptive intelligent data mining systems for high quality prediction can be created.
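A simplified reading of the three GSVM steps (granulate, model SVMs per granule, aggregate) might look as follows; k-means granulation and nearest-granule aggregation are our assumptions for illustration, not any of the specific GSVM-* algorithms named above.

```python
# Granulate with k-means, fit one SVM per granule, aggregate by nearest granule.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=10, random_state=3)
km = KMeans(n_clusters=3, n_init=10, random_state=3).fit(X)   # step 1: granulation

models = {}
for g in range(3):                                            # step 2: SVM per granule
    idx = km.labels_ == g
    if len(np.unique(y[idx])) > 1:                            # skip one-class granules
        models[g] = SVC(kernel="rbf").fit(X[idx], y[idx])

def predict(x):                                               # step 3: aggregation
    g = km.predict(x.reshape(1, -1))[0]
    return models[g].predict(x.reshape(1, -1))[0] if g in models else 0

print("first prediction:", predict(X[0]))
```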
APA, Harvard, Vancouver, ISO, and other styles
29

Zhang, Hang. "Distributed Support Vector Machine With Graphics Processing Units." ScholarWorks@UNO, 2009. http://scholarworks.uno.edu/td/991.

Full text
Abstract:
Training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) optimization problem. Sequential Minimal Optimization (SMO) is a decomposition-based algorithm which breaks this large QP problem into a series of smallest-possible QP problems; however, it still requires O(n²) computation time. In our SVM implementation, we can train on huge data sets in a distributed manner, by breaking the dataset into chunks, then using the Message Passing Interface (MPI) to distribute each chunk to a different machine and running SVM training within each chunk. In addition, we moved the kernel calculation part of SVM classification to a graphics processing unit (GPU), which has zero scheduling overhead for creating concurrent threads. In this thesis, we take advantage of this GPU architecture to improve the classification performance of SVM.
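The chunking idea alone can be sketched without MPI or a GPU: split the training set, fit an SVM per chunk, and combine the chunk models. The voting aggregation below is our assumption for illustration, not the thesis's aggregation scheme.

```python
# Chunk-wise SVM training: each chunk simulates one "machine" in the cluster.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, n_features=20, random_state=4)
chunks = np.array_split(np.arange(len(X)), 4)                 # one chunk per "machine"
models = [SVC(kernel="rbf").fit(X[c], y[c]) for c in chunks]

votes = np.stack([m.predict(X[:100]) for m in models])        # combine chunk models
pred = (votes.mean(axis=0) > 0.5).astype(int)                 # majority vote
print("agreement with labels on first 100:", (pred == y[:100]).mean())
```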
APA, Harvard, Vancouver, ISO, and other styles
30

Delezoide, Bertrand. "Modèles d'indéxation multimédia pour la description automatique de films de cinéma." Paris 6, 2006. http://www.theses.fr/2006PA066108.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Liu, Jie. "Failure prognostics by support vector regression of time series data under stationary/nonstationary environmental and operational conditions." Thesis, Châtenay-Malabry, Ecole centrale de Paris, 2015. http://www.theses.fr/2015ECAP0019/document.

Full text
Abstract:
This Ph.D. work is motivated by the possibility of monitoring the condition of components of energy systems for their extended and safe use, under proper operating practice and adequate maintenance policies. The aim is to develop a Support Vector Regression (SVR)-based framework for predicting time series data under stationary/nonstationary environmental and operational conditions. Single SVR and SVR-based ensemble approaches are developed to tackle the prediction problem based on both small and large datasets. Strategies are proposed for adaptively updating the single SVR and SVR-based ensemble models in the presence of pattern drifts. Comparisons with other online learning approaches for kernel-based modelling are provided with reference to time series data from a critical component in Nuclear Power Plants (NPPs) provided by Electricité de France (EDF). The results show that the proposed approaches achieve comparable prediction results, considering the Mean Squared Error (MSE) and Mean Relative Error (MRE), in much less computation time. Furthermore, by analyzing the geometrical meaning of the Feature Vector Selection (FVS) method proposed in the literature, a novel geometrically interpretable kernel method, named Reduced Rank Kernel Ridge Regression-II (RRKRR-II), is proposed to describe the linear relations between a predicted value and the predicted values of the Feature Vectors (FVs) selected by FVS. Comparisons with several kernel methods on a number of public datasets demonstrate the good prediction accuracy and the ease of tuning of the hyperparameters of RRKRR-II.
APA, Harvard, Vancouver, ISO, and other styles
32

Henchiri, Yousri. "L'approche Support Vector Machines (SVM) pour le traitement des données fonctionnelles." Thesis, Montpellier 2, 2013. http://www.theses.fr/2013MON20187/document.

Full text
Abstract:
Functional Data Analysis is an important and dynamic area of statistics. It offers effective new tools and proposes new methodological and theoretical developments in the presence of functional-type data (functions, curves, surfaces, ...). The work outlined in this dissertation provides a new contribution to the themes of statistical learning and quantile regression when data can be considered as functions. Special attention is devoted to the Support Vector Machines (SVM) technique, which involves the notion of a Reproducing Kernel Hilbert Space. In this context, the main goal is to extend this nonparametric estimation technique to conditional models that take into account functional data. We investigated the theoretical aspects and practical behaviour of the proposed and adapted technique for the following regression models. The first model is the conditional quantile functional model when the covariate takes its values in a bounded subspace of an infinite-dimensional functional space, the response variable takes its values in a compact subset of the real line, and the observations are i.i.d. The second model is the functional additive quantile regression model, where the response variable depends on a vector of functional covariates. The last model is the conditional quantile functional model in the dependent functional data case. We obtained the weak consistency and a convergence rate of these estimators. Simulation studies are performed to evaluate the performance of the inference procedures. Applications to chemometrics, environmental and climatic data analysis are considered. The good behaviour of the SVM estimator is thus highlighted.
APA, Harvard, Vancouver, ISO, and other styles
33

Jiang, Fuhua. "SVM-Based Negative Data Mining to Binary Classification." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/8.

Full text
Abstract:
The properties of a training data set, such as its size, distribution and number of attributes, significantly contribute to the generalization error of a learning machine. A poorly distributed data set is prone to lead to a partially overfitted model. Two approaches proposed in this dissertation for binary classification enhance useful data information by mining negative data. First, an error-driven compensating hypothesis approach is based on Support Vector Machines (SVMs) with (1+k)-iteration learning, where the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on a new data set in which each label is a transformation of the label from the negative data set, further producing positive and negative child data subsets in subsequent iterations. This procedure refines the base hypothesis using the k child hypotheses created in the k iterations. A prediction method is also proposed to trace the relationship between negative subsets and the testing data set by a vector similarity technique. Second, a statistical negative example learning approach based on theoretical analysis improves the performance of the base learning algorithm (learner) by creating one or two additional hypotheses (audit and booster) to mine the negative examples output by the learner. The learner employs a regular Support Vector Machine to classify the main examples and recognize which examples are negative. The audit works on the negative training data created by the learner to predict whether an instance is negative, while the boosting learner booster is applied when the audit is not accurate enough to judge the learner correctly; it works on the training data subsets on which the learner and audit disagree. The classifier used for testing is the combination of learner, audit and booster: for a specific instance it returns the learner's result if the audit acknowledges that result or the learner agrees with the audit's judgment, and otherwise returns the booster's result. The error of the classifier is decreased to O(e²), compared with the error O(e) of the base learning algorithm.
APA, Harvard, Vancouver, ISO, and other styles
34

Gombos, Andrew David. "DETECTION OF ROOF BOUNDARIES USING LIDAR DATA AND AERIAL PHOTOGRAPHY." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/75.

Full text
Abstract:
The recent growth in inexpensive laser scanning sensors has created entire fields of research aimed at processing their data. One application is determining the polygonal boundaries of roofs, as seen from an overhead view. The resulting building outlines have many commercial as well as military applications. My work in this area has produced a segmentation algorithm whose descriptive features are computationally and theoretically simpler than those of previous methods. A support vector machine is used to segment data points using these features, and to date their use for roof detection is uncommon. Despite the simplicity of the feature calculations, the accuracy of our algorithm is similar to previous work. I also describe a basic polygonal extraction method, which is acceptable for basic roofs.
APA, Harvard, Vancouver, ISO, and other styles
35

Hechter, Trudie. "A comparison of support vector machines and traditional techniques for statistical regression and classification." Thesis, Stellenbosch : Stellenbosch University, 2004. http://hdl.handle.net/10019.1/49810.

Full text
Abstract:
Thesis (MComm)--Stellenbosch University, 2004. Since its introduction in Boser et al. (1992), the support vector machine has become a popular tool in a variety of machine learning applications. More recently, the support vector machine has also been receiving increasing attention in the statistical community as a tool for classification and regression. In this thesis, support vector machines are compared to more traditional techniques for statistical classification and regression. The techniques are applied to data from a life assurance environment for a binary classification problem and a regression problem. In the classification case the problem is the prediction of policy lapses using a variety of input variables, while in the regression case the goal is to estimate the income of clients from these variables. The performance of the support vector machine is compared to that of discriminant analysis and classification trees in the case of classification, and to that of multiple linear regression and regression trees in regression, and it is found that support vector machines generally perform well compared to the traditional techniques.
APA, Harvard, Vancouver, ISO, and other styles
36

Cai, Zipan. "Multitemporal Satellite Data for Monitoring Urbanization in Nanjing from 2001 to 2016." Thesis, KTH, Geoinformatik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214036.

Full text
Abstract:
As the rate of urbanization increases worldwide, population keeps shifting from rural to urban areas. China, the country with the largest population, has the highest urban population growth in Asia as well as in the world. Urbanization in China, in turn, is leading to many social issues which reshape the living environment and cultural fabric; these issues emphasize the challenges of healthy and sustainable urban growth, particularly in the reasonable planning of urban land use and land cover. It is therefore important to establish a set of comprehensive urban sustainable development strategies to avoid detours in the urbanization process. Faced with such social phenomena, spatial and temporal technologies, including Remote Sensing and Geographic Information Systems (GIS), can help city decision makers make the right choices. Knowledge of land use and land cover changes in rural and urban areas assists in identifying urban growth rates and trends both qualitatively and quantitatively, providing a basis for planning and designing a city in a more scientific and environmentally friendly way. This thesis focuses on urban sprawl analysis in Nanjing, Jiangsu, China, through monitoring of urban growth patterns over a study period. From 2001 to 2016, Nanjing Municipality experienced a substantial increase in urban area because of the growing population. In this thesis, a high-accuracy supervised classifier, the Support Vector Machine (SVM), was used to extract thematic features from multitemporal satellite data including Landsat 7 ETM+, Landsat 8, and Sentinel-2A MSI. The results were interpreted to identify urban sprawl patterns based on the land use and land cover features in 2001, 2006, 2011, and 2016. Two different change detection analyses, post-classification comparison and change vector analysis (CVA), were performed to explore the detailed extent of urban growth within the study region, and the two methods were compared through accuracy assessment. Based on the change detection analysis combined with the current state of urban development, some constructive recommendations and future research directions are given. By implementing the proposed methods, the urban land use and land cover changes were successfully captured. The results show a notable change in the urban or built-up land class: the urban area increased by 610.98 km2 while the agricultural land area decreased by 766.96 km2, evidencing land conversion among these land cover classes during the study period. The urban area keeps growing in each study period, while the growth rate shows a decreasing trend over 2001 to 2016. Both change detection techniques yielded a similar distribution of urban expansion in the study area: the expanded urban or built-up land in Nanjing is distributed mainly in the area surrounding the central city, on both sides of the Yangtze River, and in the southwest. The change detection accuracy assessment indicated that post-classification comparison has a higher overall accuracy (86.11%) and a higher Kappa coefficient (0.72) than CVA, whose overall accuracy and Kappa coefficient are 75.43% and 0.51 respectively. These results show that the strength of agreement between predicted and ground-truth data is 'good' for post-classification comparison and 'moderate' for CVA, and they further confirm the expectation from previous studies that the empirical threshold determination of CVA tends to lead to relatively poor change detection accuracy. In general, the two change detection techniques are found to be effective and efficient in monitoring surface changes across the different land cover classes within the study period, though each has its own advantages and disadvantages in change detection analysis, particularly for the topic of urban expansion.
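A minimal sketch of pixel-wise SVM classification followed by post-classification comparison, on synthetic two-date "imagery"; the band values, the urban/non-urban labeling rule, and the simple difference map are all illustrative assumptions.

```python
# Classify each date's pixels with an SVM, then compare the two maps.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
n = 2000                                           # pixels, each with 4 spectral bands
X_train = rng.normal(size=(300, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # 1 = urban (stand-in rule)

clf = SVC(kernel="rbf").fit(X_train, y_train)
date1 = clf.predict(rng.normal(size=(n, 4)))       # classified map, date 1
date2 = clf.predict(rng.normal(0.3, 1.0, (n, 4)))  # classified map, date 2

changed = date1 != date2                           # post-classification comparison
print("pixels converted to/from urban:", changed.sum(), "of", n)
```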
APA, Harvard, Vancouver, ISO, and other styles
37

Lemon, Viktor. "Can a Support Vector Machine identify poor performance of dyslectic children playing a serious game?" Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294966.

Full text
Abstract:
This paper has been part of the development of the serious game Kunna, a web-based game with exercises targeting children diagnosed with dyslexia. The game currently consists of five different exercises aimed at practicing reading and writing without a therapist or neuropsychologist present. As Kunna can be used anywhere, tools are needed to understand each individual's capacities and difficulties. Hence, this paper presents how a serious game and a support vector machine were used to identify children who performed poorly in Kunna's exercises. However, due to the corona pandemic, Kunna could only be tested on children not diagnosed with dyslexia; this paper should therefore be seen as a proof of concept. As an initial step, several variables were identified to measure the performance of dyslectic children. Secondly, the variables were implemented in Kunna and tested on 16 Spanish-speaking children, and the results were analyzed to identify how poor performance could be recognized using the identified variables. As a final step, the data was divided into two groups for each exercise, one of which contained participants who appeared to perform poorly, i.e. participants with clearly outlying values in the number of errors and in duration. A Support Vector Machine (SVM) was then trained and evaluated on whether it could separate the two groups and thereby identify the participants who performed poorly. The discussion concluded that the SVM is not the most efficient choice for this aim; instead, it is suggested that future work consider multi-class classification algorithms.
APA, Harvard, Vancouver, ISO, and other styles
38

Dinerstein, Jared. "Learning-Based Fusion for Data Deduplication: A Robust and Automated Solution." DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/787.

Full text
Abstract:
This thesis presents two deduplication techniques that overcome the following critical and long-standing weaknesses of rule-based deduplication: (1) traditional rule-based deduplication requires significant manual tuning of the individual rules, including the selection of appropriate thresholds; (2) the accuracy of rule-based deduplication degrades when there are missing data values, significantly reducing the efficacy of the expert-defined deduplication rules. The first technique is a novel rule-level match-score fusion algorithm that employs kernel-machine-based learning to automatically discover the decision threshold for the overall system. The second is a novel clue-level match-score fusion algorithm that addresses both Problems 1 and 2. This unique solution provides robustness against missing or incomplete record data via the selection of a best-fit support vector machine. Empirical evidence shows that the combination of these two novel solutions eliminates two critical long-standing problems in deduplication, providing accurate and robust results in a critical area of rule-based deduplication.
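A hedged sketch of the match-score fusion idea: each candidate record pair becomes a vector of per-rule similarity scores, and a kernel machine learns the match/non-match boundary (and hence the decision threshold) from labeled pairs. The synthetic score distributions below are assumptions.

```python
# Fuse per-rule similarity scores with an SVM instead of hand-tuned thresholds.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
matches = rng.beta(8, 2, size=(200, 5))      # high per-rule similarity scores
nonmatches = rng.beta(2, 8, size=(200, 5))   # low per-rule similarity scores
X = np.vstack([matches, nonmatches])
y = np.array([1] * 200 + [0] * 200)

fusion = SVC(kernel="rbf", probability=True).fit(X, y)
pair_scores = np.array([[0.9, 0.7, 0.8, 0.95, 0.6]])   # one candidate pair
print("duplicate probability:", fusion.predict_proba(pair_scores)[0, 1])
```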
APA, Harvard, Vancouver, ISO, and other styles
39

Sakouvogui, Kekoura. "Comparative Classification of Prostate Cancer Data using the Support Vector Machine, Random Forest, Dualks and k-Nearest Neighbours." Thesis, North Dakota State University, 2015. https://hdl.handle.net/10365/27698.

Full text
Abstract:
This paper compares four classification tools, the Support Vector Machine (SVM), Random Forest (RF), DualKS and the k-Nearest Neighbours (kNN), which are based on different statistical learning theories. The dataset used is a microarray gene expression profile of 596 male patients with prostate cancer. After treatment, the patients were classified into a phenotype with three levels: PSA (Prostate-Specific Antigen), Systematic and NED (No Evidence of Disease). The purpose of this research is to determine the performance rate of each classifier by selecting the optimal kernels and parameters that give the best prediction rate of the phenotype. The paper begins with a discussion of previous implementations of the tools and their mathematical theories. The results showed that three of the classifiers achieved comparable, above-average performance while DualKS did not. We also observed that SVM outperformed the kNN, RF and DualKS classifiers.
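The comparison protocol can be sketched with scikit-learn as below; DualKS has no standard implementation there, so only SVM, RF and kNN are shown, and the expression matrix and phenotype labels are synthetic stand-ins.

```python
# Cross-validated comparison of three classifiers on expression-like data.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(120, 500))     # microarray-like: few samples, many genes
y = rng.integers(0, 3, 120)         # PSA / Systematic / NED (stand-in labels)

for name, clf in [("SVM", SVC(kernel="linear")),
                  ("random forest", RandomForestClassifier(random_state=0)),
                  ("kNN", KNeighborsClassifier(n_neighbors=5))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: CV accuracy = {acc:.3f}")
```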
APA, Harvard, Vancouver, ISO, and other styles
40

Buizza, Giulia. "Classifying patients' response to tumour treatment from PET/CT data: a machine learning approach." Thesis, KTH, Skolan för teknik och hälsa (STH), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-200916.

Full text
Abstract:
Early assessment of tumour response has lately acquired great interest in the medical field, given the possibility to modify treatments during their delivery. Radiomics aims to quantitatively describe images in radiology by automatically extracting a large number of image features. In this context, PET/CT (Positron Emission Tomography/Computed Tomography) images are of great interest since they encode functional and anatomical information, respectively. In order to assess patients' responses from many image features, appropriate methods should be applied. Machine learning offers different procedures that can deal with this, possibly high-dimensional, problem. The main objective of this work was to develop a method to classify lung cancer patients as responding or not to chemoradiation treatment, relying on repeated PET/CT images. Patients were divided into two groups based on the type of chemoradiation treatment they underwent (radiation therapy sequential or concurrent with respect to chemotherapy), but image features were extracted using the same procedure. Support vector machines performed classification using features from the Radiomics field, mostly describing tumour texture, or handcrafted features describing image intensity changes as a function of tumour depth. Classification performance was described by the area under the curve (AUC) of ROC (Receiver Operating Characteristic) curves after leave-one-out cross-validation. For sequential patients 0.98 was the best AUC obtained, while for concurrent patients 0.93 was the best. In terms of classification results, handcrafted features were comparable to those from Radiomics and from previous studies. Features from PET alone and CT alone were also found to be suitable for the task, yielding performance better than random.
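The evaluation loop described above, leave-one-out cross-validation of an SVM summarized by ROC AUC, might be sketched as follows, with synthetic stand-ins for the radiomic features and response labels.

```python
# Leave-one-out SVM evaluation summarized by AUC.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
X = rng.normal(size=(40, 12))                            # texture features per patient
y = (X[:, 0] + rng.normal(0, 0.5, 40) > 0).astype(int)   # responder / non-responder

scores = np.empty(len(y))
for train, test in LeaveOneOut().split(X):
    clf = SVC(kernel="rbf").fit(X[train], y[train])
    scores[test] = clf.decision_function(X[test])        # held-out decision score
print("LOO AUC:", roc_auc_score(y, scores))
```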
APA, Harvard, Vancouver, ISO, and other styles
41

Zhou, Bin. "Computational Analysis of LC-MS/MS Data for Metabolite Identification." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/36109.

Full text
Abstract:
Metabolomics aims at the detection and quantitation of metabolites within a biological system. As the most direct representation of phenotypic changes, metabolomics is an important component in systems biology research. Recent developments in high-resolution, high-accuracy mass spectrometers enable the simultaneous study of hundreds or even thousands of metabolites in one experiment. Liquid chromatography-mass spectrometry (LC-MS) is a commonly used instrument for metabolomic studies due to its high sensitivity and broad coverage of the metabolome. However, the identification of metabolites remains a bottleneck for current metabolomic studies. This thesis focuses on utilizing computational approaches to improve the accuracy and efficiency of metabolite identification in LC-MS/MS-based metabolomic studies. First, an outlier screening approach is developed to identify LC-MS runs with low analytical quality, so that they will not adversely affect the identification of metabolites. The approach is computationally simple but effective, and does not depend on any preprocessing approach. Second, an integrated computational framework is proposed and implemented to improve the accuracy of metabolite identification and to prioritize the multiple putative identifications of one peak in LC-MS data. Through the framework, peaks are likely to have m/z values that give appropriate putative identifications, and important guidance for metabolite verification is provided by prioritizing the putative identifications. Third, an MS/MS spectral matching algorithm is proposed based on support vector machine classification. The approach provides improved retrieval performance in spectral matching, especially in the presence of data heterogeneity due to different instruments or experimental settings used during MS/MS spectra acquisition. Master of Science.
APA, Harvard, Vancouver, ISO, and other styles
42

Tiensuu, Jacob, Maja Linderholm, Sofia Dreborg, and Fredrik Örn. "Detecting exoplanets with machine learning : A comparative study between convolutional neural networks and support vector machines." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385690.

Full text
Abstract:
In this project two machine learning methods, the Support Vector Machine (SVM) and the Convolutional Neural Network (CNN), are studied to determine which performs best on a labeled data set containing time series of light intensity from extrasolar stars. The main difficulty is that the data set contains far more stars without exoplanets than stars with orbiting exoplanets. This causes a so-called imbalanced data set, which in this case is mitigated by, for example, mirroring the light curves of stars with an orbiting exoplanet and adding them to the set. To improve the results further, some preprocessing is done before applying the methods to the data set. For the SVM, feature extraction and the Fourier transform of the time series are important measures, but further preprocessing alternatives are investigated. For the CNN method the time series are both detrended and smoothed, giving two inputs for the same light curve. All code is implemented in Python. Of all the validation parameters, recall is considered the main priority since it is more important to find all exoplanets than to find all non-exoplanets. CNN turned out to be the best-performing method for the chosen configurations, with a recall of 1.000, exceeding the SVM's recall of 0.800. Considering the second validation parameter, precision, CNN is also the best-performing method, with a precision of 0.769 against the SVM's 0.571.
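The mirroring fix for the class imbalance can be sketched as below: the rare exoplanet class is augmented with time-reversed copies of its light curves before training an SVM. The synthetic curves and the balanced class weight are illustrative assumptions.

```python
# Augment the rare class by mirroring its curves, then train and score an SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(9)
neg = rng.normal(size=(500, 200))                           # non-exoplanet curves
pos = rng.normal(size=(25, 200)) - np.linspace(0, 1, 200)   # transit-like dips

pos_aug = np.vstack([pos, pos[:, ::-1]])                    # time-mirrored copies
X = np.vstack([neg, pos_aug])
y = np.array([0] * len(neg) + [1] * len(pos_aug))

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(kernel="rbf", class_weight="balanced").fit(Xtr, ytr)
print("exoplanet recall:", recall_score(yte, clf.predict(Xte)))
```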
APA, Harvard, Vancouver, ISO, and other styles
43

Darnald, Johan. "Predicting Attrition in Financial Data with Machine Learning Algorithms." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-225852.

Full text
Abstract:
For most businesses there are costs involved in acquiring new customers, and longer relationships with customers are therefore often more profitable. Predicting whether an individual is prone to leave the business is then a useful tool to help any company take actions to mitigate this cost. The event in which a person ends their relationship with a business is called attrition or churn. Predicting people's actions is however hard, and many different factors can affect their choices. This paper investigates different machine learning methods for predicting attrition in the customer base of a bank. Four methods are chosen based on the results they have shown in previous research, and these are then tested and compared to find which works best for predicting these events. Four datasets from two different products and with two different applications are created from real-world data from a European bank. All methods are trained and tested on each dataset, and the test results are then evaluated and compared. The methods found in previous research to most reliably achieve good results in predicting churn in banking customers are the Support Vector Machine, Neural Network, Balanced Random Forest, and Weighted Random Forest. The results show that the Balanced Random Forest achieves the best results, with an average AUC of 0.698 and an average F-score of 0.376. The accuracy and precision of the model are concluded not to be enough to make definite decisions on their own, but they can be used together with other factors, such as profitability estimations, to improve the effectiveness of any actions taken to prevent the negative effects of churn.
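A sketch of the best-performing setup reported above, using scikit-learn's class weighting as a stand-in for a true Balanced Random Forest (e.g. imblearn's BalancedRandomForestClassifier); the features and the roughly 5% churn rate are synthetic assumptions.

```python
# Class-balanced random forest for imbalanced churn data, scored by AUC and F.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score

rng = np.random.default_rng(10)
X = rng.normal(size=(5000, 12))                   # account/behavior features
y = (rng.random(5000) < 0.05).astype(int)         # ~5% churners (imbalanced)

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(class_weight="balanced_subsample",
                            random_state=0).fit(Xtr, ytr)
proba = rf.predict_proba(Xte)[:, 1]
print("AUC:", roc_auc_score(yte, proba),
      " F-score:", f1_score(yte, (proba > 0.5).astype(int)))
```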
APA, Harvard, Vancouver, ISO, and other styles
44

Lim, Hojung, and Amrit L. Goel. "Support vector parameter selection using experimental design based generating set search (SVEG) with application to predictive software data modeling." Related electronic resource: Current Research at SU : database of SU dissertations, recent titles available full text, 2004. http://wwwlib.umi.com/cr/syr/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Song, Xiaohui. "FPGA Implementation of a Support Vector Machine based Classification System and its Potential Application in Smart Grid." University of Toledo / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1376579033.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Yu, Hsin-Min, and 余欣珉. "Applying Support Vector Data Description For Data Classification." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/61401033111710951818.

Full text
Abstract:
Master's thesis, Chaoyang University of Technology, Department of Industrial Engineering and Management, ROC year 101. Support Vector Data Description (SVDD) was developed by Tax and Duin in 1999. The objective of SVDD is to obtain a minimum-volume decision boundary around a dataset. SVDD was first developed for detecting outliers; in this study, it is adopted as a classification tool. SVDD makes no distributional assumptions about the data, and the decision boundary is formed by the Support Vectors (SVs) obtained by solving a convex quadratic programming problem. This study aims at evaluating the impact of preprocessing methods on SVDD classification efficiency. The evaluated preprocessing methods are the widely used dimension reduction techniques Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Three real cases are implemented: the gender prediction and mobile phone process cases are continuous-valued datasets, while the third case, nosocomial infection detection using data from Taichung Veterans General Hospital, is a discrete-valued dataset. Kappa analysis demonstrated that SVDD without preprocessing achieves higher classification consistency and lower misclassification rates.
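The study's setup can be sketched with scikit-learn's OneClassSVM, which with an RBF kernel is equivalent to SVDD, trained on the target class with and without PCA preprocessing; the data and the nu/component settings are illustrative assumptions.

```python
# SVDD-style one-class boundary, with and without PCA preprocessing.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(11)
target = rng.normal(0, 1, size=(300, 6))      # "normal" class only
outliers = rng.normal(4, 1, size=(30, 6))

for name, model in [("raw", OneClassSVM(kernel="rbf", nu=0.05)),
                    ("PCA + SVDD", make_pipeline(PCA(n_components=3),
                                                 OneClassSVM(kernel="rbf", nu=0.05)))]:
    model.fit(target)
    rate = (model.predict(outliers) == -1).mean()  # fraction flagged as outliers
    print(f"{name}: outlier detection rate = {rate:.2f}")
```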
APA, Harvard, Vancouver, ISO, and other styles
47

Yeh, Huang-Chih, and 葉皇志. "Appling Support Vector Data Description to Construct Colon Abnormalities Predictive Models." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/55038721642071368219.

Full text
Abstract:
Master's thesis, Chaoyang University of Technology, Department of Industrial Engineering and Management, ROC year 102. In recent years, people's diet habits have been changing rapidly, which is one of the main causes of colon abnormalities, and such abnormalities have clearly increased. Detecting colon abnormalities at an early stage is therefore a very important issue. In this study, health examination data and colonoscopy examination data are used to develop a predictive model of colon abnormalities, drawing on the historic data of related literature such as Chen (2012), Yu (2008), and Lin (2011). The study applies support vector data description (SVDD) and factorial experimental design to develop the predictive model. The experimental factors include the different sets of critical health examination items, the number of folds in cross-validation, and the definition of colon abnormality. The performance of the predictive model is also studied. The results of this research include: (1) all the critical items obtained from the literature are indeed critical; (2) the number of cross-validation folds is not significant for the performance of the predictive model; (3) the best definition assigns the target (normal) dataset from the exactly normal class and the outlier (abnormal) dataset from the tubular villous adenoma and serrated adenoma classes.
APA, Harvard, Vancouver, ISO, and other styles
48

Chien, Ming-Chia, and 簡茗家. "Applying Support Vector Data Description Method to Construct Multivariate Control Charts." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/61145166810700096480.

Full text
Abstract:
Master's thesis, Yuan Ze University, Department of Industrial Engineering and Management, ROC year 105. In recent years, applications of one-class classification methods have received a great deal of attention. One application is the kernel distance-based control chart (K chart), built on the theory of support vector data description (SVDD). The K chart is non-parametric in that it does not require any distributional assumptions, and it has been shown to perform better than conventional charts when the distribution of the quality characteristic is not multivariate normal. The main purpose of this research is to study the application of the SVDD method to statistical process control. We address some important design issues of control charts, including the determination of control limits and the optimization of model parameters. Simulation studies were conducted to illustrate the effectiveness of the proposed approaches. The performance of the control charts was evaluated using the average run length (ARL). Various multivariate probability distributions were considered, including the multivariate normal, multivariate t, and multivariate gamma distributions. The results from the comparative study indicate that the K chart performs better than the T2 chart, especially in non-normal distribution cases.
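A rough sketch of a K chart: fit an SVDD-style boundary on in-control data, monitor the kernel decision function, and record the run length until an out-of-control signal. Using an empirical percentile as the control limit is our assumption for illustration.

```python
# Kernel-distance control chart sketch with an empirical control limit.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(12)
in_control = rng.multivariate_normal([0, 0], np.eye(2), 500)
svdd = OneClassSVM(kernel="rbf", nu=0.01).fit(in_control)

lcl = np.percentile(svdd.decision_function(in_control), 1)   # lower control limit

shifted = rng.multivariate_normal([1.5, 0], np.eye(2), 1000) # shifted process
stats = svdd.decision_function(shifted)
run_length = np.argmax(stats < lcl) + 1   # first signal (assumes one occurs)
print("run length under a mean shift:", run_length)
```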
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, Yen-Chun, and 陳衍均. "Light-Emitting Diode Defect Detection based on Support Vector Data Description." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/34373705695385110602.

Full text
Abstract:
Master's thesis, Chung Yuan Christian University, Graduate Institute of Mechanical Engineering, ROC year 100. The Light Emitting Diode (LED) production line is progressing rapidly as environmental awareness gains ground, and enhancing the production yield of LED products is an important goal for raising profit. An automatic defect inspection system for LEDs can reduce human error and inspection time, and can also locate machine problems so that defects are avoided. This research explores the inspection of the light-emitting area and of the P- and N-electrodes. The system inspects defects through three mechanisms: vision pre-processing, feature extraction, and a training procedure. Vision pre-processing adjusts the original defect images and then decomposes each image into several sub-images. Feature extraction constructs Discrete Cosine Transform, texture, image power, and statistical features, and a Support Vector Data Description model is trained on the resulting feature vectors. Support Vector Data Description and binary image classification are then used to classify the LED images.
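The feature-then-one-class pipeline might be sketched as follows: 2-D DCT coefficients and simple statistics summarize each patch, and an SVDD-style one-class model is trained on defect-free patches only; the patch data and feature choices are assumptions.

```python
# DCT + statistics features feeding a one-class model trained on good patches.
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(13)
good = rng.normal(0.5, 0.05, size=(200, 16, 16))   # defect-free patches
bad = rng.normal(0.5, 0.25, size=(20, 16, 16))     # noisy "defective" patches

def features(patches):
    # 2-D DCT; keep the low-frequency 4x4 block plus mean/std statistics
    coeffs = dct(dct(patches, axis=1, norm="ortho"), axis=2, norm="ortho")
    low = coeffs[:, :4, :4].reshape(len(patches), -1)
    stats = np.c_[patches.mean(axis=(1, 2)), patches.std(axis=(1, 2))]
    return np.hstack([low, stats])

model = OneClassSVM(kernel="rbf", nu=0.05).fit(features(good))
print("defects flagged:", (model.predict(features(bad)) == -1).sum(), "of", len(bad))
```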
APA, Harvard, Vancouver, ISO, and other styles
50

Wang, Chii-Kae, and 王啟凱. "A Study of Improving Feature Classification Results for Support Vector Data Description." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/52649373293233544860.

Full text
Abstract:
Doctoral dissertation, Chung Yuan Christian University, Graduate Institute of Mechanical Engineering, ROC year 97. Most pattern recognition tasks deal with classification and regression problems, but the data domain description problem is just as important. In domain description, the task is not to separate overlapping or mixed classes, but to decide whether objects belong to the same group or class. This means that if we can find a boundary that closely encloses the target data, we can obtain better accuracy of image judgment. Support Vector Data Description (SVDD) is inspired by the Support Vector Machine (SVM) and provides effective data domain description, but its accuracy is hampered by the number of available samples. In this dissertation, we utilize a max-min range method and a mean-standard method to artificially generate outlier objects around the target data. By k-fold cross-validation, we can obtain the best ( , ) parameter combination. Finally, we use the UCI Machine Learning Dataset Repository to validate the effectiveness of the two methods.
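The max-min range idea, as we read it, can be sketched by drawing artificial outliers uniformly over a box slightly larger than the target data's range and using them to score SVDD parameter choices; the 10% margin and the nu grid are our assumptions.

```python
# Artificial outliers in an expanded bounding box used to tune an SVDD parameter.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(14)
target = rng.normal(0, 1, size=(200, 3))

lo, hi = target.min(axis=0), target.max(axis=0)
margin = 0.1 * (hi - lo)                                   # assumed 10% expansion
artificial = rng.uniform(lo - margin, hi + margin, size=(400, 3))

results = []
for nu in [0.01, 0.05, 0.1]:                               # grid over one parameter
    m = OneClassSVM(kernel="rbf", nu=nu).fit(target)
    acc = ((m.predict(target) == 1).mean()                 # accept targets...
           + (m.predict(artificial) == -1).mean()) / 2     # ...reject outliers
    results.append((acc, nu))
print("best (accuracy, nu):", max(results))
```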
APA, Harvard, Vancouver, ISO, and other styles