
Dissertations / Theses on the topic 'Support Vector Machine Regression'

Consult the top 50 dissertations / theses for your research on the topic 'Support Vector Machine Regression.'


1

Lee, Keun Joo. "Geometric Tolerancing of Cylindricity Utilizing Support Vector Regression." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_theses/233.

Abstract:
In an age where quick turnaround times and high-speed manufacturing methods are becoming more important, quality assurance is a consistent bottleneck in production. With the development of cheap and fast computer hardware, it has become viable to use machine vision for the collection of data points from a machined part. The generation of these large sets of sample points has created a need for a comprehensive algorithm that can provide accurate results while being computationally efficient. The currently established methods are least-squares (LSQ) and non-linear programming (NLP). The LSQ method is often deemed too inaccurate and is prone to providing bad results, while the NLP method is computationally taxing. A novel method of using support vector regression (SVR) to solve the NP-hard problem of cylindricity of machined parts is proposed. This method was evaluated against LSQ and NLP in both accuracy and CPU processing time. An open-source, user-modifiable programming package was developed to test the model. Analysis of the test results shows the novel SVR algorithm to be a viable alternative for evaluating cylindricity in real-world manufacturing.
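To make the idea concrete, here is a minimal sketch of fitting an RBF-kernel SVR to measured cylinder points and reading a rough cylindricity figure off the radial residuals. The synthetic data, parameters, and residual-spread heuristic are illustrative assumptions, not the thesis implementation.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)   # angular position of each measured point
z = rng.uniform(0, 100.0, 500)           # height along the nominal cylinder axis
# measured radius: nominal 25 mm plus a lobed form error and sensor noise
r = 25.0 + 0.02 * np.sin(3 * theta) + rng.normal(0, 0.005, 500)

# embed the angle on the unit circle so the regression sees its periodicity
X = np.column_stack([np.sin(theta), np.cos(theta), z])
model = SVR(kernel="rbf", C=10.0, epsilon=0.001).fit(X, r)

# spread of the radial residuals as a rough cylindricity figure
residuals = r - model.predict(X)
print("cylindricity estimate (max - min residual):", np.ptp(residuals))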
2

Wu, Zhili. "Regularization methods for support vector machines." HKBU Institutional Repository, 2008. http://repository.hkbu.edu.hk/etd_ra/912.

3

Shah, Rohan Shiloh. "Support vector machines for classification and regression." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100247.

Abstract:
In the last decade, Support Vector Machines (SVMs) have emerged as an important learning technique for solving classification and regression problems in various fields, most notably in computational biology, finance and text categorization. This is due in part to built-in mechanisms that ensure good generalization, which leads to accurate prediction; the use of kernel functions to model non-linear distributions; the ability to train relatively quickly on large data sets using novel mathematical optimization techniques; and, most significantly, the possibility of theoretical analysis using computational learning theory. In this thesis, we discuss the theoretical basis of and computational approaches to Support Vector Machines.
4

OLIVEIRA, A. B. "Modelo de Predição para análise comparativa de Técnicas Neuro-Fuzzy e de Regressão." Universidade Federal do Espírito Santo, 2010. http://repositorio.ufes.br/handle/10/4218.

Abstract:
The prediction models implemented by machine learning algorithms, a line of research within computational intelligence, result from research and empirical investigation on real-world data. In this context, such models are extracted in order to compare two major machine learning techniques, neuro-fuzzy networks and regression, applied with the aim of estimating a product quality parameter in an industrial environment under continuous process. Heuristically, these prediction models are applied and compared in the same simulation environment in order to measure their goodness of fit, their predictive performance, and their generalization over the empirical data that make up this scenario (an industrial mining environment).
5

Wågberg, Max. "Att förutspå Sveriges bistånd : En jämförelse mellan Support Vector Regression och ARIMA." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-36479.

Abstract:
In recent years, the use of machine learning has increased significantly. Its uses range from making everyday life easier with voice-guided smart devices to image recognition and predicting the stock market. Predicting economic values has long been possible using methods other than machine learning, such as statistical algorithms. These algorithms and machine learning models use time series, which are sets of data points observed regularly over a given time interval, in order to predict data points beyond the original time series. But which of these methods gives the best results? The overall purpose of this project is to predict Sweden's aid curve using the machine learning model Support Vector Regression (SVR) and the classic statistical algorithm autoregressive integrated moving average, abbreviated ARIMA. The time series used in the prediction are annual summaries of Sweden's total aid to the world from openaid.se, from 1998 up to 2019. SVR and ARIMA are implemented in Python with the help of the Scikit-learn and Statsmodels libraries. The results from SVR and ARIMA are measured by comparing the original values with their predicted values, while the accuracy is measured in Root Mean Square Error and presented in the results chapter. The results show that SVR with the RBF kernel is the algorithm that provides the best results for the data series. All predictions beyond the time series are then visually presented on an openaid prototype page using D3.js.
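A hedged sketch of the comparison described above, on a synthetic stand-in for the yearly series; the thesis' actual preprocessing and hyperparameters are not reproduced here.

import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
# stand-in for the 22 yearly aid totals (1998-2019) from openaid.se
y = 20.0 + 0.5 * np.arange(22) + rng.normal(0, 1.0, 22)
train, test = y[:-4], y[-4:]

# SVR with the RBF kernel, regressing the series on its time index
t = np.arange(len(y)).reshape(-1, 1)
svr = SVR(kernel="rbf", C=100.0, gamma=0.1).fit(t[:-4], train)
svr_pred = svr.predict(t[-4:])

# ARIMA(1, 1, 1) as the classic statistical baseline
arima_pred = ARIMA(train, order=(1, 1, 1)).fit().forecast(steps=4)

for name, pred in [("SVR", svr_pred), ("ARIMA", arima_pred)]:
    print(name, "RMSE:", np.sqrt(mean_squared_error(test, pred)))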
6

Uslan, Volkan. "Support vector machine-based fuzzy systems for quantitative prediction of peptide binding affinity." Thesis, De Montfort University, 2015. http://hdl.handle.net/2086/11170.

Abstract:
Reliable prediction of the binding affinity of peptides is one of the most challenging but important complex modelling problems in the post-genome era, due to the diversity and functionality of the peptides discovered. Generally, peptide binding prediction models are used to find out whether a binding exists between a certain peptide(s) and a major histocompatibility complex (MHC) molecule(s). Recent research efforts have focused on quantifying the binding predictions. The objective of this thesis is to develop reliable real-value predictive models through the use of fuzzy systems. A non-linear system is proposed with the aid of support vector-based regression to improve the fuzzy system and applied to the real-value prediction of the degree of peptide binding. This research study introduced two novel methods to improve the structure and parameter identification of fuzzy systems. First, support vector-based regression is used to identify initial parameter values of the consequent part of type-1 and interval type-2 fuzzy systems. Second, an overlapping clustering concept is used to derive interval-valued parameters of the premise part of the type-2 fuzzy system. Publicly available peptide binding affinity data sets obtained from the literature are used in the experimental studies of this thesis. First, the proposed models are blind validated using peptide binding affinity data sets obtained from a modelling competition. In that competition, almost equal numbers of peptide sequences in the training and testing data sets (89, 76, 133 and 133 peptides for training and 88, 76, 133 and 47 peptides for testing) were provided to the participants. Each peptide in the data sets was represented by 643 biochemical descriptors assigned to each amino acid. Second, the proposed models are cross validated using mouse class I MHC alleles (H2-Db, H2-Kb and H2-Kk). H2-Db, H2-Kb, and H2-Kk consist of 65 nona-peptides, 62 octa-peptides, and 154 octa-peptides, respectively. Compared to previously published results in the literature, the support vector-based type-1 and support vector-based interval type-2 fuzzy models yield an improvement in prediction accuracy. The quantitative predictive performances have been improved by as much as 33.6% for the first group of data sets and 1.32% for the second group of data sets. The proposed models not only improved the performance of the fuzzy system (which used support vector-based regression), but the support vector-based regression also benefited from the fuzzy concept. The results obtained here set the platform for the presented models to be considered for other application domains in computational and/or systems biology. Apart from improving the prediction accuracy, this research study has also identified specific features which play a key role in making reliable peptide binding affinity predictions. The amino acid features "Polarity", "Positive charge", "Hydrophobicity coefficient", and "Zimm-Bragg parameter" are considered highly discriminating features in the peptide binding affinity data sets. This information can be valuable in the design of peptides with strong binding affinity to an MHC I molecule. This information may also be useful when designing drugs and vaccines.
7

Lee, Ho-Jin. "Functional data analysis: classification and regression." Texas A&M University, 2004. http://hdl.handle.net/1969.1/2805.

Abstract:
Functional data refer to data which consist of observed functions or curves evaluated at a finite subset of some interval. In this dissertation, we discuss statistical analysis, especially classification and regression, when data are available in functional form. Due to the nature of functional data, one considers function spaces in presenting such data, and each functional observation is viewed as a realization generated by a random mechanism in those spaces. The classification procedure in this dissertation is based on dimension reduction techniques for these spaces. One commonly used method is Functional Principal Component Analysis (Functional PCA), in which an eigendecomposition of the covariance function is employed to find the directions of highest variability of the data in the function space. The reduced space of functions spanned by a few eigenfunctions is thought of as a space containing most of the features of the functional data. We also propose a functional regression model for scalar responses. The infinite dimensionality of the predictor space causes many problems, one of which is that there are infinitely many solutions. The space of the parameter function is restricted to Sobolev-Hilbert spaces and the so-called ε-insensitive loss function is utilized. As a robust technique of function estimation, we present a way to find a function that has at most ε deviation from the observed values and at the same time is as smooth as possible.
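For reference, the ε-insensitive loss mentioned above can be written, in standard SVR notation (not necessarily the thesis' exact formulation), as

    L_\varepsilon(y, f(x)) = \max\{0,\; |y - f(x)| - \varepsilon\}

so that predictions within an ε-tube around the observed value incur zero loss; only deviations beyond ε are penalized, which is what yields the "at most ε deviation, as smooth as possible" estimates the abstract describes.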
8

Hechter, Trudie. "A comparison of support vector machines and traditional techniques for statistical regression and classification." Thesis, Stellenbosch : Stellenbosch University, 2004. http://hdl.handle.net/10019.1/49810.

Abstract:
Thesis (MComm)--Stellenbosch University, 2004.
Since its introduction in Boser et al. (1992), the support vector machine has become a popular tool in a variety of machine learning applications. More recently, the support vector machine has also been receiving increasing attention in the statistical community as a tool for classification and regression. In this thesis support vector machines are compared to more traditional techniques for statistical classification and regression. The techniques are applied to data from a life assurance environment for a binary classification problem and a regression problem. In the classification case the problem is the prediction of policy lapses using a variety of input variables, while in the regression case the goal is to estimate the income of clients from these variables. The performance of the support vector machine is compared to that of discriminant analysis and classification trees in the case of classification, and to that of multiple linear regression and regression trees in regression, and it is found that support vector machines generally perform well compared to the traditional techniques.
9

Thorén, Daniel. "Radar based tank level measurement using machine learning : Agricultural machines." Thesis, Linköpings universitet, Programvara och system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176259.

Abstract:
Agriculture is becoming more dependent on computerized solutions to make the farmer's job easier. The big step that many companies are working towards is fully autonomous vehicles that work the fields. To that end, the equipment fitted to said vehicles must also adapt and become autonomous. Making this equipment autonomous takes many incremental steps, one of which is developing an accurate and reliable tank level measurement system. In this thesis, a system for tank level measurement in a seed planting machine is evaluated. Traditional systems use load cells to measure the weight of the tank; however, these types of systems are expensive to build and cumbersome to repair. They also add a lot of weight to the equipment, which increases the fuel consumption of the tractor. Thus, this thesis investigates the use of radar sensors together with a number of Machine Learning algorithms. Fourteen radar sensors are fitted to a tank at different positions, data is collected, and a preprocessing method is developed. Then, the data is used to test the following Machine Learning algorithms: Bagged Regression Trees (BG), Random Forest Regression (RF), Boosted Regression Trees (BRT), Linear Regression (LR), Linear Support Vector Machine (L-SVM), and Multi-Layer Perceptron Regressor (MLPR). The model with the best 5-fold cross-validation scores was Random Forest, closely followed by Boosted Regression Trees. A robustness test, using 5 previously unseen scenarios, revealed that the Boosted Regression Trees model was the most robust. The radar position analysis showed that 6 sensors together with the MLPR model gave the best RMSE scores. In conclusion, the models performed well on this type of system, which shows that they might be a competitive alternative to load cell based systems.
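For a concrete picture of this kind of model comparison, here is a minimal scikit-learn sketch with synthetic stand-ins for the radar features; GradientBoostingRegressor stands in for the boosted trees, and all settings are assumptions rather than the thesis configuration.

import numpy as np
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.svm import LinearSVR

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 14))                       # 14 radar readings per sample
y = X[:, :6].sum(axis=1) + rng.normal(0, 0.1, 300)   # synthetic tank level

models = {
    "Bagged regression trees": BaggingRegressor(),
    "Random forest": RandomForestRegressor(),
    "Boosted regression trees": GradientBoostingRegressor(),
    "Linear regression": LinearRegression(),
    "Linear SVM": LinearSVR(max_iter=10000),
    "MLP regressor": MLPRegressor(max_iter=2000),
}
for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: 5-fold RMSE = {rmse:.3f}")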
10

Persson, Karl. "Predicting movie ratings : A comparative study on random forests and support vector machines." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11119.

Abstract:
The aim of this work is to evaluate the prediction performance of random forests in comparison to support vector machines for predicting the numerical user ratings of a movie using pre-release attributes such as its cast, directors, budget and movie genres. In order to answer this question, an experiment was conducted on predicting the overall user rating of 3376 Hollywood movies, using data from the well-established movie database IMDb. The prediction performance of the two algorithms was assessed and compared over three commonly used performance and error metrics, as well as evaluated by means of significance testing in order to further investigate whether or not any significant differences could be identified. The results indicate some differences between the two algorithms, with consistently better performance from random forests in comparison to support vector machines over all of the performance metrics, as well as significantly better results for two out of three metrics. Although a slight difference has been indicated by the results, one should also note that both algorithms show great similarities in terms of their prediction performance, making it hard to draw any general conclusions on which algorithm yields the most accurate movie predictions.
11

Shen, Judong. "Fusing support vector machines and soft computing for pattern recognition and regression /." Search for this dissertation online, 2005. http://wwwlib.umi.com/cr/ksu/main.

12

Kerek, Hanna. "Product Similarity Matching for Food Retail using Machine Learning." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273606.

Abstract:
Product similarity matching for food retail is studied in this thesis. The goal is to find products that are similar but not necessarily of the same brand, which can be used as replacement products for a product that is out of stock or does not exist in a specific store. The aim of the thesis is to examine which machine learning model is best suited to perform the product similarity matching. The product data used for training the models were name, description, nutrients, weight and filters (labels, for example organic). Product similarity matching was performed pairwise, and the similarity between the products was measured by Jaccard distance for text attributes and relative difference for numeric values. Random Forest, Logistic Regression and Support Vector Machines were tested and compared to a baseline. The baseline computed the Jaccard distance for the product names and did the classification based on a threshold value of that distance. The result was measured by accuracy, F-measure and AUC score. Random Forest performed best in terms of all evaluation metrics, and Logistic Regression, Random Forest and Support Vector Machines all performed better than the baseline.
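The pairwise feature construction described above can be illustrated with a small sketch; the function names and example values are assumptions for illustration, not the thesis code.

def jaccard_distance(a: str, b: str) -> float:
    """Jaccard distance between two whitespace-tokenized text attributes."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)

def relative_difference(x: float, y: float) -> float:
    """Relative difference between two numeric attributes."""
    denom = max(abs(x), abs(y))
    return abs(x - y) / denom if denom else 0.0

# one feature vector per candidate pair, to be fed to e.g. a Random Forest
pair_features = [
    jaccard_distance("organic oat milk 1l", "oat drink organic 1l"),  # name
    relative_difference(1000.0, 950.0),                               # weight (g)
]
print(pair_features)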
13

Melki, Gabriella A. "Novel Support Vector Machines for Diverse Learning Paradigms." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5630.

Abstract:
This dissertation introduces novel support vector machines (SVMs) for the following traditional and non-traditional learning paradigms: online classification, multi-target regression, multiple-instance classification, and data stream classification. Three multi-target support vector regression (SVR) models are first presented. The first involves building independent, single-target SVR models for each target. The second builds an ensemble of randomly chained models using the first single-target method as a base model. The third calculates the targets' correlations and forms a maximum correlation chain, which is used to build a single chained SVR model, improving the model's prediction performance while reducing computational complexity. Under the multi-instance paradigm, a novel SVM multiple-instance formulation and an algorithm with a bag-representative selector, named Multi-Instance Representative SVM (MIRSVM), are presented. The contribution trains the SVM based on bag-level information and is able to identify instances that highly impact classification, i.e. bag-representatives, for both positive and negative bags, while finding the optimal class separation hyperplane. Unlike other multi-instance SVM methods, this approach eliminates possible class imbalance issues by allowing both positive and negative bags to have at most one representative, which constitutes the most contributing instance to the model. Due to the shortcomings of current popular SVM solvers, especially in the context of large-scale learning, the third contribution presents a novel stochastic, i.e. online, learning algorithm for solving the L1-SVM problem in the primal domain, dubbed OnLine Learning Algorithm using Worst-Violators (OLLAWV). This algorithm, unlike other stochastic methods, provides a novel stopping criterion and eliminates the need for a regularization term, using early stopping instead. Because of these characteristics, OLLAWV was proven to efficiently produce sparse models while maintaining competitive accuracy. OLLAWV's online nature and success in traditional classification inspired its implementation, as well as that of its predecessor, OnLine Learning Algorithm - List 2 (OLLA-L2), under the batch data stream classification setting. Unlike other existing methods, these two algorithms were chosen because their properties are a natural remedy for the time and memory constraints that arise from the data stream problem. OLLA-L2's low spatial complexity deals with the memory constraints imposed by the data stream setting, and OLLAWV's fast run time, early self-stopping capability, and ability to produce sparse models agree with both memory and time constraints. The preliminary results for OLLAWV showed superior performance to its predecessor, and it was chosen for the final set of experiments against current popular data stream methods. Rigorous experimental studies and statistical analyses over various metrics and datasets were conducted in order to comprehensively compare the proposed solutions against modern, widely-used methods from all paradigms. The experimental studies and analyses confirm that the proposals achieve better performance and more scalable solutions than the methods compared, making them competitive in their respective fields.
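As a concrete illustration of the chained multi-target idea (the third model above), here is a minimal sketch using scikit-learn's RegressorChain. The correlation-based ordering below is a simplified stand-in for the thesis' maximum correlation chain, and all data are synthetic.

import numpy as np
from sklearn.multioutput import RegressorChain
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
Y = np.column_stack([
    X[:, 0] + rng.normal(0, 0.1, 200),
    X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 200),
    X[:, 1] + rng.normal(0, 0.1, 200),
])

# order targets by total absolute correlation with the other targets,
# so strongly correlated targets appear early in the chain
corr = np.abs(np.corrcoef(Y, rowvar=False)).sum(axis=0)
order = list(np.argsort(-corr))

chain = RegressorChain(SVR(kernel="rbf"), order=order).fit(X, Y)
print(chain.predict(X[:3]))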
14

Salman, Raied. "CONTRIBUTIONS TO K-MEANS CLUSTERING AND REGRESSION VIA CLASSIFICATION ALGORITHMS." VCU Scholars Compass, 2012. http://scholarscompass.vcu.edu/etd/2738.

Abstract:
The dissertation deals with clustering algorithms and with transforming regression problems into classification problems. The main contributions of the dissertation are twofold: first, to improve (speed up) the clustering algorithms, and second, to develop a strict learning environment for solving regression problems as classification tasks by using support vector machines (SVMs). An extension to the most popular unsupervised clustering method, the k-means algorithm, is proposed, dubbed the k-means2 (k-means squared) algorithm, applicable to ultra large datasets. The main idea is based on using a small portion of the dataset in the first stage of the clustering. Thus, the centers of such a smaller dataset are computed much faster than if the centers were computed on the whole dataset. These final centers of the first stage are naturally much closer to the locations of the final centers, rendering a great reduction in the total computational cost. For large datasets the speed-up in computation exhibited a trend which is shown to be high and rising with the size of the dataset. The total transient time for the fast stage was found to depend largely on the portion of the dataset selected for that stage. For medium size datasets it has been shown that an 8-10% portion of data used in the fast stage is a reasonable choice. The centers of the 8-10% samples computed during the fast stage may oscillate towards the fast stage's final center positions along the centers' movement path. The slow stage starts with the final centers of the fast phase, and the paths of the centers in the second stage are much shorter than those of a classic k-means algorithm. Additionally, the oscillations of the slow stage centers' trajectories along the path to the final center positions are also greatly minimized. In the second part of the dissertation, a novel approach of posing the solution of regression problems as multiclass classification tasks within the common framework of kernel machines is proposed. Based on such an approach, both nonlinear (NL) regression problems and NL multiclass classification tasks are solved as multiclass classification problems by using SVMs. The accuracy of an approximating classification (hyper)surface (averaged over several benchmarking data sets used in this study) to the data points over a given high-dimensional input space created by a nonlinear multiclass classifier is slightly superior to the solution obtained by a regression (hyper)surface. In terms of the CPU time needed for training (i.e. for tuning the hyperparameters of the models), the nonlinear SVM classifier also shows significant advantages. Here, comparisons between the solutions obtained by an SVM solving a given regression problem as a classic SVM regressor and as an SVM classifier have been performed. In order to transform a regression problem into a classification task, four possible discretizations of a continuous output (target) vector y are introduced and compared. A very strict double (nested) cross-validation technique has been used for measuring the performances of regression and multiclass classification SVMs. In order to carry out fair comparisons, SVMs are used for solving both tasks - regression and multiclass classification. The readily available and most popular benchmarking SVM tool, LibSVM, was used in all experiments. The results on twelve benchmarking regression tasks present SVM regression and classification algorithms as strongly competing models, where each approach shows merits for a specific class of high-dimensional function approximation problems.
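A minimal sketch of the two-stage k-means idea described above (cluster a roughly 10% sample first, then warm-start the full run from those centers), using scikit-learn; the cluster count and sample fraction are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = rng.normal(size=(100_000, 8))

# fast stage: cluster a ~10% random sample of the data
sample = X[rng.choice(len(X), size=len(X) // 10, replace=False)]
fast = KMeans(n_clusters=5, n_init=10).fit(sample)

# slow stage: warm-start the full run from the fast-stage centers,
# so the centers' remaining paths are short
slow = KMeans(n_clusters=5, init=fast.cluster_centers_, n_init=1).fit(X)
print(slow.inertia_)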
15

Amlathe, Prakhar. "Standard Machine Learning Techniques in Audio Beehive Monitoring: Classification of Audio Samples with Logistic Regression, K-Nearest Neighbor, Random Forest and Support Vector Machine." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7050.

Abstract:
Honeybees are one of the most important pollinating species in agriculture. Three out of every four crops have the honeybee as their sole pollinator. Since 2006 there has been a drastic decrease in the bee population, which is attributed to Colony Collapse Disorder (CCD). The bee colonies fail or die without giving any traditional health symptoms which could otherwise help alert beekeepers in advance about their situation. An Electronic Beehive Monitoring System has various sensors embedded in it to extract video, audio and temperature data that can provide critical information on colony behavior and health without invasive beehive inspections. Previously, significant patterns and information have been extracted by processing the video/image data, but no work had been done using audio data. This research takes the first step towards the use of audio data in the Electronic Beehive Monitoring System (BeePi) by enabling a path towards the automatic classification of audio samples into different classes and categories. The experimental results give initial support to the claim that monitoring bee buzzing signals from the hive is feasible, can be a good indicator for estimating hive health, and can help to differentiate normal behavior from deviations.
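To make the classification setup concrete, a minimal sketch along the lines of the four methods named in the title might look as follows; the audio features and labels here are synthetic stand-ins, not the BeePi data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 20))            # e.g. spectral features per audio clip
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic two-class labels

for name, clf in [("Logistic regression", LogisticRegression(max_iter=1000)),
                  ("k-nearest neighbor", KNeighborsClassifier()),
                  ("Random forest", RandomForestClassifier()),
                  ("SVM", SVC())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")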
16

Zhao, Xiaochuang. "Ensemble Learning Method on Machine Maintenance Data." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/6056.

Abstract:
In industry, a lot of companies are facing the explosion of big data. With this much information stored, companies want to make sense of the data and use it to help them make better decisions, especially for future prediction. A lot of money can be saved and huge revenue can be generated with the power of big data. When building statistical learning models for prediction, companies in industry aim to build models with efficiency and high accuracy. After the learning models have been developed for production, new data will be generated. With the updated data, the models have to be updated as well. Due to this nature, the model that performs best today will not necessarily perform the same tomorrow. Thus, it is very hard to decide which algorithm should be used to build the learning model. This paper introduces a new method that ensembles the information generated by two different classification statistical learning algorithms as inputs for another learning model to increase the final prediction power. The dataset used in this paper is NASA's Turbofan Engine Degradation data. There are 49 numeric features (X) and the response Y is binary, with 0 indicating the engine is working properly and 1 indicating engine failure. The model's purpose is to predict whether the engine is going to pass or fail. The dataset is divided into a training set and a testing set. First, the training set is used twice to build support vector machine (SVM) and neural network models. Second, the trained SVM and neural network models take X of the training set as input to predict Y1 and Y2. Then, Y1 and Y2 are taken as inputs to build the Penalized Logistic Regression model, which is the ensemble model here. Finally, the same steps are followed on the testing set to get the final prediction result. The model accuracy is calculated using overall classification accuracy. The result shows that the ensemble model has 92% accuracy. The prediction accuracies of the SVM, neural network and ensemble models are compared to show that the ensemble model successfully captured the power of the two individual learning models.
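A hedged sketch of this two-level ensemble, using probability outputs from the base models rather than the hard labels described above (an illustrative simplification); the data here are synthetic stand-ins for the NASA set.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 49))              # 49 engine features, as in the data
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # synthetic pass/fail labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# base learners trained on the training set
svm = SVC(probability=True).fit(X_tr, y_tr)
nn = MLPClassifier(max_iter=1000).fit(X_tr, y_tr)

def stack(A):
    # their outputs become the two inputs of the penalized meta-model
    return np.column_stack([svm.predict_proba(A)[:, 1],
                            nn.predict_proba(A)[:, 1]])

meta = LogisticRegression(penalty="l2", C=1.0).fit(stack(X_tr), y_tr)
print("ensemble accuracy:", meta.score(stack(X_te), y_te))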
17

Chen, Li. "Integrative Modeling and Analysis of High-throughput Biological Data." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30192.

Abstract:
Computational biology is an interdisciplinary field that focuses on developing mathematical models and algorithms to interpret biological data so as to understand biological problems. With current high-throughput technology development, different types of biological data can be measured at a large scale, which calls for more sophisticated computational methods to analyze and interpret the data. In this dissertation research work, we propose novel methods to integrate, model and analyze multiple types of biological data, including microarray gene expression data, protein-DNA interaction data and protein-protein interaction data. These methods will help improve our understanding of biological systems. First, we propose a knowledge-guided multi-scale independent component analysis (ICA) method for biomarker identification on time course microarray data. Guided by a knowledge gene pool related to a specific disease under study, the method can determine disease-relevant biological components from ICA modes and then identify biologically meaningful markers related to the specific disease. We have applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Second, we propose a novel method for transcriptional regulatory network identification by integrating gene expression data and protein-DNA binding data. The approach is built upon a multi-level analysis strategy designed for suppressing false positive predictions. With this strategy, a regulatory module becomes increasingly significant as more relevant gene sets are formed at finer levels. At each level, a two-stage support vector regression (SVR) method is utilized to reduce false positive predictions by integrating binding motif information and gene expression data; a significance analysis procedure follows to assess the significance of each regulatory module. The resulting performance on simulation data and yeast cell cycle data shows that the multi-level SVR approach outperforms other existing methods in the identification of both regulators and their target genes. We have further applied the proposed method to breast cancer cell line data to identify condition-specific regulatory modules associated with estrogen treatment. Experimental results show that our method can identify biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Third, we propose a bootstrapping Markov Random Field (MRF)-based method for subnetwork identification on microarray data that incorporates protein-protein interaction data. Methodologically, an MRF-based network score is first derived by considering the dependency among genes to increase the chance of selecting hub genes. A modified simulated annealing search algorithm is then utilized to find the optimal/suboptimal subnetworks with maximal network score. A bootstrapping scheme is finally implemented to generate confident subnetworks. Experimentally, we have compared the proposed method with other existing methods, and the resulting performance on simulation data shows that the bootstrapping MRF-based method outperforms other methods in identifying the ground truth subnetwork and hub genes. We have then applied our method to breast cancer data to identify significant subnetworks associated with drug resistance. The identified subnetworks not only show good reproducibility across different data sets, but also indicate several pathways and biological functions potentially associated with the development of breast cancer and drug resistance. In addition, we propose network-constrained support vector machines (SVMs) for cancer classification and prediction, taking the network structure into account when constructing classification hyperplanes. The simulation study demonstrates the effectiveness of our proposed method. The study on the real microarray data sets shows that our network-constrained SVM, together with the bootstrapping MRF-based subnetwork identification approach, can achieve better classification performance compared with conventional biomarker selection approaches and SVMs. We believe that the research presented in this dissertation not only provides novel and effective methods to model and analyze different types of biological data; the extensive experiments on several real microarray data sets also show the potential to improve the understanding of biological mechanisms related to cancers by generating novel hypotheses for further study.
18

Erdas, Ozlem. "Modelling And Predicting Binding Affinity Of Pcp-like Compounds Using Machine Learning Methods." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/3/12608792/index.pdf.

Abstract:
Machine learning methods have been promising tools in science and engineering fields. The use of these methods in chemistry and drug design has advanced since the 1990s. In this study, molecular electrostatic potential (MEP) surfaces of PCP-like compounds are modelled and visualized in order to extract features which will be used in predicting binding affinity. In the modelling, Cartesian coordinates of MEP surface points are mapped onto a spherical self-organizing map. The resulting maps are visualized using the values of the electrostatic potential. These values also provide features for the prediction system. Support vector machines and the partial least squares method are used for predicting the binding affinity of compounds, and the results are compared.
19

Falk, Anton, and Daniel Holmgren. "Sales Forecasting by Assembly of Multiple Machine Learning Methods : A stacking approach to supervised machine learning." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184317.

Abstract:
Today, digitalization is a key factor for businesses to enhance growth and gain advantages and insight into their operations. Both in planning operations and in understanding customers, digitalization processes have key roles, and companies are spending more and more resources in this field to gain critical insights and enhance growth. The fast-food industry is no exception, where restaurants need to be highly flexible and agile in their work. With this, there exists an immense demand for knowledge and insights to help restaurants plan their daily operations, and there is a great need for organizations to continuously adapt new technological solutions into their existing processes. Well-implemented machine learning solutions in combination with feature engineering are likely to bring value into existing processes. Sales forecasting, which is the main field of study in this thesis work, has a vital role in the planning of fast-food restaurants' operations, both for budgeting and for staffing purposes. The phrase fast food describes itself: with it comes a commitment to provide high-quality food and rapid service to the customers. Understaffing risks violating either the quality of the food or the service, while overstaffing leads to low overall productivity. Generating highly reliable sales forecasts is thus vital to maximize profits and minimize operational risk. SARIMA, XGBoost and Random Forest were evaluated on training data consisting of sales numbers, business hours and categorical variables describing date and month. These models worked as base learners, whose sales predictions from a specific dataset were used as training data for a Support Vector Regression (SVR) model. A stacking approach to this type of project shows sufficient results, with a significant gain in prediction accuracy for all investigated restaurants on a 6-week aggregated timeline compared to the existing solution.
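A minimal sketch of the stacking idea above using scikit-learn only (the SARIMA base learner is omitted here for brevity, and GradientBoostingRegressor stands in for XGBoost); all settings and data are illustrative assumptions, not the thesis configuration.

import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))      # calendar / business-hour features (synthetic)
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.2, 500)   # sales (synthetic)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor()),
                ("gbr", GradientBoostingRegressor())],   # base learners
    final_estimator=SVR(kernel="rbf"),                   # SVR meta-model
)
stack.fit(X, y)
print(stack.predict(X[:5]))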
20

Gyawali, Sanij. "Dynamic Load Modeling from PSSE-Simulated Disturbance Data using Machine Learning." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/100591.

Abstract:
Load models have evolved from the simple ZIP model to composite models that incorporate the transient dynamics of motor loads. This research utilizes the latest trends in machine learning to build a reliable and accurate composite load model. A composite load model is a combination of a static (ZIP) model paralleled with a dynamic model. The dynamic model, recommended by the Western Electricity Coordinating Council (WECC), is an induction motor representation. In this research, a dual cage induction motor with 20 parameters pertaining to its dynamic behavior, starting behavior, and per unit calculations is used as the dynamic model. For machine learning algorithms, a large amount of data is required. The required PMU field data and the corresponding system models are considered Critical Energy Infrastructure Information (CEII) and access to them is limited. The next best option for the required amount of data is a simulation environment like PSSE. The IEEE 118 bus system is used as a test setup in PSSE, and dynamic simulations generate the required data samples. Each of the samples contains data on bus voltage, bus current, and bus frequency, with the corresponding induction motor parameters as target variables. It was determined that the Artificial Neural Network (ANN) with a multivariate input to single parameter output approach worked best. A Recurrent Neural Network (RNN) was also tested side by side to see if an additional set of timestamp information would help the model prediction. Moreover, a different definition of a dynamic model, with a transfer function-based load, is also studied. Here, the dynamic model is defined as a mathematical representation of the relation between bus voltage, bus frequency, and the active/reactive power flowing in the bus. With this form of load representation, Long Short-Term Memory (LSTM), a variation of the RNN, performed better than competing algorithms like Support Vector Regression (SVR). The result of this study is a load model consisting of parameters defining the load at a load bus, whose predictions are compared against simulated parameters to examine their validity for use in contingency analysis.
Independent System Operators (ISOs) and Distribution System Operators (DSOs) have a responsibility to provide an uninterrupted power supply to consumers. Because of that, along with the desire to keep operating costs to a minimum, engineers and planners study the system beforehand and seek to find the optimum capacity for each of the power system elements like generators, transformers, transmission lines, etc. They then test the overall system using power system models, which are mathematical representations of the real components, to verify the stability and strength of the system. However, the verification is only as good as the system models that are used. As most power system components are controlled by the operators themselves, it is easy to develop a model from their perspective. The load is the only component controlled by consumers; hence the necessity of better load models. Several studies have been made on static load modeling and the performance is on par with real behavior. But dynamic loading, which is load behavior dependent on time, is rather difficult to model. Some attempts at dynamic load modeling can already be found. Physical component-based and mathematical transfer function-based dynamic models are quite widely used for the study. These load structures are largely accepted as a good representation of the system's dynamic behavior. With a load structure in hand, the next task is estimating its parameters. In this research, we tested some new machine learning methods to accurately estimate the parameters. Thousands of simulated data samples are used to train machine learning models. After training, we validated the models on other unseen data. This study finally goes on to recommend better methods for load modeling.
21

Meyer, Rory George Vincent. "Classification of ocean vessels from low resolution satellite SAR images." Diss., University of Pretoria, 2005. http://hdl.handle.net/2263/66224.

Abstract:
In the long term it is beneficial to a country's economy to exploit the maritime environment surrounding it responsibly. It is also beneficial to protect this environment from poaching and pollution. To achieve this, the responsible parties of a country must have an awareness of what is transpiring in the maritime domain. Synthetic aperture radar can provide an image of the ocean, regardless of weather or light conditions, showing most vessels therein. Monitoring the ocean using synthetic aperture radar imagery at the lowest cost requires large-swath imagery. There exists a trade-off between swath width and image resolution, with the largest-swath image having the poorest resolution. Existing research has shown that it is possible to use coarse resolution synthetic aperture radar imagery to detect vessels at sea, but little work has been done on classifying those vessels. This research aims to investigate the coarse resolution classification information gap. This is done by using a dataset of matching synthetic aperture radar and ship transponder data to train a statistical classification algorithm in order to classify or estimate the length of vessels based on features extracted from their synthetic aperture radar image. The results of this research show that coarse resolution (approximately 40 m per pixel) synthetic aperture radar imagery is able to estimate vessel size for larger classes and provides insight into which vessel classes would require finer resolutions in order to be detected and classified reliably. The range of smaller vessel classes is usually limited to ports and fishing zones. These zones can be mapped using historical vessel transponder data, so a dedicated surveillance campaign can be optimised to use higher resolution products in these areas. The size estimation from the machine learning algorithm performs better than current techniques.
Dissertation (MEng)--University of Pretoria, 2017.
22

Almér, Henrik. "Machine learning and statistical analysis in fuel consumption prediction for heavy vehicles." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-172306.

Abstract:
I investigate how to use machine learning to predict fuel consumption in heavy vehicles. I examine data from several different sources describing road, vehicle, driver and weather characteristics, and I fit a regression to fuel consumption measured in liters per distance. The thesis is done for Scania and uses data sources available to Scania. I evaluate which machine learning methods are most successful, how data collection frequency affects the prediction, and which features are most influential for fuel consumption. I find that a lower collection frequency of 10 minutes is preferable to a higher collection frequency of 1 minute. I also find that the evaluated models are comparable in their performance and that the most important features for fuel consumption are related to the road slope, vehicle speed and vehicle weight.
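The feature-influence question studied here can be probed with, for example, permutation importance; the sketch below is an illustration on synthetic data with assumed feature names, not Scania's data or the thesis' actual method.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(8)
names = ["road_slope", "vehicle_speed", "vehicle_weight", "wind", "temperature"]
X = rng.normal(size=(1000, len(names)))
y = 2 * X[:, 0] + X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 0.3, 1000)  # liters/km

model = RandomForestRegressor().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in sorted(zip(names, result.importances_mean),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")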
23

Jansson, Daniel, and Rasmus Blomstrand. "REAL-TIME PREDICTION OF SHIMS DIMENSIONS IN POWER TRANSFER UNITS USING MACHINE LEARNING." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-45615.

24

Henchiri, Yousri. "L'approche Support Vector Machines (SVM) pour le traitement des données fonctionnelles." Thesis, Montpellier 2, 2013. http://www.theses.fr/2013MON20187/document.

Abstract:
Functional Data Analysis is an important and dynamic area of statistics. It offers effective new tools and proposes new methodological and theoretical developments in the presence of functional-type data (functions, curves, surfaces, ...). The work outlined in this dissertation provides a new contribution to the themes of statistical learning and quantile regression when data can be considered as functions. Special attention is devoted to the Support Vector Machines (SVM) technique, which involves the notion of a Reproducing Kernel Hilbert Space. In this context, the main goal is to extend this nonparametric estimation technique to conditional models that take functional data into account. We investigated the theoretical aspects and practical behavior of the proposed and adapted technique on the following regression models. The first model is the conditional quantile functional model when the covariate takes its values in a bounded subspace of a functional space of infinite dimension, the response variable takes its values in a compact subset of the real line, and the observations are i.i.d.. The second model is the functional additive quantile regression model, where the response variable depends on a vector of functional covariates. The last model is the conditional quantile functional model in the dependent functional data case. We obtained weak consistency and a convergence rate for these estimators. Simulation studies are performed to evaluate the performance of the inference procedures. Applications to chemometrics, environmental and climatic data analysis are considered. The good behavior of the SVM estimator is thus highlighted.
APA, Harvard, Vancouver, ISO, and other styles
25

Linton, Thomas. "Forecasting hourly electricity consumption for sets of households using machine learning algorithms." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186592.

Full text
Abstract:
To address inefficiency, waste, and the negative consequences of electricity generation, companies and government entities are looking to drive behavioural change among residential consumers. To change their behaviour, consumers need better feedback about their electricity consumption: a monthly or quarterly bill provides almost no useful information about the relationship between their behaviours and their consumption. Smart meters are now widely deployed in developed countries and are capable of providing electricity consumption readings at an hourly resolution, but these data are mostly used as a basis for billing rather than as a tool to help consumers reduce their consumption. One component required to deliver innovative feedback mechanisms is the capability to forecast hourly electricity consumption at the household scale. This thesis evaluates how effectively a selection of kernel-based machine learning methods forecast the hourly aggregate electricity consumption for different-sized sets of households. The work demonstrates that k-Nearest Neighbour Regression and Gaussian Process Regression are the most accurate methods within the constraints of the problem considered. In addition to accuracy, the advantages and disadvantages of each machine learning method are evaluated, and a simple comparison of each algorithm's computational performance is made.
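A minimal sketch of this kind of comparison, assuming lagged hourly readings as features; the synthetic series, window length and kernel choices below are illustrative assumptions, not the thesis setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)                      # 60 days of hourly readings
load = 5 + 2 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, hours.size)

# Use the previous 24 hourly readings to predict the next hour's consumption.
X = np.lib.stride_tricks.sliding_window_view(load[:-1], 24)
y = load[24:]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)

knn = KNeighborsRegressor(n_neighbors=10).fit(X_tr, y_tr)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X_tr, y_tr)
for name, model in [("kNN", knn), ("GP", gpr)]:
    print(name, mean_absolute_error(y_te, model.predict(X_te)))
```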
APA, Harvard, Vancouver, ISO, and other styles
26

Kinalwa-Nalule, Myra. "Using machine learning to determine fold class and secondary structure content from Raman optical activity and Raman vibrational spectroscopy." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/using-machine-learning-to-determine-fold-class-and-secondary-structure-content-from-raman-optical-activity-and-raman-vibrational-spectroscopy(7382043d-748c-4d29-ba75-67fb35ccdb19).html.

Full text
Abstract:
The objective of this project was to apply machine learning methods to determine protein secondary structure content and protein fold class from ROA and Raman vibrational spectral data. Raman and ROA are sensitive to biomolecular structure, with the bands of each spectrum corresponding to structural elements in proteins; combined, they give a fingerprint of the protein. However, there are many bands about which little is known, so there is a need to find ways of extracting information from spectral bands and to investigate which regions of the spectra contain the most useful structural information. Support Vector Machine (SVM) classification and Random Forest (RF) classification were used to mine protein fold class information, and Partial Least Squares (PLS) regression was used to determine the secondary structure content of proteins. The classification methods were used to group proteins into α-helix, β-sheet, α/β and disordered fold classes. PLS regression was used to determine percentage structural content from Raman and ROA spectral data. The analyses were performed on spectral bin widths of 10 cm-1 and on the spectral amide regions I, II and III; the full spectra and different combinations of the amide regions were also analysed. The SVM analyses, both classification and regression, generally did not perform well. SVM classification models, for example, had low Matthews Correlation Coefficient (MCC) values below 0.5, though still above zero, where zero would indicate prediction no better than chance. The SVM regression analyses also performed poorly, with average R2 values below 0.5. R2 is the squared Pearson correlation coefficient and shows how well predicted and observed structural content values correlate; an R2 value close to 1 indicates a good prediction model. The Partial Least Squares regression analyses yielded much improved results with high accuracy: analyses of the full spectrum and the spectral amide regions produced high R2 values of 0.8-0.9 for both ROA and Raman spectral data. This high accuracy was also seen in the analysis of the 850-1100 cm-1 backbone region for both ROA and Raman spectra, which indicates that this region could make an important contribution to protein structure analysis. PLS regression on second-derivative Raman spectra showed further improved performance, with high-accuracy R2 values of 0.81-0.97. The Random Forest classification algorithm also performed well. The two-dimensional plots used to visualise the classification clusters showed clear clusters in some analyses; for example, tighter clustering was observed for the amide I, amide I & III, and amide I & II & III spectral regions than for the amide II, amide III and amide II & III analyses. The Random Forest algorithm also determines variable importance, which showed which spectral bins were crucial in the classification decisions. The ROA Random Forest analyses generally performed better than the Raman Random Forest analyses: ROA analyses reached 75% correctly classified proteins at best, while Raman analyses reached 50%. The analyses presented in this thesis have shown that Raman and ROA vibrational spectra contain information about protein secondary structure, and that this information can be extracted using mathematical methods such as the machine learning techniques presented here. The work presented here demonstrates that these techniques are useful and could be powerful tools in the determination of protein structure from spectral data.
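A hedged sketch of the PLS regression step on binned spectra; the array sizes and the synthetic spectra below are stand-ins, not the thesis data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, bins = 80, 170                     # e.g. 10 cm-1 bins over a spectral window
helix_frac = rng.uniform(0, 1, n)     # secondary-structure content to predict
basis = rng.normal(size=bins)         # fake structure-related spectral signature
spectra = np.outer(helix_frac, basis) + rng.normal(0, 0.5, (n, bins))

pls = PLSRegression(n_components=8)
r2 = cross_val_score(pls, spectra, helix_frac, cv=5, scoring="r2")
print("cross-validated R2:", r2.mean())
```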
APA, Harvard, Vancouver, ISO, and other styles
27

Yoldas, Mine. "Predicting The Effect Of Hydrophobicity Surface On Binding Affinity Of PCP-like Compounds Using Machine Learning Methods." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613215/index.pdf.

Full text
Abstract:
This study aims to predict the binding affinity of PCP-like compounds by means of molecular hydrophobicity. Molecular hydrophobicity is an important property that affects the binding affinity of molecules; its values are obtained on a three-dimensional coordinate system. Our aim is to reduce the number of points on the hydrophobicity surface of the molecules. This is modeled using self-organizing maps (SOM) and k-means clustering. The feature sets obtained from SOM and from k-means clustering are each used to predict the binding affinity of molecules, with support vector regression and partial least squares regression used for the prediction.
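One way to picture the clustering-then-regression pipeline (k-means branch only; the SOM branch is omitted) is sketched below; the cluster count, the sorting trick for a stable feature ordering, and the random data are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(2)

def surface_features(points, k=8):
    """points: (n_points, 4) array of x, y, z and hydrophobicity values."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    centres = km.cluster_centers_
    # Sort centres so the flattened features are ordered consistently.
    return centres[np.lexsort(centres.T)].ravel()

molecules = [rng.normal(size=(200, 4)) for _ in range(40)]   # fake surfaces
affinity = rng.normal(size=40)                               # fake binding data
X = np.array([surface_features(m) for m in molecules])
model = SVR(kernel="rbf", C=10.0).fit(X, affinity)
print(model.predict(X[:3]))
```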
APA, Harvard, Vancouver, ISO, and other styles
28

Adelore, Temilade Adediwura. "Determining fixation stability of AMD patients using predictive eye estimation regression." Thesis, Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26495.

Full text
Abstract:
Patients with macular degeneration (MD) often fixate with a preferred retinal locus (PRL). Eye movements made while fixating with the PRL (in MD patients) have been observed to be maladaptive compared to those made while fixating with the fovea (in normally sighted individuals). For example, in MD patients, PRL eye movements negatively affect fixation stability and re-fixation precision, creating difficulty in reading and limiting the execution of other everyday activities. Abnormal eye movements from the PRL also affect research on the physiological adaptations to MD. Specifically, previous research on cortical reorganization using functional magnetic resonance imaging (fMRI) indicates a critical need to accurately determine an MD patient's point of gaze in order to better infer the existence of cortical reorganization. Unfortunately, standard MR-compatible hardware eye-tracking systems do not work well with these patients: their reduced fixation stability often overwhelms the tracking algorithms used by these systems. This research investigates the use of an existing magnetic resonance imaging (MRI) based technique called Predictive Eye Estimation Regression (PEER) to determine the point of gaze of MD patients and thus control for fixation instability. PEER makes use of the fluctuations in the MR signal caused by eye movements to identify gaze position. Engineering adaptations, such as adjusted temporal resolution and brain coverage, were applied to tailor PEER to MD patients. Participants were also evaluated on different fixation protocols, and the results were compared to those of the MP-1 micro-perimeter to test the efficacy of PEER. The fixation stability results obtained from PEER were similar to the eye-tracking results of the MP-1 micro-perimeter. However, PEER's point-of-gaze estimates differed from the MP-1's in the fixation tests, and this difference cannot be concluded to be specific to PEER. To resolve this issue, running an eye tracker in the scanner concurrently with PEER could provide more evidence of PEER's reliability. In addition, increasing the diversity of AMD patients in terms of scotoma types would help provide a better estimate of PEER's flexibility and robustness.
APA, Harvard, Vancouver, ISO, and other styles
29

Khawaja, Taimoor Saleem. "A Bayesian least squares support vector machines based framework for fault diagnosis and failure prognosis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34758.

Full text
Abstract:
A high-belief, low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear, non-Gaussian systems. The methodology assumes the availability of real-time process measurements, the definition of a set of fault indicators, and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions. An efficient yet powerful LS-SVM algorithm, set within a Bayesian inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classification for diagnosis and regression modeling for prognosis. SVMs are founded on the principle of Structural Risk Minimization (SRM), which tends to find a good trade-off between low empirical risk and small capacity. The key features of SVMs are the use of non-linear kernels, the absence of local minima, the sparseness of the solution, and the capacity control obtained by optimizing the margin. The Bayesian inference framework linked with LS-SVMs allows a probabilistic interpretation of the results for diagnosis and prognosis, and additional levels of inference provide the much-coveted features of adaptability and tunability of the modeling parameters. The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel anomaly detector is suggested based on LS-SVM machines. The proposed scheme uses only baseline data to construct a 1-class LS-SVM machine which, when presented with online data, is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems. Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate (possibly) non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds, and remaining useful life (RUL) estimation after a fault is detected.
The leading contributions of this thesis are (a) the development of a novel Bayesian anomaly detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines, (b) the development of a data-driven real-time architecture for long-term failure prognosis using Least Squares Support Vector Machines, (c) uncertainty representation and management using Bayesian inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of the diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes.
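The baseline-only detector idea can be outlined with an off-the-shelf one-class SVM. This is a simplified stand-in for the thesis's Bayesian 1-class LS-SVM, and the logistic squashing of the decision score into a health probability is a heuristic assumption, not the thesis's posterior.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
baseline = rng.normal(0, 1, size=(500, 6))         # nominal fault indicators
online = np.vstack([rng.normal(0, 1, (5, 6)),      # healthy samples
                    rng.normal(4, 1, (5, 6))])     # faulty samples

# Train on baseline data only; decision_function > 0 means "looks nominal".
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(baseline)
score = detector.decision_function(online)
p_health = 1.0 / (1.0 + np.exp(-5.0 * score))      # heuristic [0, 1] squashing
print(np.round(p_health, 2))
```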
APA, Harvard, Vancouver, ISO, and other styles
30

Hao, Haiyan. "Understanding Fixed Object Crashes with SHRP2 Naturalistic Driving Study Data." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/84942.

Full text
Abstract:
Fixed-object crashes have long been considered major roadway safety concerns. While previous studies tended to address such crashes in the context of roadway departures and relied heavily on police-reported accident data, this study integrated SHRP2 NDS and RID data for the analyses, which fully depict the scenarios before, during, and after a crash. A total of 1,639 crash and near-crash events and 1,050 baseline events were acquired. Three analysis methods, logistic regression, support vector machine (SVM) and artificial neural network (ANN), were employed for two responses: crash occurrence and severity level. The logistic regression analyses identified 16 and 10 significant variables, at the 0.1 significance level, relevant to the driver, roadway, environment, etc. for the two responses respectively, and led to a series of findings regarding the effects of explanatory variables on fixed-object event occurrence and the associated severity level. SVM classifiers and ANN models were also constructed to predict these two responses, with sensitivity analyses performed for the SVM classifiers to infer the contributing effects of the input variables. All three methods obtained satisfactory prediction performance, around 88% for fixed-object event occurrence and 75% for event severity level, which indicates the effectiveness of NDS event data for depicting crash scenarios and for roadway safety analyses.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
31

Wirgen, Isak, and Douglas Rube. "Supervised fraud detection of mobile money transactions on different distributions of imbalanced data : A comparative study of the classification methods logistic regression, random forest, and support vector machine." Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446108.

Full text
Abstract:
The purpose of this paper is to compare the performance of the classification methods logistic regression, random forest, and support vector machine in detecting mobile money transaction fraud. Their performance is evaluated on different distributions of imbalanced data in a supervised framework, using a variety of metrics to capture full model performance. The results show that random forest attained the highest overall performance, followed by logistic regression; support vector machine attained the worst overall performance and produced no useful classification of fraudulent transactions. In conclusion, the study suggests that better results could be achieved through improvements to the classification algorithms and better feature selection, among other measures.
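The comparison can be outlined with scikit-learn as below; the synthetic data, class weighting and metric choice are illustrative assumptions, not the paper's setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for transaction data, ~1 % minority ("fraud") class.
X, y = make_classification(n_samples=5000, weights=[0.99], flip_y=0.01,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {"logit": LogisticRegression(max_iter=1000, class_weight="balanced"),
          "rf": RandomForestClassifier(class_weight="balanced", random_state=0),
          "svm": SVC(class_weight="balanced", probability=True, random_state=0)}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(name, "AUC:", round(auc, 3))
```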
APA, Harvard, Vancouver, ISO, and other styles
32

Ashrafi, Parivash. "Predicting the absorption rate of chemicals through mammalian skin using machine learning algorithms." Thesis, University of Hertfordshire, 2016. http://hdl.handle.net/2299/17310.

Full text
Abstract:
Machine learning (ML) methods have been applied to the analysis of a range of biological systems, and this thesis evaluates their application to the problem domain of skin permeability. ML methods offer great potential both in predictive ability and in providing mechanistic insight into, in this case, the phenomenon of skin permeation. Historically, refining the mathematical models used to predict percutaneous drug absorption has been regarded as a key challenge in this field, and Quantitative Structure-Activity Relationship (QSAR) models are used extensively for this purpose. However, advanced ML methods successfully outperform the traditional linear QSAR models. In this thesis, the application of ML methods to percutaneous absorption is investigated and evaluated. The major approach used is Gaussian process (GP) regression. This research seeks to enhance prediction performance by using local non-linear models obtained by applying clustering algorithms. In addition, to increase model quality, a kernel is generated based on both numerical chemical variables and categorical experimental descriptors. A Monte Carlo algorithm is also employed to generate reliable models from the variable data that are inevitable in biological experiments. The datasets used for this study are small, which raises the risk of over- or under-fitting; within these small datasets, GP optimisation algorithms are used to find optimal values of skin permeability. Although these methods are applied here to the field of percutaneous absorption, they may be applied more broadly to any biological system.
APA, Harvard, Vancouver, ISO, and other styles
33

Khizra, Shufa. "Using Natural Language Processing and Machine Learning for Analyzing Clinical Notes in Sickle Cell Disease Patients." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright154759374321405.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Bogdanov, Daniil. "The development and analysis of a computationally efficient data driven suit jacket fit recommendation system." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-222341.

Full text
Abstract:
In this master thesis we design and analyze a data-driven suit jacket fit recommendation system that aims to guide shoppers in assessing garment fit over the web. The system is divided into two stages. In the first stage we analyze labelled customer data, train supervised learning models to predict the optimal suit jacket dimensions of unseen shoppers, and determine appropriate models for each suit jacket dimension. In stage two, the recommendation system uses the results from stage one to sort a garment collection from best fit to least fit; the sorted collection is what the fit recommendation system returns. We propose a particular design of stage two that aims to reduce the complexity of the system, at the cost of reduced quality of the results; the trade-offs are identified and weighed against each other. The results in stage one show that simple supervised learning models with linear regression functions suffice when the independent and dependent variables align at particular landmarks on the body. If style preferences are also to be incorporated into the supervised learning models, non-linear regression functions should be considered to account for the increased complexity. The results in stage two show that the complexity of the recommendation system can be made independent of the complexity of how fit is assessed. As technology enables ever more advanced ways of assessing garment fit, such as 3D body-scanning techniques, the proposed design of reducing the complexity of the recommendation system allows highly complex techniques to be utilized without affecting the responsiveness of the system at run-time.
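A minimal sketch of the stage-two idea under stated assumptions: given predicted optimal dimensions for a shopper, rank the collection by distance to those dimensions. The garment names and measurements are hypothetical.

```python
import numpy as np

predicted = np.array([102.0, 88.0, 64.5])           # chest, waist, sleeve (cm)
collection = {"jacket_48": np.array([100.0, 86.0, 63.0]),
              "jacket_50": np.array([104.0, 90.0, 65.0]),
              "jacket_52": np.array([108.0, 94.0, 66.5])}

# Sort garments from best fit (smallest deviation) to least fit.
ranked = sorted(collection,
                key=lambda g: np.linalg.norm(collection[g] - predicted))
print(ranked)
```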
APA, Harvard, Vancouver, ISO, and other styles
35

Yusuf, Adeel. "Advanced machine learning models for online travel-time prediction on freeways." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50408.

Full text
Abstract:
The objective of the research described in this dissertation is to improve the travel-time prediction process using machine learning methods for Advanced Traffic Information Systems (ATIS). Travel-time prediction has gained significance over the years, especially in urban areas, due to increasing traffic congestion. The increased demand on traffic flow has motivated the development of improved applications and frameworks that can alleviate congestion problems without additions to the roadway infrastructure. In this thesis, the basic building blocks of travel-time prediction models are discussed, with a review of the significant prior art. The problem has been addressed from different perspectives in the past: the data-driven approach and the traffic-flow modeling approach are the two main methodological paths, and this dissertation works towards improving the data-driven method. The data-driven model presented in this dissertation for travel-time prediction on freeways is based on wavelet packet decomposition and support vector regression (WPSVR), which uses the multi-resolution and equivalent frequency distribution ability of the wavelet transform to train the support vector machines. The results are compared against the classical support vector regression (SVR) method. Our results indicate that the wavelet-reconstructed coefficients, when used as input to the support vector machine for regression (WPSVR), give better performance (with selected wavelets only) than support vector regression without wavelet decomposition. The data used in the model were downloaded from the California Department of Transportation (Caltrans) District 12, with a detector density of 2.73, experiencing daily peak hours except on most weekends. The data were stored for a period of 214 days, accumulated over 5-minute intervals, over a distance of 9.13 miles. The results indicate an improvement in accuracy compared against the classical SVR method. The basic criteria for selecting a wavelet basis for preprocessing the inputs of the support vector machines are also explored, to filter the set of wavelet families for the WPSVR model, and a configuration of travel-time prediction on freeways with interchangeable prediction methods is presented along with the details of the Matlab application used to implement the WPSVR algorithm. The initial results are computed over a set of 42 wavelets. To reduce the computational cost of transforming the travel-time data into wavelet packets using all available mother wavelets, a filtering methodology is devised which measures the cross-correlation and redundancy of consecutive wavelet-transformed values within the same frequency band. An alternate configuration of travel-time prediction on freeways using concepts of cloud computation is also presented, which can interchange the prediction modules with an alternate method using the same time-series data. Finally, a graphical user interface is described that connects the Matlab environment with the Caltrans data server for online travel-time prediction using both the SVR and WPSVR modules and displays the errors and plots of predicted values for both methods; the GUI can also compute forecasts on custom travel-time data in offline mode.
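A rough sketch of the WPSVR feature idea, assuming the PyWavelets package is available; the window length, wavelet, decomposition level and synthetic series are illustrative assumptions, not the dissertation's Matlab pipeline.

```python
import numpy as np
import pywt
from sklearn.svm import SVR

rng = np.random.default_rng(4)
series = np.cumsum(rng.normal(size=2048))            # stand-in travel times

def wp_features(window, wavelet="db4", level=3):
    """Concatenate wavelet-packet coefficients of one travel-time window."""
    wp = pywt.WaveletPacket(data=window, wavelet=wavelet, maxlevel=level)
    return np.concatenate([node.data for node in wp.get_level(level, "natural")])

win = 64
X = np.array([wp_features(series[i:i + win]) for i in range(0, 1900, 10)])
y = np.array([series[i + win] for i in range(0, 1900, 10)])   # next value
model = SVR(kernel="rbf", C=100.0).fit(X, y)
print(model.predict(X[:3]), y[:3])
```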
APA, Harvard, Vancouver, ISO, and other styles
36

Engström, Freja, and Rojas Disa Nilsson. "Prediction of the future trend of e-commerce." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301950.

Full text
Abstract:
In recent years more companies have invested in electronic commerce as a result of more customers using the internet as a tool for shopping. The basics of marketing still apply to online stores, so companies need to conduct market analyses of customers and of the online market to be able to successfully target customers online. In this report, we propose the use of machine learning, a tool that has received much attention and positive affirmation for its ability to tackle a range of problems, to predict future trends of electronic commerce in Sweden; more precisely, to predict the future share of users of electronic commerce, both in general and for certain demographics. We build three different models: polynomial regression, SVR and ARIMA. The constructed forecasts show that there are clear differences between different demographics of customers and between groups within a given demographic. Furthermore, the results show that the forecasts were more accurate when modelling a specific demographic than when modelling the entire population. Companies can thereby possibly use the models to predict the behaviour of smaller segments of the market and use that in their marketing to attract these customers.
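Of the three models, the polynomial regression forecast is the simplest to sketch; the series below is a synthetic stand-in for the measured yearly share of e-commerce users in one demographic.

```python
import numpy as np

rng = np.random.default_rng(5)
years = np.arange(2010, 2021)
share = np.clip(0.40 + 0.03 * (years - 2010)        # rising adoption trend
                + rng.normal(0, 0.01, years.size), 0, 1)

coeffs = np.polyfit(years, share, deg=2)            # quadratic trend fit
future = np.arange(2021, 2026)
print(np.clip(np.polyval(coeffs, future), 0, 1))    # extrapolated shares
```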
APA, Harvard, Vancouver, ISO, and other styles
37

Barretto, Mateus Ymanaka. "Aplicação de máquinas de vetor de suporte e modelos auto-regressivos de média móvel na classificação de sinais eletromiográficos." Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-28032017-100828/.

Full text
Abstract:
The diagnosis of neuromuscular diseases is attained by the combined use of several tools. Among these tools, clinical electromyography provides key information for the diagnosis. In the literature, the application of classifiers (linear discriminants and artificial neural networks) to a variety of electromyography parameters (number of phases, turns and zero crossings; median frequency; auto-regressive coefficients) has provided promising results. Nevertheless, the need for a large number of auto-regressive coefficients guided this Master's thesis towards auto-regressive moving-average models with a smaller number of coefficients. The classification task (into normal, neuropathic or myopathic) was performed by support vector machines, a type of artificial neural network proposed relatively recently. The objective of this work was to study whether low-order auto-regressive moving-average (ARMA) models can substitute for high-order auto-regressive models, in combination with support vector machines, for diagnostic purposes. The results indicate that support vector machines perform better than Fisher's linear discriminant, and that ARMA(1,11) and ARMA(1,12) models provide high classification rates (81.5%), close to the maximum obtained with auto-regressive models of order 39. We therefore recommend the use of support vector machines and ARMA(1,11) or ARMA(1,12) models for the classification of 800 ms needle electromyography signals acquired at 25 kHz.
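The coefficients-as-features pipeline might look like the sketch below (statsmodels assumed available); the signals and labels are synthetic stand-ins for the EMG recordings, and the exact estimation procedure differs from the thesis.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVC

rng = np.random.default_rng(6)

def arma_features(signal, order=(1, 0, 11)):
    """Fit an ARMA(1,11) model and return its parameters as features."""
    res = ARIMA(signal, order=order).fit()
    return res.params            # AR and MA coefficients (+ constant, variance)

signals = [rng.normal(size=400) for _ in range(20)]   # stand-in EMG segments
labels = rng.integers(0, 3, size=20)                  # normal / neuro / myo
X = np.array([arma_features(s) for s in signals])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```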
APA, Harvard, Vancouver, ISO, and other styles
38

Bodin, Camilla. "Automatic Flight Maneuver Identification Using Machine Learning Methods." Thesis, Linköpings universitet, Reglerteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-165844.

Full text
Abstract:
This thesis proposes a general approach to solving the offline flight-maneuver identification problem using machine learning methods. The purpose of the study was to provide means for the aircraft professionals at the flight test and verification department of Saab Aeronautics to automate the procedure of analyzing flight test data. The suggested approach succeeded in generating binary classifiers and multiclass classifiers that identified six flight maneuvers of different complexity from real flight test data. The binary classifiers identify one maneuver at a time from flight test data, while the multiclass classifiers identify several maneuvers simultaneously. To achieve these results, the difficulties that this time-series classification problem entails were reduced using different strategies: one was a maneuver-extraction algorithm built on handcrafted rules, another was representing the time-series data by statistical measures. There was also an imbalanced dataset, with one class far outweighing the others in number of samples; this was addressed by applying a modified oversampling method to the training set. Logistic regression, support vector machines with both linear and nonlinear kernels, and artificial neural networks were explored, with the hyperparameters for each machine learning algorithm chosen during model estimation by 4-fold cross-validation and by solving an optimization problem based on important performance metrics. A feature-selection algorithm was also used during model estimation to evaluate how performance changes with the number of features used. The machine learning models were then evaluated on test data consisting of 24 flight tests. The results on the test data set showed that the simplifications made were reasonable, although the maneuver-extraction algorithm could sometimes fail. Some maneuvers were easier to identify than others, and the linear machine learning models fitted the more complex classes poorly. In conclusion, both binary and multiclass classifiers can be used to solve the flight-maneuver identification problem, solving a hyperparameter optimization problem boosted the performance of the finalized models, and nonlinear classifiers performed best on average across all explored maneuvers.
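The statistical-measure representation might look like the sketch below; the four signals and the particular statistics are assumptions for illustration, not Saab's feature set.

```python
import numpy as np

def window_stats(window):
    """window: (n_samples, n_signals) slice of flight-test data."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           window.min(axis=0), window.max(axis=0)])

rng = np.random.default_rng(7)
window = rng.normal(size=(100, 4))      # e.g. altitude, speed, pitch, roll
print(window_stats(window).shape)       # 4 statistics x 4 signals = 16 features
```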
APA, Harvard, Vancouver, ISO, and other styles
39

Carrión, Brännström Robin. "Aggregating predictions using Non-Disclosed Conformal Prediction." Thesis, Uppsala universitet, Statistiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385098.

Full text
Abstract:
When data are stored in different locations and pooling them is not allowed, there is an informational loss when doing predictive modeling. In this thesis, a new method called Non-Disclosed Conformal Prediction (NDCP) is adapted to a regression setting, such that predictions and prediction intervals can be aggregated from different data sources without interchanging any data. The method is built upon the Conformal Prediction framework, which produces predictions with confidence measures on top of any machine learning method. The method is evaluated on regression benchmark data sets using Support Vector Regression, with different sizes and settings for the data sources, to simulate real-life scenarios. The results show that the method produces conservatively valid prediction intervals even in settings where the individual data sources do not manage to create valid intervals, and NDCP also creates more stable intervals than the individual data sources. Thanks to its straightforward implementation, data owners who cannot share data but would like to contribute to predictive modeling would benefit from using this method.
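The interval construction that NDCP builds on can be shown with a split (inductive) conformal sketch for a single data source; the aggregation across non-disclosed sources is omitted, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 600)

X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = SVR().fit(X_tr, y_tr)

alpha = 0.1                                             # 90 % target coverage
scores = np.abs(y_cal - model.predict(X_cal))           # nonconformity scores
n = len(scores)
q = np.quantile(scores, np.ceil((1 - alpha) * (n + 1)) / n)

x_new = np.array([[0.5]])
pred = model.predict(x_new)
print("interval:", pred - q, pred + q)
```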
APA, Harvard, Vancouver, ISO, and other styles
40

Simões, Rodolfo da Silva. "Técnicas de transferência de aprendizagem aplicadas a modelos QSAR para regressão." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-07062018-120939/.

Full text
Abstract:
To develop a new medicine, researchers must analyze the biological targets of a given disease, then discover and develop drug candidates for those targets, performing laboratory tests in parallel to validate the efficacy and side effects of the chemical substance. Quantitative structure-activity relationship (QSAR) studies involve building regression models that relate a set of descriptors of a chemical compound to its biological activity with respect to one or more targets in the organism. The datasets handled by researchers for QSAR analysis are generally characterized by a small number of instances, which makes building predictive models more complex. In this context, transferring knowledge from other QSAR models with more data available for the same biological target is desirable, since it reduces the effort and cost of generating new models from chemical compound descriptors. This work presents an inductive (parameter-based) transfer learning approach based on a variation of the Support Vector Regression method adapted for transfer learning, achieved by drawing the models generated separately for each task closer together. A method of instance-based transfer learning, TrAdaBoost, is also considered. Experimental results show that the transfer learning approaches perform well when applied to benchmark datasets and to chemical datasets.
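Parameter transfer of this flavour can be illustrated with a simplified biased-regularisation analogue (not the thesis's SVR variant): the data-poor target model is shrunk toward the source model's weights instead of toward zero, i.e. it minimises ||y - Xw||^2 + lam * ||w - w_src||^2.

```python
import numpy as np

rng = np.random.default_rng(9)
d = 10
w_true = rng.normal(size=d)

# Data-rich source task and data-poor target task sharing the same weights.
X_src = rng.normal(size=(500, d)); y_src = X_src @ w_true + rng.normal(0, .1, 500)
X_tgt = rng.normal(size=(15, d));  y_tgt = X_tgt @ w_true + rng.normal(0, .1, 15)

w_src = np.linalg.lstsq(X_src, y_src, rcond=None)[0]       # source fit
lam = 1.0                                                  # transfer strength
A = X_tgt.T @ X_tgt + lam * np.eye(d)
w_tgt = np.linalg.solve(A, X_tgt.T @ y_tgt + lam * w_src)  # closed-form solution
print(np.linalg.norm(w_tgt - w_true))
```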
APA, Harvard, Vancouver, ISO, and other styles
41

Granström, Daria, and Johan Abrahamsson. "Loan Default Prediction using Supervised Machine Learning Algorithms." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252312.

Full text
Abstract:
It is essential for a bank to estimate the credit risk it carries and the magnitude of its exposure in case of non-performing customers. This kind of risk has been estimated with statistical methods for decades and, with recent developments in the field of machine learning, there has been interest in investigating whether machine learning techniques can quantify the risk better. The aim of this thesis is to examine which method from a chosen set of machine learning techniques performs best at default prediction with respect to chosen model evaluation metrics. The investigated techniques were logistic regression, random forest, decision tree, AdaBoost, XGBoost, artificial neural network, and support vector machine. An oversampling technique called SMOTE was implemented to treat the imbalance between the classes of the response variable. The results showed that XGBoost without SMOTE obtained the best result with respect to the chosen model evaluation metric.
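A sketch of the two ingredients compared (the imbalanced-learn and xgboost packages are assumed installed); the data are a synthetic stand-in for loan records, not the thesis dataset.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

# ~5 % minority class standing in for defaulted loans.
X, y = make_classification(n_samples=4000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for use_smote in (False, True):
    X_fit, y_fit = (SMOTE(random_state=0).fit_resample(X_tr, y_tr)
                    if use_smote else (X_tr, y_tr))
    clf = XGBClassifier(eval_metric="logloss").fit(X_fit, y_fit)
    print("SMOTE" if use_smote else "raw  ",
          round(f1_score(y_te, clf.predict(X_te)), 3))
```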
APA, Harvard, Vancouver, ISO, and other styles
42

Melo, Davyd Bandeira de. "Algoritmos de aprendizagem para aproximação da cinemática inversa de robôs manipuladores: um estudo comparativo." Universidade Federal do Ceará, 2015. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=16997.

Full text
Abstract:
This dissertation reports the results of a comprehensive comparative study involving seven machine learning algorithms applied to the task of approximating the inverse kinematic model of three robotic arms (planar, PUMA 560 and Motoman HP6). The evaluated algorithms are the Multilayer Perceptron (MLP), Extreme Learning Machine (ELM), Least Squares Support Vector Regression (LS-SVR), Minimal Learning Machine (MLM), Gaussian Processes (GP), Adaptive Network-Based Fuzzy Inference System (ANFIS) and Local Linear Mapping (LLM). Each algorithm is evaluated with respect to its accuracy in estimating the joint angles given the Cartesian coordinates of several types of end-effector trajectories within the robots' workspaces. A comprehensive evaluation of the performance of these algorithms is carried out based on correlation analysis of the residuals, and hypothesis tests are executed to verify whether there are significant differences in performance among the best algorithms.
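A toy version of the task with a 2-link planar arm (not the dissertation's manipulators): generate training data with the forward kinematics, then fit an MLP to the inverse mapping. Restricting the joint ranges keeps the inverse roughly single-valued.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(10)
l1, l2 = 1.0, 0.8                                           # link lengths
q = rng.uniform(0, np.pi / 2, size=(5000, 2))               # joint angles
x = l1 * np.cos(q[:, 0]) + l2 * np.cos(q[:, 0] + q[:, 1])   # forward kinematics
y = l1 * np.sin(q[:, 0]) + l2 * np.sin(q[:, 0] + q[:, 1])
P = np.column_stack([x, y])                                 # end-effector points

ik = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(P, q)
q_hat = ik.predict(P[:3])
print(q[:3], q_hat)    # residuals between these would drive the comparison
```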
APA, Harvard, Vancouver, ISO, and other styles
43

Axén, Maja, and Jennifer Karlberg. "Binary Classification for Predicting Customer Churn." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-171892.

Full text
Abstract:
Predicting when a customer is about to turn to a competitor can be difficult, yet extremely valuable from a business perspective. The moment a customer stops being considered a customer is known as churn, a widely researched topic in several industries when dealing with subscription services. In industries with non-subscription services and products, however, defining churn can be a daunting task, and the existing literature does not fully cover this field. This thesis can therefore be seen as a contribution to current research, especially for settings without an established definition of churn. A definition of churn, adjusted to DIAKRIT's business, is created. DIAKRIT is a company in the real estate industry, which faces many challenges, such as strong seasonality. The prediction was approached as a supervised problem, using three different machine learning methods: logistic regression, random forest and support vector machine; the variables used in the predictions are predominantly activity data. With a relatively high accuracy and AUC score, random forest was concluded to be the most reliable model, although it is clear that the model cannot separate the classes perfectly. The random forest model also produces relatively high precision, so it can be concluded that, even though the model is not flawless, the customers it predicts to churn are very likely to churn.
APA, Harvard, Vancouver, ISO, and other styles
44

Okujeni, Akpona. "Quantifying urban land cover by means of machine learning and imaging spectrometer data at multiple spatial scales." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät, 2014. http://dx.doi.org/10.18452/17082.

Full text
Abstract:
The global dimension of urbanization constitutes a great environmental challenge for the 21st century. Remote sensing is a valuable Earth observation tool that helps to better understand this process and its ecological implications. The focus of this work was to quantify urban land cover by means of machine learning and imaging spectrometer data at multiple spatial scales. The experiments considered innovative methodological developments and the novel opportunities in urban research that will be created by the upcoming hyperspectral satellite mission EnMAP. Airborne HyMap data at 3.6 m and 9 m resolution and simulated EnMAP data at 30 m resolution were used to map land cover along an urban-rural gradient of Berlin. In the first part of this work, the combination of support vector regression with synthetically mixed training data was introduced as a sub-pixel mapping technique. The results demonstrate that the approach performs well in quantifying thematically meaningful yet spectrally challenging surface types, and that the method is both superior to other sub-pixel mapping approaches and universally applicable across spatial scales. In the second part of this work, the value of future EnMAP data for urban remote sensing was evaluated. Detailed explorations on simulated data demonstrate their suitability for improving and extending the well-established vegetation-impervious-soil mapping scheme. Comprehensive analyses of the benefits and limitations of EnMAP data reveal both challenges, caused by the high number of mixed pixels compared to hyperspectral airborne imagery, and improvements, due to the greater material discrimination capability compared to multispectral spaceborne imagery. In summary, the findings demonstrate how combining spaceborne imaging spectrometry and machine learning techniques could introduce a new quality to the field of urban remote sensing.
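The synthetic-mixing idea can be condensed as below; the two "pure" spectra are random stand-ins for image-derived endmembers, and the mixing model is the simplest linear one.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(11)
bands = 100
veg = rng.uniform(0, 1, bands)                      # pure vegetation spectrum
imperv = rng.uniform(0, 1, bands)                   # pure impervious spectrum

frac = rng.uniform(0, 1, 500)                       # synthetic mixing fractions
X = frac[:, None] * veg + (1 - frac[:, None]) * imperv
X += rng.normal(0, 0.01, X.shape)                   # simulated sensor noise

svr = SVR(kernel="rbf", C=10.0).fit(X, frac)        # sub-pixel fraction model
pixel = 0.3 * veg + 0.7 * imperv                    # unseen mixed pixel
print(svr.predict(pixel[None, :]))                  # ~0.3 expected
```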
APA, Harvard, Vancouver, ISO, and other styles
45

Gebresilassie, Mesele Atsbeha. "Spatio-temporal Traffic Flow Prediction." Thesis, KTH, Geoinformatik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-212323.

Full text
Abstract:
The advancement in computational intelligence and computational power and the explosion of traffic data continue to drive the development and use of Intelligent Transport Systems and smart mobility applications. As one of the fundamental components of Intelligent Transport Systems, traffic flow prediction research has been advancing from classical statistical and time-series based techniques to data-driven methods, mainly employing data mining and machine learning algorithms. However, a significant number of traffic flow prediction studies have overlooked the impact of road network topology on traffic flow. Thus, the main objective of this research is to show that traffic flow prediction problems are affected not only by temporal trends of flow history, but also by road network topology, by developing prediction methods in the spatio-temporal domain. In this study, time-series operators and data mining techniques are used by defining five partially overlapping relative temporal offsets to capture temporal trends in sequences of non-overlapping history windows defined on a stream of historical traffic flow records. To develop prediction models, two sets of modeling approaches based on Linear Regression and Support Vector Machine for Regression are proposed. In the modeling process, an orthogonal linear transformation of the input data using Principal Component Analysis is employed to avoid potential problems of multicollinearity and the curse of dimensionality. Moreover, to incorporate the impact of road network topology on the traffic flow of individual road segments, a distance-decay function based on shortest-path network distance is used to compute the weights of neighboring road segments, following the principle of the First Law of Geography. Accordingly, (a) Linear Regression on Individual Sensors (LR-IS), (b) Joint Linear Regression on a Set of Sensors (JLR), (c) Joint Linear Regression on a Set of Sensors with PCA (JLR-PCA) and (d) Spatially Weighted Regression on a Set of Sensors (SWR) models are proposed. To achieve robust non-linear learning, Support Vector Machine for Regression (SVMR) based models are also proposed: (a) SVMR for Individual Sensors (SVMR-IS), (b) Joint SVMR for a Set of Sensors (JSVMR), (c) Joint SVMR for a Set of Sensors with PCA (JSVMR-PCA) and (d) Spatially Weighted SVMR (SWSVMR). All the models are evaluated using the data sets from the 2010 IEEE ICDM international contest, acquired from the Traffic Simulation Framework (TSF) developed on the basis of the Nagel-Schreckenberg model. Taking the competition's best solutions as a benchmark, and even though different sets of validation data might have been used, all the proposed models except SVMR-IS provide higher prediction accuracy in terms of RMSE under k-fold cross validation. The models that incorporated all neighboring sensors' data into the learning process indicate the existence of potential interdependence among interconnected road segments. The spatially weighted SVMR model (SWSVMR) revealed that road network topology has a clear impact on traffic flow, shown by the varying and improved prediction accuracy of road segments that have more neighbors in close proximity. However, the linear regression based models showed a slightly low coefficient of determination, pointing to the need for non-linear learning methods. The results of this study also imply that the approaches adopted for feature construction are effective and that the spatial weighting scheme designed is realistic. Hence, road network topology is an intrinsic characteristic of traffic flow, and prediction models should take it into consideration.
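As a minimal sketch of the spatially weighted idea this abstract describes — neighbor flows scaled by a network-distance decay weight before being fed to an SVR — the following assumes hypothetical inputs, an illustrative decay exponent and illustrative hyperparameters; it is not the thesis's implementation:

    import numpy as np
    from sklearn.svm import SVR

    def decay_weights(network_dists, beta=1.0):
        """Inverse-distance decay (First Law of Geography): nearer
        road segments get larger weights; beta is an assumed exponent."""
        w = 1.0 / np.power(network_dists, beta)
        return w / w.sum()

    def fit_swsvmr(target_lags, neighbor_flows, dists, y):
        # target_lags: (n_samples, n_lags) flow history of the target segment
        # neighbor_flows: (n_samples, k) flows of k neighboring segments
        # dists: (k,) shortest-path network distances to those neighbors
        w = decay_weights(dists)
        X = np.hstack([target_lags, neighbor_flows * w])  # spatially weighted features
        model = SVR(kernel="rbf", C=10.0, epsilon=0.1)    # illustrative settings
        return model.fit(X, y)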
APA, Harvard, Vancouver, ISO, and other styles
46

Janson, Lisa, and Minna Mathisson. "Data mining inom tillverkningsindustrin : En fallstudie om möjligheten att förutspå kvalitetsutfall i produktionslinjer." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301246.

Full text
Abstract:
In this work, a case study was carried out at Volvo Group in Köping. As the transition to Industry 4.0 proceeds, the opportunities to use machine learning as a tool in the analysis of industrial data and in the further development of industrial production grow. This work aims to investigate the possibility of predicting quality outcomes in the press-fitting of hub and mainshaft. The method comprises the implementation of three machine learning models and the evaluation of their performance relative to one another. When the models were applied to assembly data from the factory, the results were poor, indicating that the quality outcome cannot be predicted from the included variables. The causes underlying this result were examined, and the conclusion was that the models were probably unable to find patterns in the data, or that no pattern existed in the data set. To determine which of these two factors was decisive, a fabricated data set was created in which three new variables were introduced. The fabricated values of these variables were constructed so that there was a synthetic causality between two of the variables and the quality outcome. When the models were applied to the fabricated data, all of them succeeded in identifying the synthetic relationship. From this it was concluded that the poor result was not due to the models' performance, but to the absence of any pattern in the data set of real assembly data. This supports the assessment that if the traceability of the components were improved in the future, combined with more machines in the production line feeding data into a connected system, this study could be carried out again with more variables and a larger data set. Support vector machine was the model that performed best, given the performance measures used in this study. The fact that the models included in this study succeeded in identifying the relationship in the data, when the relationship was known to exist, motivates the use of these models in future studies. In conclusion, with improved traceability and an increasingly connected factory, there is an opportunity to use machine learning models as components in larger systems in order to achieve efficiency gains.
As the adaptation towards Industry 4.0 proceeds, the possibility of using machine learning as a tool for further development of industrial production, becomes increasingly profound. In this paper, a case study has been conducted at Volvo Group in Köping, in order to investigate the wherewithals of predicting quality outcomes in the compression of hub and mainshaft. In the conduction of this study, three different machine learning models were implemented and compared amongst each other. A dataset containing data from Volvo’s production site in Köping was utilized when training and evaluating the models. However, the low evaluation scores acquired from this, indicate that the quality outcome of the compression could not be predicted given solely the variables included in that dataset. Therefore, a dataset containing three additional variables consisting of fabricated values and a known causality between two of the variables and the quality outcome, was also utilized. The purpose of this was to investigate whether the poor evaluation metrics resulted from a non-existent pattern between the included variables and the quality outcome, or from the models not being able to find the pattern. The performance of the models, when trained and evaluated on the fabricated dataset, indicate that the models were in fact able to find the pattern that was known to exist. Support vector machine was the model that performed best, given the evaluation metrics that were chosen in this study. Consequently, if the traceability of the components were to be enhanced in the future and an additional number of machines in the production line would transmit production data to a connected system, it would be possible to conduct the study again with additional variables and a larger data set. The fact that the models included in this study succeeded in finding patterns in the dataset when such patterns were known to exist, motivates the use of the same models. Furthermore, it can be concluded that with enhanced traceability of the components and a larger amount of machines transmitting production data to a connected system, there is a possibility that machine learning models could be utilized as components in larger business monitoring systems, in order to achieve efficiencies.
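The sanity check described above — injecting a known synthetic relationship into fabricated data and confirming that a model can recover it — can be sketched as follows; the three variables, the form of the causality and the classifier settings are illustrative assumptions, not details from the thesis:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 3))  # three fabricated variables
    # Synthetic causality: the quality outcome depends on the first two variables only.
    y = (X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

    scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
    print(scores.mean())  # well above 0.5, so the model recovers the injected pattern

A mean cross-validated accuracy far above chance confirms that the models can find a relationship when one is known to exist, isolating data quality rather than model capacity as the cause of a poor result on the real data.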
APA, Harvard, Vancouver, ISO, and other styles
47

Jiao, Weiwei. "Predictive Analysis for Trauma Patient Readmission Database." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492718909631318.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Alquier, Pierre. "Transductive and inductive adaptative inference for regression and density estimation." Paris 6, 2006. http://www.theses.fr/2006PA066436.

Full text
Abstract:
Adaptive, Inductive and Transductive Inference for Regression and Density Estimation (Pierre Alquier). This thesis studies the statistical properties of certain learning algorithms in the setting of regression and density estimation. It is divided into three parts. The first part generalizes Olivier Catoni's PAC-Bayesian theorems for classification to regression with a general loss function. The second part focuses on least-squares regression and proposes a new variable selection algorithm. This method can be applied, in particular, to a basis of orthonormal functions, in which case it achieves optimal convergence rates, but also to kernel-type functions, in which case it leads to a variant of the so-called support vector machine (SVM) methods. The third part extends the results of the second to density estimation with quadratic loss.
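For orientation, a generic McAllester-style PAC-Bayesian bound for a loss bounded in [0, 1] — the flavor of result that the first part generalizes to regression, not the thesis's own bound — states that for any prior \pi and any \delta in (0, 1), with probability at least 1 - \delta, simultaneously for all posteriors \rho:

    \mathbb{E}_{f \sim \rho}\, R(f) \;\le\; \mathbb{E}_{f \sim \rho}\, r_n(f)
    \;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}

where R(f) is the true risk, r_n(f) the empirical risk on n samples, and KL the Kullback-Leibler divergence.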
APA, Harvard, Vancouver, ISO, and other styles
49

Andersson, Martin, and Marcus Mazouch. "Binary classification for predicting propensity to buy flight tickets. : A study on whether binary classification can be used to predict Scandinavian Airlines customers’ propensity to buy a flight ticket within the next seven days." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160855.

Full text
Abstract:
A customer's propensity to buy a certain product is a widely researched field with applications in multiple industries. In this thesis it is shown that binary classification on data from Scandinavian Airlines can predict their customers' propensity to book a flight within the next seven days. A comparison between logistic regression and support vector machine is presented, and logistic regression with a reduced number of variables is chosen as the final model due to its simplicity and accuracy. The explanatory variables consist exclusively of booking history, while customer demographics and search history are shown to be insignificant.
A customer's propensity to make a certain purchase is a widely studied area that has been applied in several industries. This study shows that statistical binary classification models can be used to predict Scandinavian Airlines customers' propensity to buy a trip within the next seven days. A comparison is presented between logistic regression and support vector machine, and logistic regression with a reduced number of parameters is chosen as the final model thanks to its simplicity and accuracy. The explanatory variables are exclusively booking history, while the customer's demographics and search data are shown to be insignificant.
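A minimal sketch of the model comparison the abstract describes, assuming hypothetical booking-history features and synthetic labels (the real features and class balance are not public):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hypothetical booking-history features (e.g. days since last booking,
    # bookings in the past year) and a binary "books within 7 days" label.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))
    y = (rng.random(500) < 0.3).astype(int)

    for model in (LogisticRegression(max_iter=1000), SVC(kernel="rbf")):
        clf = make_pipeline(StandardScaler(), model)
        print(type(model).__name__, cross_val_score(clf, X, y, cv=5).mean())

With comparable accuracy, the simpler and more interpretable logistic regression is the natural choice, which matches the thesis's conclusion.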
APA, Harvard, Vancouver, ISO, and other styles
50

Deivard, Johannes. "How accuracy of estimated glottal flow waveforms affects spoofed speech detection performance." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-48414.

Full text
Abstract:
In the domain of automatic speaker verification, one of the challenges is to keep malevolent people out of the system. One way to do this is to create algorithms that detect spoofed speech. There are several types of spoofed speech and several ways to detect them, one of which is to look at the glottal flow waveform (GFW) of a speech signal. This waveform is often estimated using glottal inverse filtering (GIF), since creating the ground-truth GFW requires special invasive equipment. To the author's knowledge, no research has investigated the correlation between GFW accuracy and spoofed speech detection (SSD) performance. This thesis tries to find out whether that correlation exists. First, the performance of different GIF methods is evaluated; then simple SSD machine learning (ML) models are trained and evaluated based on their macro average precision. The ML models use different datasets composed of parametrized GFWs estimated with the GIF methods from the previous step. Results from the previous tasks are then combined in order to spot any correlations. The evaluations of the different methods showed that they created GFWs of varying accuracy. The different machine learning models also showed varying performance depending on what type of dataset was being used. However, when combining the results, no obvious correlations between GFW accuracy and SSD performance were detected. This suggests that the overall accuracy of a GFW is not a substantial factor in the performance of machine learning-based SSD algorithms.
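A crude sketch of the inverse-filtering idea, under strong simplifying assumptions: a single-pass LPC inverse filter whose prediction residual serves as a rough stand-in for the glottal source. Real GIF methods such as IAIF are iterative and considerably more careful; the LPC order and all names here are illustrative:

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_inverse_filter(speech, order=16):
        """One-pass LPC inverse filtering: estimate an all-pole vocal-tract
        model and return the prediction residual as a crude source estimate."""
        r = np.correlate(speech, speech, mode="full")[len(speech) - 1:]
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        vocal_tract = np.concatenate(([1.0], -a))   # A(z) = 1 - sum_k a_k z^-k
        return lfilter(vocal_tract, [1.0], speech)  # residual ~ glottal source

The macro average precision used to score the detectors is available in scikit-learn as precision_score(y_true, y_pred, average="macro").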
APA, Harvard, Vancouver, ISO, and other styles