Dissertations / Theses on the topic 'Machine learning analysis'

Consult the top 50 dissertations / theses for your research on the topic 'Machine learning analysis.'

1

Bengtsson, Sebastian. "MACHINE LEARNING FOR MECHANICAL ANALYSIS." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44325.

Abstract:
It is not reliable to depend on a person's judgement of dense, high-dimensional data on a daily basis: a person grows tired or distracted and makes mistakes over time. It is therefore desirable to study the feasibility of replacing human inference with machine learning in order to improve reliability. One-class Support Vector Machines (SVMs) with three different kernels (linear, Gaussian and polynomial) are implemented and tested for anomaly detection. Principal Component Analysis is used for dimensionality reduction, and autoencoders are used with the intention of increasing performance. Standard soft-margin SVMs were used for multi-class classification via the 1vsAll and 1vs1 approaches, with the same kernels as for the one-class SVMs. The results for the one-class SVMs and the multi-class SVM methods are compared against each other within their respective applications, and also against Back-Propagation Neural Networks of varying sizes. One-class SVMs proved very effective at detecting anomalous samples once both Principal Component Analysis and autoencoders had been applied. Standard SVMs with Principal Component Analysis produced promising classification results. Twin SVMs were researched as an alternative to standard SVMs.
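
The pipeline this abstract outlines, dimensionality reduction followed by one-class classification, can be sketched with scikit-learn; the synthetic data, component count and kernel settings below are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 64))                     # nominal samples
X_test = np.vstack([rng.normal(size=(50, 64)),
                    rng.normal(loc=4.0, size=(5, 64))])  # last 5 are anomalies

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=10).fit(scaler.transform(X_train))

def project(X):
    # Reduce dimensionality before one-class classification.
    return pca.transform(scaler.transform(X))

# One-class SVM with a Gaussian (RBF) kernel, one of the three kernels tested.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(project(X_train))
pred = ocsvm.predict(project(X_test))                    # +1 nominal, -1 anomaly
print("flagged as anomalous:", np.where(pred == -1)[0])
```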
2

Roderus, Jens, Simon Larson, and Eric Pihl. "Hadoop scalability evaluation for machine learning algorithms on physical machines : Parallel machine learning on computing clusters." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20102.

Abstract:
The amount of available data has allowed the field of machine learning to flourish, but growing data-set sizes bring an increase in algorithm execution times. Cluster computing frameworks provide tools for distributing data and processing power across several computer nodes, and allow algorithms to run in feasible time frames when data sets are large. Different cluster computing frameworks come with different trade-offs. In this thesis, the scalability of the execution time of machine learning algorithms running on the Hadoop cluster computing framework is investigated. A recent version of Hadoop and algorithms relevant to industry machine learning, namely K-means, latent Dirichlet allocation and naive Bayes, are used in the experiments. This thesis provides valuable information to anyone choosing between different cluster computing frameworks. The results show everything from moderate scalability to no scalability at all, indicating that Hadoop as a framework may have serious restrictions in how well tasks are actually parallelized. Possible scalability improvements could be achieved by modifying the machine learning library algorithms or by Hadoop parameter tuning.
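
The scalability question studied here is usually quantified with speedup and parallel efficiency; the short sketch below computes both from node-count timings. The timing values are invented placeholders, not results from the thesis.

```python
# Speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n.
timings = {1: 1800.0, 2: 1100.0, 4: 700.0, 8: 650.0}  # nodes -> seconds (invented)

for nodes in sorted(timings):
    speedup = timings[1] / timings[nodes]
    efficiency = speedup / nodes
    print(f"{nodes} node(s): T={timings[nodes]:7.1f}s  "
          f"S={speedup:4.2f}  E={efficiency:4.2f}")
# Execution times that stay nearly flat as nodes are added (efficiency falling
# toward zero) are the "no scalability" signature reported for some algorithms.
```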
3

Wood, Ian Andrew. "Boltzmann machine learning : analysis and improvements /." St. Lucia, Qld, 2003. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe17753.pdf.

4

Agrawal, Punit. "Program navigation analysis using machine learning." Thesis, McGill University, 2009. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=32599.

Abstract:
Developers invest a large portion of their development time exploring program source code to find task-related code elements and to understand the context of their task. The task context is usually not recorded at the end of the task and is forgotten over time; likewise, it is not possible to share the task context with other developers working on related tasks. Proposed solutions for automatically recording a summary of the code investigation suffer from methodological limitations related to the techniques and data sources used to generate the summary, as well as the granularity at which it is generated. To overcome these limitations, we investigate the use of machine learning techniques, in particular decision tree learning, to automatically predict the task context from session navigation transcripts obtained from developers performing tasks on the source code. We conducted a user study to collect navigation transcripts from developers engaged in source code exploration tasks, and used the data to train and test decision tree classifiers. We compared the decision tree algorithm with two existing approaches and found that it compares favourably in most cases. Additionally, we developed an Eclipse plug-in that automatically generates a developer session summary using the decision tree classifier learned from the data collected during the user study, and we provide a qualitative analysis of the effectiveness of this plug-in.
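
A minimal sketch of the core classification step, assuming hypothetical navigation-transcript features (visit counts, dwell time, edit flags) rather than the study's actual feature set:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Hypothetical per-element features: visit_count, time_spent, was_edited, revisits.
X = rng.random((200, 4))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)   # toy "belongs to task context" labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
clf = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```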
5

Rogers, Simon David. "Machine learning techniques for microarray analysis." Thesis, University of Bristol, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.409426.

6

Alviste, Joosep Franz Moorits. "Deployment failure analysis using machine learning." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-420321.

Abstract:
Manually diagnosing recurrent faults in software systems can be an inefficient use of engineers' time. Manual diagnosis of faults is commonly performed by inspecting system logs from the failure period. The DevOps engineers at Pipedrive, a SaaS business offering a sales CRM platform, have developed a simple regular-expression-based service for automatically classifying failed deployments. However, such a solution is not scalable, and a more sophisticated solution is required. In this thesis, log mining was used to automatically diagnose Pipedrive's failed deployments based on the deployment logs. Multiple log parsing and machine learning algorithms were compared based on the resulting log mining pipeline's F1 score. A proof-of-concept log mining pipeline was created that consisted of parsing logs with the Drain algorithm, transforming the log files into event count vectors and finally training a random forest model to classify the deployment logs. The pipeline gave an F1 score of 0.75 when classifying the testing data and a lower score of 0.65 on the evaluation dataset.
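
A rough sketch of the described pipeline follows; the toy log generator and the simplified template parser stand in for real deployment logs and for the Drain algorithm, and the labels are synthetic.

```python
import random
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

random.seed(0)

def parse_to_template(line: str) -> str:
    # Placeholder parser: a real pipeline would run the Drain algorithm here.
    return " ".join("<*>" if tok.isdigit() else tok for tok in line.split())

def toy_log(failed: bool) -> list:
    lines = [f"starting deployment {random.randint(1, 999)}"]
    lines += [f"pulled image layer {random.randint(1, 99)}" for _ in range(3)]
    if failed:
        lines.append("error: healthcheck timeout after 30 seconds")
    return lines

deployments = [(toy_log(i % 2 == 0), int(i % 2 == 0)) for i in range(60)]
counts = [Counter(parse_to_template(l) for l in lines) for lines, _ in deployments]
labels = [label for _, label in deployments]

X = DictVectorizer().fit_transform(counts)       # event-count vectors
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```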
7

Nicolini, Olivier. "LIBS Multivariate Analysis with Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286595.

Abstract:
Laser-Induced Breakdown Spectroscopy (LIBS) is a spectroscopic technique used for chemical analysis of materials. By analyzing the spectrum obtained with this technique it is possible to determine the chemical composition of a sample. The possibility of analyzing materials in a contactless and online fashion, without sample preparation, makes LIBS one of the most interesting techniques for chemical composition analysis. Despite these intrinsic advantages, however, LIBS analysis suffers from poor accuracy and limited reproducibility of results due to interference effects caused by the chemical composition of the sample or other experimental factors. How to improve the accuracy of the analysis by extracting useful information from high-dimensional LIBS data remains the main challenge of this technique. In the present work, with the purpose of proposing a robust analysis method, I present a pipeline for multivariate regression on LIBS data composed of preprocessing, feature selection and regression. First, raw data is preprocessed through intensity filtering, normalization and baseline correction to mitigate interference factors such as laser energy fluctuations or the presence of a baseline in the spectrum. Feature selection then finds the most informative lines for an element, which are used as input in the subsequent regression phase to predict the element concentration. Partial Least Squares (PLS) and Elastic Net showed the best predictive ability among the regression methods investigated, while Interval PLS (iPLS) and Iterative Predictor Weighting PLS (IPW-PLS) proved to be the best feature selection algorithms for this type of data. By applying these feature selection algorithms to the full LIBS spectrum before regression with PLS or Elastic Net, accurate predictions can be obtained in a robust fashion.
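
The regression stage of such a pipeline might look like the following sketch, where synthetic spectra with a single informative emission line stand in for real LIBS data and the component count is a guess:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_channels = 80, 1024
concentration = rng.uniform(0.0, 10.0, n_samples)
spectra = rng.uniform(0.0, 1.0, size=(n_samples, n_channels))
spectra[:, 100] += 0.5 * concentration        # one informative emission line

# Total-intensity normalisation, one common LIBS preprocessing step.
spectra /= spectra.sum(axis=1, keepdims=True)

pls = PLSRegression(n_components=5)
print("CV R^2:", cross_val_score(pls, spectra, concentration, cv=5).mean())
```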
8

Vu, Viet, and Mariaguadaloppe Farah. "Bank risk analysis with machine learning." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-44954.

Abstract:
Nowadays, the time and resources needed to get an accurate estimation of a client's ability to pay back a loan have gone up. Given the data complexity involved in credit risk analysis, machine learning techniques have been used to ease the process and to help a bank institution gain better insight into its clients' economic state. This thesis presents a model that could help find interesting information in such data using machine learning. Because many clients have nonlinear incomes and expenses, the chosen machine learning algorithm, in this case linear regression, found it very hard to predict an accurate output (the next month's salary). However, interesting relations between trends and the data were found.
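
A minimal sketch of the salary-prediction setup, assuming lagged monthly values as features; the lag design and the synthetic series are assumptions, not the thesis's model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# Four years of monthly salary with a seasonal component and noise (synthetic).
salary = 30000 + 500 * np.sin(np.arange(48)) + rng.normal(0, 800, 48)

lags = 3   # predict next month from the three preceding months
X = np.column_stack([salary[i:i + len(salary) - lags] for i in range(lags)])
y = salary[lags:]

model = LinearRegression().fit(X[:-12], y[:-12])   # hold out the last year
print("held-out R^2:", model.score(X[-12:], y[-12:]))
```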
9

Asta, Shahriar. "Machine learning for improving heuristic optimisation." Thesis, University of Nottingham, 2015. http://eprints.nottingham.ac.uk/34216/.

Abstract:
Heuristics, metaheuristics and hyper-heuristics are search methodologies which have been preferred by many researchers and practitioners for solving computationally hard combinatorial optimisation problems whenever exact methods fail to produce high quality solutions in a reasonable amount of time. In this thesis, we introduce an advanced machine learning technique, namely tensor analysis, into the field of heuristic optimisation. We show how the relevant data should be collected in tensorial form, analysed and used during the search process. Four case studies are presented to illustrate the capability of single- and multi-episode tensor analysis, processing data with high and low abstraction levels, for improving heuristic optimisation. A single-episode tensor analysis using data at a high abstraction level is employed to improve an iterated multi-stage hyper-heuristic for cross-domain heuristic search. The empirical results across six different problem domains from a hyper-heuristic benchmark show that significant overall performance improvement is possible. A similar approach embedding a multi-episode tensor analysis is applied to the nurse rostering problem and evaluated on a benchmark of a diverse collection of instances obtained from different hospitals across the world. The empirical results indicate the success of the tensor-based hyper-heuristic, improving upon the best-known solutions for four particular instances. The genetic algorithm is a nature-inspired metaheuristic which uses a population of multiple interacting solutions during the search. Mutation is the key variation operator in a genetic algorithm and adjusts the diversity in a population throughout the evolutionary process. Often, a fixed mutation probability is used to perturb the value at each locus, representing a unique component of a given solution. A single-episode tensor analysis using data with a low abstraction level is applied to an online bin packing problem, generating locus-dependent mutation probabilities. The tensor approach significantly improves the performance of a standard genetic algorithm on almost all instances. A multi-episode tensor analysis using data with a low abstraction level is embedded into a multi-agent cooperative search approach. The empirical results once again show the success of the proposed approach on a benchmark of flow shop problem instances, as compared to the approach which does not make use of tensor analysis. Tensor analysis can handle data at different levels of abstraction, leading to a learning approach which can be used within different types of heuristic optimisation methods based on different underlying design philosophies, improving their overall performance.
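
As a rough illustration of extracting latent factors from search data arranged as a third-order tensor, here is a CP (PARAFAC) decomposition sketch; TensorLy is an assumed library choice and the tensor contents are random placeholders.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
# Axes: heuristics x iterations x features, as might be logged during a
# hyper-heuristic run (contents here are random placeholders).
T = tl.tensor(rng.random((8, 100, 5)))

weights, factors = parafac(T, rank=3)
# factors[0] holds per-heuristic loadings; in a tensor-based hyper-heuristic,
# such loadings could inform heuristic selection in the next search phase.
print([f.shape for f in factors])   # [(8, 3), (100, 3), (5, 3)]
```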
10

Tar, Paul David. "Quantitative planetary image analysis via machine learning." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/quantitative-planetary-image-analysis-via-machine-learning(9188b4ff-8e5d-4181-b389-1db3e2a85c7b).html.

Abstract:
Over recent decades enormous quantities of image data have been acquired from planetary missions. High-resolution imagery is available for many of the inner planets, gas giant systems, and some asteroids and comets. Yet the scientific value of these images will only be fully realised if sufficient analytic power can be applied to their large-scale and detailed interpretation. Unfortunately, the quantity of data has now surpassed researchers' ability to manually analyse each image, whilst available automated approaches are limited in their scope and reliability. To mitigate this, citizen science projects are becoming increasingly common, allowing large numbers of volunteers to assist in image interpretation using web-based resources. Yet human involvement, expert or otherwise, introduces additional problems of subjectivity and consistency. This thesis argues that what is required is an objective, quantitative, automated alternative, and advocates a quantitative approach to making automated measurements of a range of surface features, including varied terrains and the counting of impact craters. Existing pattern recognition systems, and established practices found within the imaging science and machine learning communities, are critically assessed with reference to strict quantitative criteria. These criteria are designed to accommodate the needs of scientists wishing to undertake quantitative research into the evolution of planetary surfaces, permitting measurements to be used with confidence. A new and unique method of pattern recognition, facilitating the meaningful interpretation of extracted information, is presented. What makes the new system unique is the inclusion of a comprehensive predictive theory of measurement errors and additional safeguards to ensure the trustworthiness and integrity of results. The resulting supervised machine learning/pattern recognition system is applied to Monte-Carlo distributions, Martian image data and citizen science lunar crater data. It is concluded that applying such quantitative techniques in practice is difficult, but possible, given appropriately encoded data and application-specific extensions to theories and methods. It is also concluded that existing imaging science practices and methods would benefit from a change of ethos towards a quantitative agenda, and that planetary scientists wishing to use such methods will need to develop an understanding of their properties and limitations.
11

Gillblad, Daniel. "On practical machine learning and data analysis." Doctoral thesis, KTH, Beräkningsbiologi, CB, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4788.

Abstract:
This thesis discusses and addresses some of the difficulties associated with practical machine learning and data analysis. Introducing data-driven methods in e.g. industrial and business applications can lead to large gains in productivity and efficiency, but the cost and complexity are often overwhelming. Creating machine learning applications in practice often involves a large amount of manual labour, which often needs to be performed by an experienced analyst without significant experience with the application area. We discuss some of the hurdles faced in a typical analysis project and suggest measures and methods to simplify the process. One of the most important issues when applying machine learning methods to complex data, such as industrial applications, is that the processes generating the data are modelled in an appropriate way. Relevant aspects have to be formalised and represented in a way that allows us to perform our calculations efficiently. We present a statistical modelling framework, Hierarchical Graph Mixtures, based on a combination of graphical models and mixture models. It allows us to create consistent, expressive statistical models that simplify the modelling of complex systems. Using a Bayesian approach, we allow for the encoding of prior knowledge, making the models applicable in situations where relatively little data is available. Detecting structures in data, such as clusters and dependency structure, is very important both for understanding an application area and for specifying the structure of e.g. a hierarchical graph mixture. We discuss how this structure can be extracted for sequential data. By using the inherent dependency structure of sequential data, we construct an information-theoretical measure of correlation that does not suffer from the problems most common correlation measures have with this type of data. In many diagnosis situations it is desirable to perform classification in an iterative and interactive manner. The matter is often complicated by very limited amounts of knowledge and examples when a new system to be diagnosed is initially brought into use. We describe how to create an incremental classification system based on a statistical model trained from empirical data, and show how the limited available background information can still be used initially for a functioning diagnosis system. To minimise the effort with which results are achieved within data analysis projects, we need to address not only the models used, but also the methodology and applications that can help simplify the process. We present a methodology for data preparation and a software library intended for rapid analysis, prototyping, and deployment. Finally, we study a few example applications, presenting tasks within classification, prediction and anomaly detection. The examples include demand prediction for supply chain management, approximating complex simulators for increased speed in parameter optimisation, and fraud detection and classification within a media-on-demand system.
12

Bougher, Benjamin Bryan. "Machine learning applications to geophysical data analysis." Thesis, University of British Columbia, 2016. http://hdl.handle.net/2429/58972.

Abstract:
The sedimentary layers of the Earth are a complex amorphous material formed from chaotic, turbulent, and random natural processes. Exploration geophysicists use a combination of assumptions, approximate physical models, and trained pattern recognition to extract useful information from complex remote sensing data such as seismic and well logs. In this thesis I investigate supervised and unsupervised machine learning models in geophysical data analysis and present two novel applications to exploration geophysics. First, interpreted well logs from the Trenton-Black River study are used to train a classifier that achieves a success rate of 67% at predicting stratigraphic units from gamma ray logs. I use the scattering transform, a multiscale analysis transform, to extract discriminating features that feed a K-nearest-neighbour classifier. A second experiment frames a conventional pre-stack seismic data characterization workflow as an unsupervised machine learning problem free from physical assumptions. Conventionally, the Shuey model is used to fit the angle-dependent reflectivity response of seismic data. I instead use principal-component-based approaches to learn projections from the data that improve classification. Results on the Marmousi II elastic model and an industry field dataset show that unsupervised learning models can be effective at segmenting hydrocarbon reservoirs from seismic data.
13

Feffer, Michael A. (Michael Anthony). "Personalized machine learning for facial expression analysis." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119763.

Abstract:
For this MEng thesis project, I investigated the personalization of deep convolutional networks for facial expression analysis. While prior work focused on population-based ("one-size-fits-all") models for the prediction of affective states (valence/arousal), I constructed personalized versions of these models that improve upon state-of-the-art general models by solving a domain adaptation problem. This was done by starting with pre-trained deep models for face analysis and fine-tuning the last layers to specific subjects or subpopulations. For prediction, a "mixture of experts" (MoE) approach was employed to select the proper outputs based on the given input. The research questions answered in this project are: (1) What are the effects of model personalization on the estimation of valence and arousal from faces? (2) What amount of (un)supervised data is needed to reach a target performance? The models produced in this research provide the foundation of a novel tool for personalized real-time estimation of target metrics.
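
The fine-tuning step described here can be sketched in PyTorch by freezing a pre-trained backbone and re-training only the final layer; ResNet-18 and all shapes below are stand-ins, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=None)   # imagine affect-pretrained weights here
for p in model.parameters():
    p.requires_grad = False             # freeze the shared, population-level backbone

# Replace the head with a per-subject valence/arousal regressor.
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative fine-tuning step on fake subject-specific data.
x = torch.randn(8, 3, 224, 224)
y = torch.rand(8, 2) * 2 - 1            # valence/arousal targets in [-1, 1]
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print("subject-specific loss:", loss.item())
```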
14

Cherrier, Noëlie. "Interpretable machine learning for CLAS12 data analysis." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPASP017.

Abstract:
Artificial intelligence is used massively in numerous applications, especially since the rise of deep learning techniques. However, some of these applications require a careful study and validation of how the induced model functions. In experimental physics, the performance of models on real data must be known and controlled, and their functioning explained to enable validation via peer review. In the particular case of the CLAS12 experiment at Jefferson Laboratory, an electron beam is sent onto a proton target to probe its inner structure. To access certain structure functions of the proton, a subset of the collected data corresponding to an exclusive interaction, deeply virtual Compton scattering, must be selected. This thesis focuses on this event selection. To improve on classical physics analysis, an approach exploiting intrinsically interpretable machine learning models, also called transparent models, is proposed. In this way, the functioning of the model is understood more easily, and the selection errors are minimized and controlled.
15

Gabbur, Prasad. "Machine Learning Methods for Microarray Data Analysis." Diss., The University of Arizona, 2010. http://hdl.handle.net/10150/195829.

Abstract:
Microarrays emerged in the 1990s as a consequence of efforts to speed up the process of drug discovery. They revolutionized molecular biological research by enabling the monitoring of thousands of genes together. Typical microarray experiments measure the expression levels of a large number of genes on very few tissue samples. The resulting sparsity of data presents major challenges to the statistical methods used to perform any kind of analysis on this data. This research posits that phenotypic classification and prediction serve as good objective functions for both the optimization and the evaluation of microarray data analysis methods, because classification measures what is needed for diagnostics and provides quantitative performance measures such as leave-one-out (LOO) or held-out prediction accuracy and confidence. Under the classification framework, various microarray data normalization procedures are evaluated using a class-label hypothesis testing framework, employing Support Vector Machines (SVM) and linear-discriminant-based classifiers. A novel normalization technique based on minimizing the squared correlation coefficients between expression levels of gene pairs is proposed and evaluated along with the other methods. Our results suggest that most normalization methods helped classification on the datasets considered, except the rank method, most likely due to its quantization effects. Another contribution of this research is in developing machine learning methods for incorporating an independent source of information, in the form of gene annotations, to analyze microarray data. Recently, genes of many organisms have been annotated with terms from a limited vocabulary called Gene Ontologies (GO), describing the genes' roles in various biological processes, molecular functions and their locations within the cell. Novel probabilistic generative models are proposed for clustering genes using both their expression levels and GO tags. These models are similar in essence to the ones used for multimodal data, such as images and words, with learning and inference done in a Bayesian framework. The multimodal generative models are used for phenotypic class prediction, with emphasis on phenotype prediction for static gene expression data and state prediction for time-course data. Using GO tags for organisms whose genes have been studied more comprehensively leads to an improvement in prediction. Our methods also have the potential to provide a way to assess the quality of available GO tags for the genes of various model organisms.
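
The evaluation protocol the abstract describes, leave-one-out accuracy for an SVM in the few-samples/many-genes regime, can be sketched as follows; the synthetic values mimic the sparsity regime and are not microarray measurements.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2000))          # 40 samples, 2000 "genes"
y = rng.integers(0, 2, size=40)
X[y == 1, :25] += 1.0                    # a small block of informative genes

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
print("LOO accuracy:", cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())
```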
16

Björk, Friström Viking. "Mapping of open-answers using machine learning." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-228616.

Abstract:
This thesis investigates whether a model can be created to map misspelled answers from open-ended questions to a finite set of brands. The data used for the thesis comes from the company Nepa, which uses open-ended questions to measure brand awareness, and consists of misspelled answers and the brands they should be mapped to. A data structure called a match candidate was created, consisting of a misspelled answer and a brand that it could potentially be mapped to. Features for the match candidates were engineered based on edit distances, posterior probabilities and common misspellings, among others. Multiple machine learning models were tested to classify a match candidate as positive if the mapping was correct and negative otherwise. The model was tested in two scenarios: one where the answers in the training and testing data came from the same questions, and one where they came from different questions. Among the classifiers tested, the random forest model performed best in terms of both PPV and sensitivity. The resulting mapping identified on average 92% of the misspelled answers and mapped them with 98% accuracy in the first scenario, while in the second scenario 70% of the answers were identified, with on average 95% confidence in the mapping.
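
A minimal sketch of a match candidate with one string-similarity feature; difflib stands in for whatever edit-distance implementation the thesis used, and a full system would add the posterior-probability and misspelling features described above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class MatchCandidate:
    answer: str   # the (possibly misspelled) open answer
    brand: str    # the brand it could potentially be mapped to

    def similarity(self) -> float:
        # One edit-distance-style feature; a real system adds many more.
        return SequenceMatcher(None, self.answer.lower(), self.brand.lower()).ratio()

candidates = [MatchCandidate("cocacola", "Coca-Cola"),
              MatchCandidate("cocacola", "Pepsi")]
for c in candidates:
    print(c.brand, round(c.similarity(), 2))
# Feature vectors like this would feed the random forest that classifies each
# match candidate as a correct or incorrect mapping.
```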
17

Evans, Daniel T. "A SNP Microarray Analysis Pipeline Using Machine Learning Techniques." Ohio University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1289950347.

18

Rapadamnaba, Robert. "Uncertainty analysis, sensitivity analysis, and machine learning in cardiovascular biomechanics." Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS058.

Abstract:
This thesis follows on from a recent study conducted by researchers at the University of Montpellier, with the aim of proposing to the scientific community an inversion procedure capable of noninvasively estimating patient-specific blood pressure in cerebral arteries. Its first objective is, on the one hand, to examine the accuracy and robustness of the inversion procedure proposed by these researchers with respect to various sources of uncertainty related to the models used, the formulated assumptions and the patient-specific clinical data, and on the other hand, to set a stopping criterion for the ensemble-Kalman-filter-based algorithm used in their inversion procedure. For this purpose, an uncertainty analysis and several sensitivity analyses are carried out. The second objective is to illustrate how machine learning, mainly focusing on convolutional neural networks, can be a very good alternative to the time-consuming and costly inversion procedure implemented by these researchers for cerebral blood pressure estimation. An approach taking into account the uncertainties related to the processing of the patient-specific medical images and to the blood flow model assumptions, such as assumptions about boundary conditions and physical and physiological parameters, is first presented to quantify uncertainties in the inversion procedure outcomes. Uncertainties related to medical image segmentation are modelled using a Gaussian distribution, and uncertainties related to the choice of modelling assumptions are analyzed by considering several possible hypothesis scenarios. From this approach, it emerges that the uncertainties on the procedure results are of the same order of magnitude as those related to segmentation errors. Furthermore, this analysis shows that the procedure outcomes are very sensitive to the assumptions made about the model boundary conditions; in particular, the choice of symmetrical Windkessel boundary conditions proves to be the most relevant for the patient under study. Next, an approach is presented for ranking the parameters estimated during the inversion procedure in order of importance and for setting a stopping criterion for the algorithm. The results of this strategy show, on the one hand, that most of the model's proximal resistances are the most important parameters for blood flow estimation in the internal carotid arteries and, on the other hand, that the inversion algorithm can be stopped as soon as a reasonable convergence threshold for the most influential parameters is reached. Finally, a new numerical platform based on machine learning is presented, which estimates patient-specific blood pressure in the cerebral arteries much faster than the inversion procedure but with the same accuracy. Applying this platform to the patient-specific data used in the inversion procedure provides a noninvasive, real-time estimate of patient-specific cerebral pressure consistent with the inversion procedure's estimation.
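
The stopping criterion described here can be sketched as a convergence check on the ensemble mean of the most influential parameters; the threshold, window and fake update loop below are illustrative assumptions.

```python
import numpy as np

def converged(history, influential, tol=1e-3, window=3):
    """history: list of parameter-mean vectors, one per EnKF iteration."""
    if len(history) <= window:
        return False
    recent = np.array(history[-window - 1:])[:, influential]
    rel_change = np.abs(np.diff(recent, axis=0)) / (np.abs(recent[:-1]) + 1e-12)
    return rel_change.max() < tol

history = []
influential = [0, 1]          # indices of the most influential parameters
for it in range(100):
    # Fake EnKF updates: parameter means settling toward their final values.
    theta_mean = np.array([1.0, 2.0, 0.5]) + np.exp(-it)
    history.append(theta_mean)
    if converged(history, influential):
        print("stopping at iteration", it)
        break
```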
19

Badayos, Noah Garcia. "Machine Learning-Based Parameter Validation." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/47675.

Abstract:
As power system grids continue to grow in order to support increasing energy demand, the system's behavior evolves accordingly, continuing to challenge designs for maintaining security. It has become apparent in the past few years that accurate simulations are as critical as discovering vulnerabilities in the power network. This study explores a classification method for validating simulation models using disturbance measurements from phasor measurement units (PMUs). The technique employs the Random Forest learning algorithm to find a correlation between specific model parameter changes and the variations in the dynamic response. The measurements used for building and evaluating the classifiers were characterized using Prony decomposition. The generator model, consisting of an exciter, governor and its standard parameters, was validated using short-circuit faults. Single-error classifiers were tested first, comparing the accuracies of classifiers built using positive-, negative- and zero-sequence measurements. The negative-sequence measurements consistently produced the best classifiers, with the majority of the parameter classes attaining F-measure accuracies greater than 90%. A multiple-parameter-error technique for validation was also developed and tested on standard generator parameters. Only a few target parameter classes had good accuracies in the presence of multiple parameter errors, but the results were enough to permit a sequential process of validation, where eliminating a highly detectable error can improve the accuracy of suspect errors dependent on the former's removal, continuing the procedure until all corrections are covered.
20

Minervini, Marcello. "Multi-sensor analysis and machine learning classification approach for diagnostics of electrical machines." Doctoral thesis, Università degli studi di Pavia, 2022. http://hdl.handle.net/11571/1464785.

21

Montgomery, Dean. "Improving radiotherapy using image analysis and machine learning." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/23554.

Abstract:
With ever increasing advancements in imaging, there is an increasing abundance of images being acquired in the clinical environment. This increase in information can be a burden as well as a blessing, as it may require significant amounts of time to interpret the information contained in these images. Computer-assisted evaluation is one way in which better use could be made of them. This thesis presents the combination of texture analysis of images acquired during the treatment of cancer with machine learning in order to improve radiotherapy. The first application is the prediction of radiation-induced pneumonitis. In 13-37% of cases, lung cancer patients treated with radiotherapy develop radiation-induced lung disease, such as radiation-induced pneumonitis. Three-dimensional texture analysis, combined with patient-specific clinical parameters, was used to compute unique features. On radiotherapy planning CT data of 57 patients (14 symptomatic, 43 asymptomatic), a Support Vector Machine (SVM) obtained an area under the receiver operator curve (AUROC) of 0.873, with sensitivity, specificity and accuracy of 92%, 72% and 87% respectively. Furthermore, it was demonstrated that a decision tree classifier was capable of a similar level of performance using sub-regions of the lung volume. The second application relates to prostate cancer identification. T2 MRI scans are used in the diagnosis of prostate cancer and in the identification of the primary cancer within the prostate gland. The manual identification of the cancer relies on the assessment of multiple scans and the integration of clinical information by a clinician, which requires considerable experience and time. As MRI becomes more integrated within the radiotherapy workflow, and as adaptive radiotherapy (where the treatment plan is modified based on multi-modality image information acquired during or between RT fractions) develops, it is timely to develop automatic segmentation techniques for reliably identifying cancerous regions. In this work a number of texture features were coupled with a supervised learning model for the automatic segmentation of the main cancerous focus in the prostate, the focal lesion. A mean AUROC of 0.713 was demonstrated with a 10-fold stratified cross-validation strategy on an aggregate data set. On a leave-one-case-out basis a mean AUROC of 0.60 was achieved, which resulted in a mean DICE coefficient of 0.710. These results showed that it was possible to delineate the focal lesion in the majority (11) of the 14 cases used in the study.
22

Thomas, Sabin M. (Sabin Mammen). "A system analysis of improvements in machine learning." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/100386.

Abstract:
Machine learning algorithms used for natural language processing (NLP) currently take too long to complete their learning function. This slow learning performance tends to make the models ineffective for the increasing range of real-time applications such as voice transcription, language translation, text summarization, topic extraction and sentiment analysis. Moreover, current implementations run in an offline batch-mode operation and are unfit for real-time needs. Newer machine learning algorithms are being designed that make better use of sampling and distributed methods to speed up learning performance. In this thesis, I identify unmet market opportunities where machine learning is not employed in an optimum fashion, and I provide system-level suggestions and analyses that could improve performance, accuracy and relevance.
23

Yuan, Danny. "Applications of machine learning : consumer credit risk analysis." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100614.

Abstract:
Current credit bureau analytics, such as credit scores, are based on slowly varying consumer characteristics, and thus are not adaptable to changes in customers' behaviors and market conditions over time. In this thesis, we apply machine-learning techniques to construct forecasting models of consumer credit risk. By aggregating credit account, credit bureau and customer data given to us by a major commercial bank (which we call the Bank, as per a confidentiality agreement), we expect to be able to construct out-of-sample forecasts. The resulting models can tackle common challenges faced by chief risk officers and policymakers, such as deciding when and by how much to cut individuals' credit lines, evaluating the credit score of current and prospective customers, and forecasting aggregate consumer credit defaults and delinquencies for the purpose of enterprise-wide and macroprudential risk management.
24

Kashif, Muhammad. "Analysis and Evaluation of Tiny Machine Learning applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
The aim of TinyML is to bring the capability of machine learning to ultra-low-power devices, typically under a milliwatt, breaking the traditional power barrier that prevents widely distributed machine intelligence. TinyML allows greater reactivity and privacy by conducting inference on-device and near-sensor, while avoiding the energy cost associated with wireless communication, which at this scale is far higher than that of computing. In addition, TinyML's efficiency enables a class of smart, battery-powered, always-on applications that can revolutionize the collection and processing of data in real time. This emerging field, the product of much recent innovation, is ready to accelerate its growth in the coming years. In this thesis, we deploy three models on a microcontroller. For each model, datasets are retrieved from an online repository and preprocessed as required. Each model is then trained on a split of the preprocessed data to obtain the best possible accuracy. The trained model is then converted to C code so it can be deployed on the microcontroller. Finally, we take steps toward incorporating the model into the microcontroller by implementing and evaluating an interface for the user to utilize the microcontroller's sensors. The thesis has four chapters: the first introduces TinyML; the second covers setting up the TinyML environment; the third addresses a major use of TinyML, wake-word detection; and the final chapter deals with gesture recognition in TinyML.
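
The train-then-convert step can be sketched with TensorFlow Lite; the model shape and classes below are illustrative, and the final C-array step is typically done with a tool such as `xxd -i model.tflite > model.h`.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 40, 1)),        # e.g. an audio spectrogram
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g. wake-word classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... model.fit(...) on the preprocessed dataset would go here ...

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize for MCU deployment
with open("model.tflite", "wb") as f:
    f.write(converter.convert())
```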
25

Nicolaou, Michael (Mihalis). "Machine learning for automatic analysis of affective behaviour." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/44543.

Abstract:
The automated analysis of affect has been gaining rapidly increasing attention from researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g., via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence, etc.). The work presented in this thesis focuses on several fundamental problems on the path towards reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called "Big Data" era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement to analyse emotion expressions continuously in time, instead of merely processing static images, thus unveiling to researchers the wide range of temporal dynamics related to human behaviour. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise, etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings closer to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, which are considered to capture a much wider spectrum of expressive variability than basic emotions and, most importantly, to model emotional states commonly expressed by humans in everyday life. In the first part of this thesis, we attempt to demystify the largely unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods, an idea which has since been taken up by other researchers in the field. To evaluate this experimentally, we extend methods such as Long Short-Term Memory neural networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) to exploit output relationships in learning. As is shown, this increases the accuracy of machine learning models applied to this task. The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed in real time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, the varying reaction lags exhibited by each annotator, as well as other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal, annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models.
This makes the proper fusion of multiple annotations, and the inference of a clean, corrected version of the "ground truth", one of the most significant challenges in the area. A highly important contribution of this thesis lies in the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while, most importantly, we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances. The integration of the temporal alignment process within the proposed private-shared space model makes DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate the temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario is when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in the literature, and, as we show in related experiments, it outperforms other state-of-the-art methods for related tasks such as the fusion of multiple modalities under gross noise. Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a manner meaningful to the task at hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e., the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform other equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.
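
As a much simplified, two-annotator stand-in for the DPCCA idea (no dynamics, no time warping), plain CCA already illustrates recovering a shared signal beneath annotator-specific scale and bias:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
t = np.linspace(0, 6 * np.pi, 500)
latent = np.sin(t)                          # the "true" emotion trace

# Two annotators: scaled, biased, noisy views of the same latent signal.
a1 = (0.8 * latent + 0.1 * rng.normal(size=t.size)).reshape(-1, 1)
a2 = (1.3 * latent - 0.5 + 0.1 * rng.normal(size=t.size)).reshape(-1, 1)

cca = CCA(n_components=1).fit(a1, a2)
s1, s2 = cca.transform(a1, a2)
fused = (s1 + s2).ravel() / 2               # shared-space estimate
# Correlation with the latent signal, up to sign (CCA scores are sign-agnostic).
print("corr:", abs(np.corrcoef(fused, latent)[0, 1]).round(3))
```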
APA, Harvard, Vancouver, ISO, and other styles
26

Öberg, Filip. "Football analysis using machine learning and computer vision." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85276.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Purcaro, Michael J. "Analysis, Visualization, and Machine Learning of Epigenomic Data." eScholarship@UMMS, 2017. https://escholarship.umassmed.edu/gsbs_diss/938.

Full text
Abstract:
The goal of the Encyclopedia of DNA Elements (ENCODE) project has been to characterize all the functional elements of the human genome. These elements include expressed transcripts and genomic regions bound by transcription factors (TFs), occupied by nucleosomes, occupied by nucleosomes with modified histones, or hypersensitive to DNase I cleavage. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an experimental technique for detecting TF binding in living cells, and the genomic regions bound by TFs are called ChIP-seq peaks. ENCODE has performed and compiled results from tens of thousands of experiments, including ChIP-seq, DNase, RNA-seq and Hi-C. These efforts have culminated in two web-based resources from our lab, Factorbook and SCREEN, for the exploration of epigenomic data for both human and mouse. Factorbook is a peak-centric resource presenting data such as motif enrichment and histone modification profiles for transcription factor binding sites computed from ENCODE ChIP-seq data. SCREEN provides an encyclopedia of ~2 million regulatory elements, including promoters and enhancers, identified using ENCODE ChIP-seq and DNase data, with an extensive UI for searching and visualization. While we have successfully utilized the thousands of available ENCODE ChIP-seq experiments to build the Encyclopedia and visualizers, we have also struggled with the practical and theoretical inability to assay every possible experiment on every possible biosample under every conceivable biological scenario. We have used machine learning techniques to predict TF binding sites and enhancer locations, and demonstrate that machine learning is critical for deciphering functional regions of the genome.
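As a hedged illustration of the prediction step, the sketch below trains a random forest to separate bound from unbound regions using synthetic stand-ins for signals such as DNase accessibility, motif score and conservation; the features and data are invented for illustration, not ENCODE's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Columns stand in for per-region signals: DNase, motif score, conservation.
X = rng.normal(size=(2000, 3))
y = (X @ np.array([1.5, 1.0, 0.5]) + rng.normal(size=2000) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# Cross-validated AUC estimates how well the signals predict binding.
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```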
APA, Harvard, Vancouver, ISO, and other styles
28

SCALAS, MICHELE. "Malware Analysis and Detection with Explainable Machine Learning." Doctoral thesis, Università degli Studi di Cagliari, 2021. http://hdl.handle.net/11584/310630.

Full text
Abstract:
Malware detection is one of the areas where machine learning is successfully employed due to its high discriminating power and its capability of identifying novel variants of malware samples. Typically, the problem formulation is strictly correlated to the use of a wide variety of features covering several characteristics of the entities to classify. On the surface, this practice allows considerable detection performance to be achieved. However, it hardly permits us to gain insights into the knowledge extracted by the learning algorithm, causing two main issues. First, detectors might learn spurious patterns, thus undermining their effectiveness in real environments. Second, they might be particularly vulnerable to adversarial attacks, thus weakening their security. These concerns give rise to the necessity of developing systems that are tailored to the specific peculiarities of the attacks to detect. Within malware detection, Android ransomware represents a challenging yet illustrative domain for assessing the relevance of this issue. Ransomware is a serious threat that acts by locking the compromised device or encrypting its data, then forcing the device owner to pay a ransom in order to restore the device's functionality. Attackers typically develop such dangerous apps so that normally legitimate components and functionalities perform malicious behaviour, making them harder to distinguish from genuine applications. In this sense, adopting a well-defined variety of features and relying on some kind of explanation of the logic behind such detectors could improve their design process, since it could reveal truly characterising features, guiding the human expert towards an understanding of the most relevant attack patterns. Given this context, the goal of the thesis is to explore strategies that may improve the design process of malware detectors. In particular, the thesis proposes to evaluate and integrate approaches based on the growing body of research on explainable machine learning. To this end, the work follows two pathways. The first and main one focuses on identifying the traits that prove to be characterising and effective for Android ransomware detection. Explainability techniques are then used to propose methods for assessing the validity of the considered features. The second pathway broadens the view by exploring the relationship between explainable machine learning and adversarial attacks. In this regard, the contribution consists of pointing out metrics, extracted from explainability techniques, that can reveal models' robustness to adversarial attacks, together with an assessment of the practical feasibility for attackers to alter the features that affect models' output the most. Ultimately, this work highlights the necessity of adopting a design process that is aware of the weaknesses of and attacks against machine learning-based detectors, and proposes explainability techniques as one of the tools to counteract them.
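One way to obtain the kind of explanation the abstract argues for is model-agnostic feature attribution. The sketch below is a minimal example using scikit-learn's permutation importance on a synthetic app-feature matrix; the feature names and data are hypothetical, and this is not the thesis's specific method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an app-feature matrix (API calls, permissions, ...).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=3,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(8)]  # hypothetical names

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Model-agnostic explanation: how much does shuffling each feature hurt accuracy?
result = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
for i in np.argsort(-result.importances_mean):
    print(f"{feature_names[i]}: {result.importances_mean[i]:.3f}")
```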
APA, Harvard, Vancouver, ISO, and other styles
29

Frandsen, Abraham Jacob. "Machine Learning for Disease Prediction." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5975.

Full text
Abstract:
Millions of people in the United States alone suffer from undiagnosed or late-diagnosed chronic diseases such as Chronic Kidney Disease and Type II Diabetes. Catching these diseases earlier facilitates preventive healthcare interventions, which in turn can lead to tremendous cost savings and improved health outcomes. We develop algorithms for predicting disease occurrence by drawing from ideas and techniques in the field of machine learning. We explore standard classification methods such as logistic regression and random forest, as well as more sophisticated sequence models, including recurrent neural networks. We focus especially on the use of medical code data for disease prediction, and explore different ways for representing such data in our prediction algorithms.
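A minimal sketch of one way to feed medical-code data to a classifier, assuming each patient is represented by the set of codes in their record; the codes, labels and model choice here are illustrative only, not the thesis's exact representation.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical patients, each a set of diagnosis codes from their record.
patients = [["E11.9", "I10"], ["N18.3", "I10", "E11.9"],
            ["J45.9"], ["N18.3", "E11.9"]]
labels = [0, 1, 0, 1]  # e.g. 1 = later diagnosed with chronic kidney disease

mlb = MultiLabelBinarizer()
X = mlb.fit_transform(patients)  # one binary column per distinct medical code
clf = LogisticRegression().fit(X, labels)

# Risk score for a new patient's set of codes.
new = mlb.transform([["I10", "E11.9"]])
print(clf.predict_proba(new)[0, 1])
```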
APA, Harvard, Vancouver, ISO, and other styles
30

Shortreed, Susan. "Learning in spectral clustering /." Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/8977.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Craddock, Richard Cameron. "Support vector classification analysis of resting state functional connectivity fMRI." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/31774.

Full text
Abstract:
Thesis (Ph.D.), Electrical and Computer Engineering, Georgia Institute of Technology, 2010. Committee Chair: Hu, Xiaoping; Committee Co-Chair: Vachtsevanos, George; Committee Members: Butera, Robert; Gurbaxani, Brian; Mayberg, Helen; Yezzi, Anthony. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
32

Manda, Kundan Reddy. "Sentiment Analysis of Twitter Data Using Machine Learning and Deep Learning Methods." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18447.

Full text
Abstract:
Background: Twitter, Facebook, WordPress, etc. act as major sources of information exchange in today's world. Tweets mainly express public opinion on a product, event or topic, and thus constitute large volumes of unprocessed data. Synthesising and analysing this data is important but difficult due to the size of the dataset. Sentiment analysis is chosen as an apt method for analysing this data, since it does not require going through every tweet individually but instead summarises the sentiments of the tweets in terms of positive, negative and neutral opinions. Sentiment analysis is normally performed in one of three ways: a machine learning-based approach, a sentiment lexicon-based approach, or a hybrid approach. The machine learning-based approach uses machine learning and deep learning algorithms to analyse the data, whereas the sentiment lexicon-based approach relies on lexicons containing vocabularies of positive and negative words. The hybrid approach combines machine learning and sentiment lexicons for classification.
Objectives: The primary objectives of this research are to identify algorithms and metrics for evaluating the performance of machine learning classifiers, and to compare the metrics of the identified algorithms depending on the size of the dataset, which affects the performance of the best-suited algorithm for sentiment analysis.
Method: The method chosen to address the research questions is experimentation, through which the identified algorithms are evaluated with the selected metrics.
Results: The identified machine learning algorithms are Naïve Bayes, Random Forest and XGBoost, and the deep learning algorithm is a CNN-LSTM. The algorithms are evaluated and compared with respect to precision, accuracy, F1 score and recall. The CNN-LSTM model is best suited for sentiment analysis on Twitter data with respect to the selected dataset size.
Conclusion: Through the analysis of the results, the aim of this research is achieved in identifying the best-suited algorithm for sentiment analysis on Twitter data with respect to the selected dataset. The CNN-LSTM model attains the highest accuracy, 88%, among the selected algorithms.
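A hedged sketch of the machine learning-based approach, using a TF-IDF representation with a Naïve Bayes classifier as one of the classical baselines the abstract mentions; the tweets and labels are placeholders, not the thesis's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set; a real experiment would use thousands of tweets.
tweets = ["love this phone", "worst service ever",
          "battery life is great", "totally disappointed"]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(tweets, labels)
print(model.predict(["the service was great"]))
```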
APA, Harvard, Vancouver, ISO, and other styles
33

Hjalmarsson, Victoria. "Machine learning and Multi-criteria decision analysis in healthcare : A comparison of machine learning algorithms for medical diagnosis." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-33940.

Full text
Abstract:
Medical records consist of a lot of data. Nevertheless, in today's digitized society it is difficult for humans to convert data into information and recognize hidden patterns. Effective decision support tools can assist medical staff in revealing important information hidden in the vast amount of data and support their medical decisions. The objective of this thesis is to compare five machine learning algorithms for clinical diagnosis. The selected machine learning algorithms are C4.5, Random Forest, Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and the Naïve Bayes classifier. First, the machine learning algorithms are applied to three publicly available datasets. Next, the Analytic Hierarchy Process (AHP) is applied to evaluate which algorithms are more suitable than others for medical diagnosis. Evaluation criteria are chosen with respect to typical clinical criteria and were narrowed down to five: sensitivity, specificity, positive predictive value, negative predictive value and interpretability. Given the results, Naïve Bayes and SVM receive the highest AHP scores, indicating that they are more suitable than the other tested algorithms as clinical decision support. In most cases kNN performed the worst and also received the lowest AHP score, which makes it the least suitable algorithm for supporting medical diagnosis.
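As a hedged sketch of the AHP step (not the thesis's actual judgment matrix), the snippet below derives criteria weights from a hypothetical pairwise-comparison matrix via the principal eigenvector and scores two classifiers; all numbers are invented for illustration.

```python
import numpy as np

# Hypothetical pairwise comparisons over the five criteria (sensitivity,
# specificity, PPV, NPV, interpretability); A[i, j] says how much more
# important criterion i is than criterion j on Saaty's 1-9 scale.
A = np.array([
    [1,   2,   3,   3,   5],
    [1/2, 1,   2,   2,   4],
    [1/3, 1/2, 1,   1,   3],
    [1/3, 1/2, 1,   1,   3],
    [1/5, 1/4, 1/3, 1/3, 1],
])

# Criteria weights = principal eigenvector of A, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w /= w.sum()

# Score each classifier as the weighted sum of its per-criterion performance.
perf = {"NaiveBayes": [0.90, 0.85, 0.88, 0.87, 0.8],
        "kNN":        [0.80, 0.78, 0.79, 0.80, 0.5]}  # made-up numbers
for name, scores in perf.items():
    print(name, np.dot(w, scores))
```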
APA, Harvard, Vancouver, ISO, and other styles
34

Qin, Lei. "Online machine learning methods for visual tracking." Thesis, Troyes, 2014. http://www.theses.fr/2014TROY0017/document.

Full text
Abstract:
We study the challenging problem of tracking an arbitrary object in video sequences with no prior knowledge other than a template annotated in the first frame. To tackle this problem, we build a robust tracking system consisting of the following components. First, for image region representation, we propose improvements to the region covariance descriptor: characteristics of the specific object are taken into consideration before constructing the descriptor. Second, for building the object appearance model, we propose to combine the merits of both generative and discriminative models by organising them in a detection cascade. Specifically, generative models are deployed in the early layers to eliminate most easy candidates, whereas discriminative models occupy the later layers to distinguish the object from a few similar "distracters". Partial Least Squares Discriminant Analysis (PLS-DA) is employed for building the discriminative object appearance models. Third, for updating the generative models, we propose a weakly supervised model-updating method based on cluster analysis using the mean-shift gradient density estimation procedure. Fourth, a novel online PLS-DA learning algorithm is developed for incrementally updating the discriminative models. The final tracking system that integrates all these building blocks exhibits good robustness to most challenges in visual tracking. Comparative results on challenging video sequences show that the proposed tracking system performs favourably against a number of state-of-the-art methods.
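The region covariance descriptor at the heart of the representation can be sketched compactly: stack per-pixel features (position, intensity, gradients) and take their covariance. This is a generic version of the descriptor under simple assumptions, not the thesis's improved variant.

```python
import numpy as np

def covariance_descriptor(patch):
    """Region covariance of per-pixel features for a grayscale patch (H x W)."""
    H, W = patch.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    gy, gx = np.gradient(patch.astype(float))
    # Feature vector per pixel: position, intensity, first-order gradients.
    F = np.stack([xs.ravel(), ys.ravel(), patch.ravel().astype(float),
                  gx.ravel(), gy.ravel()], axis=1)
    return np.cov(F, rowvar=False)  # 5 x 5 symmetric positive semi-definite

patch = np.random.default_rng(0).integers(0, 256, size=(32, 32))
print(covariance_descriptor(patch).shape)
```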
APA, Harvard, Vancouver, ISO, and other styles
35

DI, GIACOMO UMBERTO ANTONIO. "Machine learning and formal methods for sport analytics." Doctoral thesis, Università degli studi del Molise, 2022. https://hdl.handle.net/11695/115353.

Full text
Abstract:
The main focus of my PhD is the concept of sports analytics, which has grown rapidly during the last few years. Its main usage is the prediction of soccer match results, even though it can be applied with interesting results in other areas, such as analyses based on player position information. This context has been explored in three steps: the analysis and recognition of human activities, the adoption of machine learning on soccer data and, finally, the application of formal methods in the sports analytics scenario. In this way, I explore the strengths and weaknesses of the different techniques.
Human activity recognition has attracted interest from researchers and developers in recent years due to its wide range of applications across human endeavours. The main issue in modelling human behaviour is the diverse nature of human activities and of the scenarios in which they are performed, which makes the problem challenging to deal with. Machine learning techniques are then used to draw conclusions from a large amount of data related to soccer matches; specifically, they are used to predict soccer game results and player positions during a match. The last step concerns the use of formal methods to provide more explainability and interpretability of the results obtained. With the formal methods-based approach, I detect the playing style of soccer teams while providing transparency of the results. I model soccer teams as automata and, by exploiting model verification techniques, I verify whether a playing style, expressed by means of a temporal logic formula, is exhibited by the team under analysis. This information can support the coach in determining the strategy of the team while the match is in progress. The experimental analysis confirms the effectiveness of the proposed method for detecting soccer team behaviour, obtaining promising results compared with standard baseline approaches.
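The model-checking idea can be made concrete with a toy example. The sketch below, which is only an illustration and not the thesis's actual verification pipeline, encodes a team as a small automaton of game phases and checks a CTL-style "eventually" property by breadth-first reachability; the states and transitions are invented.

```python
from collections import deque

# Toy automaton: game phases as states, transitions estimated from match data.
transitions = {
    "defence":  {"midfield"},
    "midfield": {"defence", "attack"},
    "attack":   {"midfield", "shot"},
    "shot":     set(),
}

def eventually(start, goal, transitions):
    """Check an EF-style property: can `goal` be reached from `start`?"""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# "Offensive style": from its defensive phase the team eventually produces a shot.
print(eventually("defence", "shot", transitions))  # True
```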
APA, Harvard, Vancouver, ISO, and other styles
36

Larsen, Jan Ivar. "Predicting Stock Prices Using Technical Analysis and Machine Learning." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2010. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-11027.

Full text
Abstract:
Historical stock prices are used to predict the direction of future stock prices. The developed stock price prediction model uses a novel two-layer reasoning approach that employs domain knowledge from technical analysis in the first layer of reasoning to guide a second layer of reasoning based on machine learning. The model is supplemented by a money management strategy that uses the historical success of predictions made by the model to determine the amount of capital to invest in future predictions. Based on a number of portfolio simulations with trade signals generated by the model, we conclude that the prediction model successfully outperforms the Oslo Benchmark Index (OSEBX).
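A hedged sketch of the two-layer idea: technical-analysis indicators (here a moving-average crossover and momentum, chosen arbitrarily) form the first layer, and a classifier trained on those signals forms the second. Synthetic prices; this is not the thesis's exact indicator set or model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 500)))  # synthetic prices

# Layer 1: technical-analysis signals computed from prices.
sma_fast = close.rolling(5).mean()
sma_slow = close.rolling(20).mean()
features = pd.DataFrame({
    "crossover": (sma_fast - sma_slow) / close,  # trend signal
    "momentum": close.pct_change(10),            # rate of change
}).dropna()

# Layer 2: ML classifier predicting next-day direction from the signals.
target = (close.shift(-1) > close).astype(int).loc[features.index]
split = len(features) // 2
clf = LogisticRegression().fit(features[:split], target[:split])
print(clf.score(features[split:], target[split:]))
```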
APA, Harvard, Vancouver, ISO, and other styles
37

Kihlström, Gustav, and Przybysz Patryk. "Technical analysis inspired machine learning for stock market data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186863.

Full text
Abstract:
In this thesis we evaluate four different machine learning algorithms, namely the Naive Bayes classifier, Support Vector Machines, the Extreme Learning Machine and Random Forest, in the context of stock market investments. The aim is to provide additional information that can be beneficial when creating stock market models to be used in a machine learning setting. All four algorithms are trained on different configurations of data based on concepts from technical analysis. The configurations contain closing prices, volatility and trading volume in different combinations. These variables are taken from past trading days, where the number of days from which data is collected ranges from 2 to 30. The resulting predictors attained from the various algorithms and configurations above reach accuracy rates between 50-54%. This thesis concludes that the effect of the different evaluated features varies depending on which algorithm is used as well as on how many past trading days are included. In conclusion, the usage of volatility features should at least be considered when building a machine learning model in a stock market context.
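The data configurations described can be sketched as a feature-construction step: for each day, collect the previous n_days of returns, a volatility proxy and relative volume. The exact variables below are assumptions modelled on the abstract, with synthetic data.

```python
import numpy as np

def make_lagged_features(close, volume, n_days):
    """One row per trading day, built from the previous n_days of data."""
    logret = np.diff(np.log(close))
    X, y = [], []
    for t in range(n_days, len(logret)):
        window = logret[t - n_days:t]
        vol_win = volume[t - n_days:t]
        X.append(np.concatenate([
            window,                      # past returns
            [window.std()],              # volatility proxy
            vol_win / vol_win.mean(),    # relative trading volume
        ]))
        y.append(int(logret[t] > 0))     # next-day direction
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300)))
volume = rng.integers(1_000, 5_000, 300).astype(float)
X, y = make_lagged_features(close, volume, n_days=10)
print(X.shape, y.mean())
```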
APA, Harvard, Vancouver, ISO, and other styles
38

Letzner, Josefine. "Analysis of Emergency Medical Transport Datasets using Machine Learning." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215162.

Full text
Abstract:
The selection of hospital once an ambulance has picked up its patient is today decided by the ambulance staff. This report describes a supervised machine learning approach for predicting hospital selection, a multi-class classification problem. The performance of random forest, logistic regression and a neural network were compared with each other and with a baseline, namely the one-rule (OneR) algorithm. The algorithms were applied to real-world data from SOS Alarm, the company that operates Sweden's emergency call services. Performance was measured with accuracy and F1 score. Random forest achieved the best results, followed by the neural network. Logistic regression exhibited slightly inferior results but still performed far better than the baseline. The results indicate that machine learning is a suitable method for learning the problem of hospital selection.
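The OneR baseline is simple enough to sketch in full: pick the single feature whose per-value majority vote best predicts the label. The feature names and data below are hypothetical stand-ins for the ambulance data.

```python
import numpy as np
from collections import Counter

def one_rule(X, y):
    """OneR baseline: the single best feature with a value-to-class rule
    (features assumed categorical)."""
    best = (None, None, -1.0)
    for f in range(X.shape[1]):
        rule = {}
        for value in np.unique(X[:, f]):
            labels = y[X[:, f] == value]
            rule[value] = Counter(labels).most_common(1)[0][0]
        acc = np.mean([rule[v] == t for v, t in zip(X[:, f], y)])
        if acc > best[2]:
            best = (f, rule, acc)
    return best  # (feature index, value -> class mapping, training accuracy)

# Hypothetical categorical features: pickup district, complaint type.
X = np.array([["north", "cardiac"], ["north", "trauma"],
              ["south", "cardiac"], ["south", "trauma"]])
y = np.array(["hospital_A", "hospital_A", "hospital_B", "hospital_B"])
print(one_rule(X, y))  # here, district alone predicts perfectly
```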
APA, Harvard, Vancouver, ISO, and other styles
39

Eamrurksiri, Araya. "Applying Machine Learning to LTE/5G Performance Trend Analysis." Thesis, Linköpings universitet, Statistik och maskininlärning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139126.

Full text
Abstract:
The core idea of this thesis is to reduce the workload of manual inspection when performance analysis of updated software is required. The Central Processing Unit (CPU) utilization, one of the essential factors for evaluating performance, is analyzed. The purpose of this work is to apply machine learning techniques that are suitable for detecting the state of the CPU utilization and any changes in the test environment that affect the CPU utilization. The detection relies on a Markov switching model to identify structural changes in the time series data, which are assumed to follow an unobserved Markov chain. The historical behavior of the data is described by a first-order autoregression, so that the Markov switching model becomes a Markov switching autoregressive model. Another approach, based on a non-parametric, distribution-free method that requires fewer assumptions, called the E-divisive method, is also proposed. This method uses a hierarchical clustering algorithm to detect multiple change point locations in the time series data. As the data used in this analysis does not contain any ground truth, the methods are evaluated on simulated datasets with known states. These simulated datasets are also used for studying and comparing the Markov switching autoregressive model and the E-divisive method. Results show that the former method is preferable because of its better performance in detecting changes. Information about the state of the CPU utilization is also obtained from the Markov switching model. The E-divisive method proves to have less power in detecting changes and a higher rate of missed detections. The results from applying the Markov switching autoregressive model to the real data are presented with interpretations and discussions.
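A Markov switching autoregression of the kind described is available in statsmodels; the hedged sketch below fits a two-regime model to a synthetic CPU-utilization series with one shift. Exact argument names and output shapes may vary across statsmodels versions, so treat this as a sketch rather than a reference usage.

```python
import numpy as np
from statsmodels.tsa.regime_switching.markov_autoregression import MarkovAutoregression

rng = np.random.default_rng(0)
# Synthetic CPU-utilization series with a regime shift halfway through.
y = np.concatenate([rng.normal(30, 2, 200), rng.normal(55, 2, 200)])

# Two-regime, first-order Markov switching autoregression.
model = MarkovAutoregression(y, k_regimes=2, order=1, switching_variance=True)
result = model.fit()

# Smoothed probability of being in regime 1 at each time step;
# a jump near index 200 flags the structural change.
probs = result.smoothed_marginal_probabilities[:, 1]
print(np.argmax(probs > 0.5))
```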
APA, Harvard, Vancouver, ISO, and other styles
40

Spiegler, Sebastian Reiner. "Machine learning for the analysis of morphologically complex languages." Thesis, University of Bristol, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.535166.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Alotaibi, Saud Saleh. "Sentiment analysis in the Arabic language using machine learning." Thesis, Colorado State University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3720340.

Full text
Abstract:
Sentiment analysis has recently become one of the growing areas of research related to natural language processing and machine learning. Much opinion and sentiment about specific topics is available online, which allows several parties, such as customers, companies and even governments, to explore these opinions. The first task is to classify text in terms of whether or not it expresses opinion or factual information. Polarity classification is the second task, which distinguishes between the polarities (positive, negative or neutral) that sentences may carry. The analysis of natural language text for the identification of subjectivity and sentiment has been well studied for the English language. Conversely, the work that has been carried out for Arabic remains in its infancy; thus, more cooperation is required between research communities in order for them to offer a mature sentiment analysis system for Arabic. There are recognized challenges in this field, some of which are inherited from the nature of the Arabic language itself, while others derive from the scarcity of tools and resources.

This dissertation provides the rationale behind the current work and proposes methods to enhance the performance of sentiment analysis in the Arabic language. The first step is to increase the resources that help in the analysis process, the most important of which are annotated sentiment corpora. Several free corpora are available for English, but such resources are still limited in other languages, such as Arabic. This dissertation describes the work undertaken by the author to enrich sentiment analysis in Arabic by building a new Arabic Sentiment Corpus. The data is labeled not only with the two polarities (positive and negative); the neutral sentiment is also used during the annotation process.

The second step includes the proposal of features that may capture sentiment orientation in the Arabic language, as well as the use of different machine learning classifiers that may work better and capture the non-linearity of a richly morphological and highly inflectional language such as Arabic. Different types of features are proposed, attempting to capture different aspects and characteristics of Arabic: morphological, semantic and stylistic features are proposed and investigated. With regard to the classifier, the performance of linear and non-linear machine learning approaches is compared. The results are promising for the continued use of non-linear ML classifiers for this task. Learning knowledge from a particular dataset domain and applying it to a different domain is one useful method in the case of limited resources, as with the Arabic language. This dissertation shows and discusses the possibility of applying cross-domain learning in the field of Arabic sentiment analysis, and indicates the feasibility of using different mechanisms of the cross-domain method.

Other work in this dissertation includes the exploration of the effect of negation on Arabic subjectivity and polarity classification. Negation word lists were devised to help in this and other natural language processing tasks; these words cover both Modern Standard Arabic and some dialects. Two methods of dealing with negation in Arabic sentiment analysis are proposed. The first method is based on a static approach that treats each sentence containing negation words as a negated sentence. When determining the scope of the negation, different techniques are proposed, using different word window sizes or base phrase chunks. The second approach is a dynamic method that needs an annotated negation dataset in order to build a model that can determine whether or not a sentence is negated by the negation words, and to establish the effect of the negation on the sentence. The results achieved by adding negation handling to Arabic sentiment analysis are promising and indicate that negation has an effect on this task. Finally, the experiments and evaluations conducted in this dissertation encourage researchers to continue in this direction of research.
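The static, window-based negation handling can be sketched as token marking. The snippet below uses English stand-ins; in practice an Arabic negation list and tokenizer would be used, and the window size is an arbitrary choice.

```python
NEGATORS = {"not", "no", "never", "cannot"}  # stand-in for the Arabic lists

def mark_negation(tokens, window=3):
    """Static negation handling: prefix up to `window` tokens after a
    negation word so classifiers see them as distinct features."""
    marked, remaining = [], 0
    for tok in tokens:
        if tok.lower() in NEGATORS:
            marked.append(tok)
            remaining = window
        elif remaining > 0:
            marked.append("NOT_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked

print(mark_negation("the service was not good at all".split()))
# ['the', 'service', 'was', 'not', 'NOT_good', 'NOT_at', 'NOT_all']
```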
APA, Harvard, Vancouver, ISO, and other styles
42

Musco, Christopher Paul. "Faster linear algebra for data analysis and machine learning." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118093.

Full text
Abstract:
Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Includes bibliographical references (pages 189-208). We study fast algorithms for linear algebraic problems that are ubiquitous in data analysis and machine learning. Examples include singular value decomposition and low-rank approximation, several varieties of linear regression, data clustering, and nonlinear kernel methods. To scale these problems to massive datasets, we design new algorithms based on random sampling and iterative refinement, tools that have become an essential part of modern computational linear algebra. We focus on methods that are provably accurate and efficient, while working well in practical applications. Open source code for many of the methods discussed in this thesis can be found at https://github.com/cpmusco.
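The random-sampling flavour of these methods can be illustrated with scikit-learn's randomized SVD; the sketch below runs on a synthetic low-rank matrix and is not drawn from any experiment in the thesis.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
# A tall matrix that is approximately low rank, plus noise.
A = rng.normal(size=(5000, 40)) @ rng.normal(size=(40, 300)) \
    + 0.01 * rng.normal(size=(5000, 300))

# Random projection onto a small subspace sidesteps a full SVD,
# then the decomposition is solved exactly in that subspace.
U, S, Vt = randomized_svd(A, n_components=40, n_iter=4, random_state=0)
A_approx = U @ np.diag(S) @ Vt
print(np.linalg.norm(A - A_approx) / np.linalg.norm(A))  # small relative error
```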
APA, Harvard, Vancouver, ISO, and other styles
43

Brun, Yuriy 1981. "Software fault identification via dynamic analysis and machine learning." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/17939.

Full text
Abstract:
Thesis (M.Eng.), Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 65-67). I propose a technique that identifies program properties that may indicate errors. The technique generates machine learning models of run-time program properties known to expose faults, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. I evaluate an implementation of the technique, the Fault Invariant Classifier, that demonstrates the efficacy of the error finding technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. Given a set of properties produced by the program analysis, some of which are indicative of errors, the technique selects a subset of properties that are most likely to reveal an error. The experimental evaluation over 941,000 lines of code showed that a user must examine only the 2.2 highest-ranked properties for C programs and 1.7 for Java programs to find a fault-revealing property. The technique increases the relevance (the concentration of properties that reveal errors) by a factor of 50 on average for C programs, and 4.8 for Java programs.
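The classify-and-rank step can be sketched as follows, with invented numeric encodings standing in for dynamically detected program properties; this is an illustration of the idea, not the Fault Invariant Classifier itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Each row hypothetically encodes one program property (e.g. invariant kind,
# number of variables involved, whether it mentions return values, ...).
X_train = rng.normal(size=(400, 6))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 2] > 1).astype(int)  # 1 = fault-revealing

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# Rank properties from user-written code by predicted probability of
# revealing a fault, so the user inspects the most suspicious ones first.
X_user = rng.normal(size=(20, 6))
scores = clf.predict_proba(X_user)[:, 1]
print(np.argsort(-scores)[:5])  # indices of the top-ranked properties
```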
APA, Harvard, Vancouver, ISO, and other styles
44

Adjodah, Dhaval D. K. (Adjodlah Dhaval Dhamnidhi Kumar). "Understanding social influence using network analysis and machine learning." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/81111.

Full text
Abstract:
Thesis (S.M. in Technology and Policy), Massachusetts Institute of Technology, Engineering Systems Division, 2013. Includes bibliographical references (p. 61-62).

If we are to enact better policy, fight crime and decrease poverty, we will need better computational models of how society works. In order to make computational social science a useful reality, we will need generative models of how social influence sprouts at the interpersonal level and how it leads to emergent social behavior. In this thesis, I take steps towards understanding the predictors and conduits of social influence by analyzing real-life data, and I use the findings to create a high-accuracy prediction model of individuals' future behavior. The funf dataset, which comprises detailed high-frequency data gathered from 25 mobile phone-based signals from 130 people over a period of 15 months, is used to test the hypothesis that people who interact more with each other have a greater ability to influence each other. Various metrics of interaction are investigated, such as self-reported friendships, call and SMS logs, and Bluetooth co-location signals. The Burt network constraint of each pair of participants is calculated as a measure not only of the direct interaction between two participants, but also of the indirect friendships through intermediate neighbors that form closed triads with both of the participants being assessed.

To measure influence, the results of the live funf intervention are used, where behavior change of each participant to be more physically active was rewarded, with the reward calculated live. There were three variants of the reward structure: one where each participant was rewarded for her own behavior change without seeing that of anybody else (the control); one where each participant was paired with two 'buddies' whose behavior change she could see live, while still being rewarded for her own behavior; and one where each participant paired with two others was paid based on their behavior change, which she could see live. As a metric for social influence, we consider how the change in slope and average physical activity level of one person follows the change in slope and average physical activity level of the buddy who saw her data and/or was rewarded based on her performance. Finally, a linear regression model that uses the various types of direct and indirect network interactions is created to predict the behavior change of one participant based on her closeness with her buddy.

In addition to explaining and demonstrating the causes of social influence with unprecedented detail using network analysis and machine learning, I discuss the larger topic of using such a technology-driven approach to changing behavior instead of the traditional policy-driven approach. The advantages of the technology-driven approach are highlighted, and the potential political-economic pitfalls of implementing such a novel approach are also addressed. Since technology-driven approaches to changing individual behavior can have serious negative consequences for democracy and the free market, I introduce a novel dimension to the discussion of how to protect individuals from the state and from powerful private organizations. Hence, I describe how transparency policies and civic engagement technologies can further this goal of 'watching the watchers'.
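Burt's network constraint is available in NetworkX; a hedged sketch on an invented interaction graph, where edge weights stand in for counts of calls, SMS messages or co-locations:

```python
import networkx as nx

# Hypothetical interaction graph: edge weights count observed interactions.
G = nx.Graph()
G.add_weighted_edges_from([
    ("ann", "bob", 12), ("ann", "cara", 5),
    ("bob", "cara", 8), ("cara", "dan", 3),
])

# Burt's constraint: high values mean a node's contacts are mutually
# connected (closed triads); low values mean it spans structural holes.
print(nx.constraint(G, weight="weight"))
```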
APA, Harvard, Vancouver, ISO, and other styles
45

Rudovic, Ognjen. "Machine learning techniques for automated analysis of facial expressions." Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/24677.

Full text
Abstract:
Automated analysis of facial expressions paves the way for numerous next-generation computing tools including affective computing technologies (proactive and active user interfaces), learner-adaptive tutoring systems, medical and marketing applications, etc. In this thesis, we propose machine learning algorithms that head toward solving two important but largely understudied problems in automated analysis of facial expressions from facial images: pose-invariant facial expression classification, and modeling of dynamics of facial expressions, in terms of their temporal segments and intensity. The methods that we propose for the former represent the pioneering work on pose-invariant facial expression analysis. In these methods, we use our newly introduced models for pose normalization that achieve successful decoupling of head pose and expression in the presence of large out-of-plane head rotations, followed by facial expression classification. This is in contrast to most existing works, which can deal only with small in-plane head rotations. We derive our models for pose normalization using the Gaussian Process (GP) framework for regression and manifold learning. In these, we model the structure encoded in relationships between facial expressions from different poses and also in facial shapes. This results in the models that can successfully perform pose normalization either by warping facial expressions from non-frontal poses to the frontal pose, or by aligning facial expressions from different poses on a common expression manifold. These models solve some of the most important challenges of pose-invariant facial expression classification by being able to generalize to various poses and expressions from a small amount of training data, while also being largely robust to corrupted image features and imbalanced examples of different facial expression categories. We demonstrate this on the task of pose-invariant facial expression classification of six basic emotions. The methods that we propose for temporal segmentation and intensity estimation of facial expressions represent some of the first attempts in the field to model facial expression dynamics. In these methods, we use the Conditional Random Fields (CRF) framework to define dynamic models that encode the spatio-temporal structure of the expression data, reflected in ordinal and temporal relationships between temporal segments and intensity levels of facial expressions. We also propose several means of addressing the subject variability in the data by simultaneously exploiting various priors, and the effects of heteroscedasticity and context of target facial expressions. The resulting models are the first to address simultaneous classification and temporal segmentation of facial expressions of six basic emotions, and dynamic modeling of intensity of facial expressions of pain. Moreover, the context-sensitive model that we propose for intensity estimation of spontaneously displayed facial expressions of pain and Action Units (AUs) is the first approach in the field that performs context-sensitive modeling of facial expressions in a principled manner.
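A minimal sketch of GP-based pose normalization under strong assumptions: learn a GP regression from landmark vectors in a non-frontal pose to the frontal pose, on synthetic data. This illustrates only the regression idea, not the thesis's coupled GP models.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Hypothetical landmark vectors: 20 (x, y) points flattened to 40 dims.
frontal = rng.normal(size=(200, 40))
rotation = 0.1 * rng.normal(size=(40, 40)) + np.eye(40)  # stand-in pose change
non_frontal = frontal @ rotation + 0.05 * rng.normal(size=(200, 40))

# Learn a mapping from the non-frontal pose back to the frontal pose.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel())
gp.fit(non_frontal, frontal)

normalized = gp.predict(non_frontal[:5])
print(np.abs(normalized - frontal[:5]).mean())  # small residual
```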
APA, Harvard, Vancouver, ISO, and other styles
46

Donoghue, Claire. "Analysis of MRI for knee osteoarthritis using machine learning." Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/24684.

Full text
Abstract:
Approximately 8.5 million people in the UK (13.5% of the population) have osteoarthritis (OA) in one or both knees, with more than 6 million people in the UK suffering from painful osteoarthritis of the knee. In addition, an ageing population implies that an estimated 17 million people (twice as many as in 2012) are likely to be living with OA by 2030. Despite this, there exist no disease-modifying drugs for OA, and structural OA in MRI is poorly characterised. This motivates research to develop biomarkers and tools to aid osteoarthritis diagnosis from MRI of the knee. Previously, many solutions for learning biomarkers have relied upon hand-crafted features to characterise and diagnose osteoarthritis from MRI. The methods proposed in this thesis are scalable and use machine learning to characterise large populations of the OAI dataset, with one experiment applying an algorithm to over 10,000 images. Studies of this size enable subtle characteristics of the dataset to be learnt and allow many variations within a population to be modelled. We present data-driven algorithms to learn features that predict OA from the appearance of the articular cartilage. An unsupervised manifold learning algorithm is used to compute a low-dimensional representation of knee MR data, which we propose as an imaging marker of OA. Previous metrics introduced for OA diagnosis are loosely based on the research community's intuition of the structural causes of OA progression, including morphological measures of the articular cartilage such as thickness and volume. We demonstrate that there is a strong correlation between traditional morphological measures of the articular cartilage and the biomarkers identified using the proposed manifold learning algorithm (R² = 0.75). The algorithm is extended to create biomarkers for different regions and sequences, and a combination of these markers is proposed to yield a diagnostic imaging biomarker with superior performance. The diagnostic biomarkers presented are shown to improve upon the hand-crafted morphological measures of disease status presented in the literature: a linear discriminant analysis (LDA) classifier for early-stage diagnosis of knee osteoarthritis achieves an AUC of 0.9. From the biomarker discovery experiments we identified that intensity-based affine registration of knee MRIs is not sufficiently robust for large-scale image analysis; approximately 5% of these registrations fail. We have therefore developed fast algorithms to compute robust affine transformations of knee MRI, which enable accurate pairwise registrations in large datasets. We model the population of images as a non-linear manifold, where a registration is defined by the shortest geodesic path over the manifold representation. We identify sources of error in our manifold representation and propose fast mitigation strategies that check for consistency across the manifold and utilise multiple paths. These mitigation strategies are shown to improve registration accuracy and can be computed in less than 2 seconds on current architecture.
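A hedged sketch of the pipeline's shape, unsupervised manifold embedding followed by LDA, on synthetic data standing in for cartilage appearance vectors; Isomap is an arbitrary choice of manifold learner here, not necessarily the thesis's algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import Isomap
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for high-dimensional cartilage appearance vectors.
X, y = make_classification(n_samples=600, n_features=100, n_informative=10,
                           random_state=0)

# Unsupervised manifold learning yields a low-dimensional representation
# that can serve as an imaging marker.
embedding = Isomap(n_components=5).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(embedding, y, random_state=0)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print(roc_auc_score(y_te, lda.decision_function(X_te)))
```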
APA, Harvard, Vancouver, ISO, and other styles
47

Chan, Herman King Yeung. "Machine learning and statistical approaches to support gait analysis." Thesis, Ulster University, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.646039.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Figea, Léo. "Machine Learning for Affect Analysis on White Supremacy Forum." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-301983.

Full text
Abstract:
Since the inception of the World Wide Web, security agencies, researchers and analysts have focused much of their attention on the sentiment found on hate-inspired web forums. Here, one of their goals has been to detect and measure the affects users express in the forums, as well as to identify how users' affects change over time. Manual inspection has been one way to do this; however, as the number of discussion posts and sub-forums increases, there has been a growing need for an automated system that can assist humans in their analysis. The aim of this thesis is thus to detect and measure a number of affects expressed in written text on Stormfront.org, the most visited hate forum on the Web. To do this, we used a machine learning approach in which we trained a model to recognize affects on three sub-forums: Ideology and Philosophy, For Stormfront Ladies Only, and Stormfront Ireland. The training data consisted of manually annotated posts, and the affects we focused on were racism, aggression and worries. The results indicate that even though measuring affects is a subjective process, machine learning is a promising way forward for analysing and measuring the presence of different affects on hate forums.
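Since a post may express several affects at once, a multi-label set-up is natural; the sketch below uses one-vs-rest logistic regression over TF-IDF features, with placeholder posts and labels. It is an illustration of the task framing, not the thesis's trained model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical annotated posts; a post can carry several affects at once.
posts = ["placeholder post one", "placeholder post two",
         "placeholder post three", "placeholder post four"]
affects = [["aggression"], ["racism", "aggression"], ["worries"], []]

mlb = MultiLabelBinarizer(classes=["racism", "aggression", "worries"])
Y = mlb.fit_transform(affects)  # one binary column per affect

model = make_pipeline(TfidfVectorizer(),
                      OneVsRestClassifier(LogisticRegression()))
model.fit(posts, Y)
print(mlb.inverse_transform(model.predict(["placeholder post five"])))
```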
APA, Harvard, Vancouver, ISO, and other styles
49

Silva, Thiago Christiano. "Machine learning in complex networks: modeling, analysis, and applications." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19042013-104641/.

Full text
Abstract:
Machine learning is a research area whose main purpose is to develop computational methods that are capable of learning from previously acquired experience. Although a large number of machine learning techniques have been proposed and successfully applied in real systems, many challenging issues remain to be addressed. In recent years, an increasing interest in techniques based on complex networks (large-scale graphs with nontrivial connection patterns) has been observed. This emergence is explained by the inherent advantages of the complex network representation, which is able to capture the spatial, topological and functional relations of the data. In this work, we investigate the new features and possible advantages offered by complex networks in the machine learning domain, and we show that the network-based approach brings interesting features for supervised, semi-supervised and unsupervised learning. Specifically, we reformulate a previously proposed particle competition technique for both unsupervised and semi-supervised learning using a stochastic nonlinear dynamical system. Moreover, an analytical analysis is supplied, which enables one to predict the behavior of the proposed technique. In addition, data reliability issues are explored in semi-supervised learning; this matter has practical importance but has received little investigation in the literature. With the goal of validating these techniques on real problems, simulations on widely accepted databases are conducted. We also propose a hybrid supervised classification technique that combines both low and high orders of learning. The low-level term can be implemented by any classification technique, while the high-level term is realized by extracting features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features, while the latter measures the compliance of the test instances with the pattern formation of the data. Our study shows that the proposed technique not only can realize classification according to the semantic meaning of the data, but is also able to improve the performance of traditional classification techniques. Finally, it is expected that this study will contribute, in a relevant manner, to the machine learning area.
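The high-level term can be illustrated with a small, hedged sketch: build a k-nearest-neighbour graph per class and score a test point by how little its insertion disturbs a topological measure of that class's graph (average clustering here). This illustrates the idea only, not the thesis's exact formulation; the parameters are arbitrary.

```python
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

def knn_graph(X, k=3):
    """Undirected kNN graph over the rows of X."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    G = nx.Graph()
    G.add_nodes_from(range(len(X)))
    for i, row in enumerate(idx):
        G.add_edges_from((i, int(j)) for j in row[1:])  # skip self-neighbour
    return G

def high_level_score(X_class, x_new, k=3):
    """Compliance of x_new with the class's pattern: the smaller the change
    in average clustering when x_new joins the class graph, the better."""
    before = nx.average_clustering(knn_graph(X_class, k))
    after = nx.average_clustering(knn_graph(np.vstack([X_class, x_new]), k))
    return -abs(after - before)

rng = np.random.default_rng(0)
class_a = rng.normal(0, 1, size=(50, 2))
class_b = rng.normal(5, 1, size=(50, 2))
x = np.array([0.2, -0.1])  # pattern-compliant with class A
print(high_level_score(class_a, x), high_level_score(class_b, x))
```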
APA, Harvard, Vancouver, ISO, and other styles
50

Wilmer, Adam I. "A machine learning approach to texture analysis and synthesis." Thesis, University of Southampton, 2004. https://eprints.soton.ac.uk/259696/.

Full text
APA, Harvard, Vancouver, ISO, and other styles