
Dissertations / Theses on the topic 'Predictive Machine Learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Predictive Machine Learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Romano, Donato. "Machine Learning algorithms for predictive diagnostics applied to automatic machines." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22319/.

Full text
Abstract:
This thesis analyzes the advent of Industry 4.0 within the packaging industry. In particular, it discusses the importance of predictive diagnostics, and several approaches for deriving descriptive models of the problem from data are analyzed and tested. Furthermore, the main Machine Learning techniques are applied in order to classify the analyzed data into their respective classes.
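The thesis's code is not part of the record; purely as a rough illustration of the kind of supervised fault-class prediction the abstract describes, here is a minimal sketch in which the CSV file and all feature names are hypothetical placeholders:

# Minimal sketch of supervised fault-class prediction from machine sensor data.
# The CSV file and feature names are hypothetical, not the thesis's data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("machine_cycles.csv")          # hypothetical sensor log
X = df[["vibration_rms", "motor_current", "cycle_time"]]
y = df["fault_class"]                           # e.g. 'ok', 'wear', 'jam'

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))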
APA, Harvard, Vancouver, ISO, and other styles
2

Korvesis, Panagiotis. "Machine Learning for Predictive Maintenance in Aviation." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLX093/document.

Full text
Abstract:
The increase of available data in almost every domain raises the necessity of employing algorithms for automated data analysis. This necessity is highlighted in predictive maintenance, where the ultimate objective is to predict failures of hardware components by continuously observing their status, in order to plan maintenance actions well in advance. These observations are generated by monitoring systems, usually in the form of time series and event logs, and cover the lifespan of the corresponding components. Analyzing this observation history in order to develop predictive models is the main challenge of data-driven predictive maintenance. In this direction, Machine Learning has become ubiquitous, since it provides the means of extracting knowledge from a variety of data sources with minimal human intervention. The goal of this dissertation is to study and address challenging problems in aviation related to predicting failures of components on board. The amount of data related to the operation of aircraft is enormous, and therefore scalability is a key requirement in every proposed approach. This dissertation is divided into three main parts that correspond to the different data sources we encountered during our work. In the first part, we targeted the problem of predicting system failures given the history of Post Flight Reports. We proposed a regression-based approach preceded by a meticulous formulation and data pre-processing/transformation. Our method approximates the risk of failure with a scalable solution, deployed in a cluster environment for both training and testing. To our knowledge, no method for tackling this problem was available at the time this thesis was written. The second part consists of analyzing logbook data: free text, written by maintenance engineers, describing aircraft issues and the corresponding maintenance actions. The logbook contains information that is not reflected in the Post Flight Reports and that is essential in several applications, including failure prediction. However, since the logbook contains text written by humans, it contains a lot of noise that needs to be removed in order to extract useful information. We tackled this problem by proposing an approach based on vector representations of words (word embeddings). Our approach exploits semantic similarities of words, learned by the neural networks that generated the vector representations, in order to identify and correct spelling mistakes and abbreviations. Finally, important keywords are extracted using Part of Speech Tagging. In the third part, we tackled the problem of assessing the health of components on board using sensor measurements. In the cases under consideration, the condition of a component is assessed by the magnitude of the sensor's fluctuation and a monotonically increasing trend. In our approach, we formulated a time series decomposition problem in order to separate the fluctuation from the trend by solving a convex program. To quantify the condition of the component, we compute a risk function which measures the sensor's deviation from its normal behavior, learned using Gaussian Mixture Models.
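The third part's decomposition (a monotone trend separated from fluctuations by a convex program, with deviation scored by Gaussian Mixture Models) can be sketched as follows; the synthetic signal, the smoothing weight and the "normal" calibration window are assumptions, not the thesis's actual formulation:

# Sketch: split a sensor series into a monotone trend plus fluctuation via a
# convex program, then score the fluctuation's abnormality with a GMM.
# The smoothing weight lam and the synthetic data are illustrative assumptions.
import numpy as np
import cvxpy as cp
from sklearn.mixture import GaussianMixture

y = np.cumsum(np.abs(np.random.randn(300))) * 0.05 + np.random.randn(300)

t = cp.Variable(len(y))
lam = 10.0
objective = cp.Minimize(cp.sum_squares(y - t) + lam * cp.sum_squares(cp.diff(t, 2)))
constraints = [cp.diff(t) >= 0]                 # monotonically increasing trend
cp.Problem(objective, constraints).solve()

fluctuation = y - t.value
gmm = GaussianMixture(n_components=2).fit(fluctuation[:200].reshape(-1, 1))
risk = -gmm.score_samples(fluctuation.reshape(-1, 1))   # high = abnormal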
APA, Harvard, Vancouver, ISO, and other styles
3

Karlsson, Lotta. "Predictive Maintenance for RM12 with Machine Learning." Thesis, Högskolan i Halmstad, Akademin för ekonomi, teknik och naturvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-42283.

Full text
Abstract:
Few components in mechanical engineering possess the fatigue resistance of the high-pressure turbine blades found in jet engines: they are designed to perform at extremely high temperatures under severe loading, which makes degradation an important aspect despite a design optimized for its environment. This study aims to find a method for predicting the life consumption of the blades in the turbine section of RM12, the jet engine of JAS 39 Gripen C/D. The work was performed at GKN Aerospace, which holds the military type certificate for this engine as well as a patented solution that determines the life consumption of components from their operational history. Using machine learning in Matlab, flight sensor data and loading results, the method was to explore a variety of prediction models and to select blades with varied utilization before reaching end of life for comparison, followed by an effort to understand the life-limiting fatigue conditions and the factors involved in the deterioration process. A similarity-finding approach gave valuable meaning to the accuracy of regression analysis from flight data towards output in the form of temperature predictions. Comparing known and reliable fatigue calculation results gave, however, no clear picture, as inspected blades had reached their limit at very diverse accumulated values. The next approach was therefore to investigate whether an initialization point of degradation could be found, from which the result could give an answer that matched all blades and their different utilization. The result was that an accelerated degradation after high loading could give a prediction explaining the total life consumption with an accuracy of 87% for 19 out of 21 investigated blades. The accelerated deterioration can in theory be explained by the fact that fatigue resistance and the different types of degradation propagate each other and originate from thermal loading, making them all contributors, whereas conventional numerical methods only handle them separately. In order to yield confident, valuable and reliable predictions, the models do however need to be accompanied by more testing and by the addition of contributing factors before being accepted as a proven method for determining the life consumption of the high-pressure turbine blades.
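The thesis worked in Matlab and its model is not public; purely as a toy illustration of the accelerated-degradation accumulation described above, the following Python sketch uses an invented threshold, acceleration factor and damage increment:

# Sketch of accumulating life consumption with accelerated degradation after
# high thermal loading. All constants and the temperature series are invented.
import numpy as np

def life_consumption(temps, threshold=1100.0, accel=3.0):
    """Accumulate per-cycle damage; a cycle above the threshold also
    accelerates the damage contributed by subsequent cycles."""
    damage, factor = 0.0, 1.0
    for t in temps:
        if t > threshold:
            factor = accel          # degradation accelerates after high loading
        damage += factor * max(t - 900.0, 0.0) / 1e5   # toy damage increment
    return damage

print(life_consumption(np.random.uniform(950, 1150, size=500)))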
APA, Harvard, Vancouver, ISO, and other styles
4

Darwiche, Aiman A. "Machine Learning Methods for Septic Shock Prediction." Diss., NSUWorks, 2018. https://nsuworks.nova.edu/gscis_etd/1051.

Full text
Abstract:
Sepsis is a life-threatening organ dysfunction caused by a dysregulated body response to infection. Sepsis is difficult to detect at an early stage, and when not detected early it is difficult to treat and results in high mortality rates. Developing improved methods for identifying patients at high risk of suffering septic shock has been the focus of much research in recent years. Building on this body of literature, this dissertation develops an improved method for septic shock prediction. Using data from the MIMIC-III database, an ensemble classifier is trained to identify high-risk patients. A robust prediction model is built by obtaining a risk score from fitting the Cox proportional hazards model on multiple input features. The score is added to the list of features, and the Random Forest ensemble classifier is trained to produce the model. The proposed method, Cox Enhanced Random Forest (CERF), is evaluated by comparing its predictive accuracy to those of extant methods.
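The dissertation's implementation is not reproduced here; a minimal sketch of the CERF idea as the abstract describes it (a Cox risk score appended as a feature for a Random Forest), with a hypothetical CSV and column names, could look like:

# Sketch of the CERF idea: fit a Cox model, use its partial-hazard risk score
# as an extra feature for a Random Forest. Data and column names are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("icu_stays.csv")   # hypothetical extract of vitals/labs
features = ["heart_rate", "lactate", "map", "wbc"]

cph = CoxPHFitter()
cph.fit(df[features + ["time_to_event", "event"]],
        duration_col="time_to_event", event_col="event")
df["cox_risk"] = cph.predict_partial_hazard(df[features])

rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(df[features + ["cox_risk"]], df["septic_shock"])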
APA, Harvard, Vancouver, ISO, and other styles
5

Pienaar, Celia. "Machine learning in predictive analytics on judicial decision-making." Master's thesis, Faculty of Science, 2021. http://hdl.handle.net/11427/33925.

Full text
Abstract:
Legal professionals globally are under pressure to provide ‘more for less' – not an easy challenge in the era of big data, increasingly complex regulatory and legislative frameworks and volatile financial markets. Although largely limited to information retrieval and extraction, Machine Learning applications targeted at the legal domain have to some extent become mainstream. The startup market is rife with legal technology providers, and many major law firms encourage research and development through formal legal technology incubator programs. Experienced legal professionals are expected to become technologically astute as part of their response to the ‘more for less' challenge, while legal professionals on track to enter the legal services industry are encouraged to broaden their skill sets beyond a traditional law degree. Predictive analytics applied to judicial decision-making raises interesting discussions around potential benefits to the general public, over-burdened judicial systems and legal professionals. It is also associated with limitations and challenges around the manual input required (in the absence of automatic extraction and prediction) and domain-specific application. While there is no ‘one size fits all' solution when considering predictive analytics across legal domains or different countries' legal systems, this dissertation aims to provide an overview of Machine Learning techniques which could be applied in further research, to start unlocking the benefits associated with predictive analytics on a greater (and hopefully local) scale.
APA, Harvard, Vancouver, ISO, and other styles
6

Gligorijevic, Djordje. "Predictive Uncertainty Quantification and Explainable Machine Learning in Healthcare." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/520057.

Full text
Abstract:
Predictive modeling is an ever more important part of decision making. Advances in Machine Learning predictive modeling have spread across many domains, bringing significant improvements in performance and providing unique opportunities for novel discoveries. Notably important domains are medicine and healthcare, which take care of people's wellbeing; while these are among the most developed areas of science, with active research, there are many ways they can be improved. In particular, novel tools developed on the basis of Machine Learning theory have brought benefits across many areas of clinical practice, pushing the boundaries of medical science and directly affecting the well-being of millions of patients. Additionally, the healthcare and medicine domains require predictive modeling to anticipate and overcome many obstacles the future may hold. These applications employ precise decision-making processes which require accurate predictions. However, good prediction on its own is often insufficient, and there has been no major focus on developing algorithms with good quality uncertainty estimates. This thesis therefore aims to provide a variety of ways to incorporate solutions, by learning high quality uncertainty estimates or by providing interpretability of the models where needed, for the purpose of improving existing tools built in practice and allowing many other tools to be used where uncertainty is the key factor for decision making. The first part of the thesis proposes approaches for learning high quality uncertainty estimates for both short- and long-term predictions in multi-task learning, developed on top of continuous probabilistic graphical models. In many scenarios, especially in long-term predictions, it may be of great importance for the models to provide a reliability flag in order to be accepted by domain experts. To this end we explored a widely applied structured regression model with the goal of providing meaningful uncertainty estimates on various predictive tasks. Our particular interest is in modeling uncertainty propagation while predicting far into the future. To address this important problem, our approach centers on providing an uncertainty estimate by modeling input features as random variables. This allows modeling uncertainty from noisy inputs; in cases where the model iteratively produces errors, it should propagate uncertainty over the predictive horizon, which may provide invaluable information for decision making based on predictions. In the second part of the thesis we propose novel neural embedding models for learning low-dimensional embeddings of medical concepts, such as diseases and genes, show how they can be interpreted to assess their quality, and show how they can be used to solve many problems in medical and healthcare research. We use EHR data to discover novel relationships between diseases by studying their comorbidities (i.e., co-occurrences in patients). We trained our models on a large-scale EHR database comprising more than 35 million inpatient cases. To confirm the value and potential of the proposed approach we evaluate its effectiveness on a held-out set. Furthermore, for select diseases we provide a candidate gene list for which disease-gene associations were not studied previously, allowing biomedical researchers to better focus their often very costly lab studies.
We furthermore examine how disease heterogeneity can affect the quality of the learned embeddings and propose an approach for learning types of such heterogeneous diseases, focusing primarily on learning types of sepsis. Finally, we evaluate the quality of the low-dimensional embeddings on tasks of predicting hospital quality indicators such as length of stay, total charges and mortality likelihood, demonstrating their superiority over other approaches. In the third part of the thesis we focus on decision making in the medicine and healthcare domain by developing state-of-the-art deep learning models capable of outperforming human performance while maintaining good interpretability and uncertainty estimates.
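The thesis's embedding models are its own; as a generic illustration of learning medical-concept embeddings from patient-level co-occurrence, a word2vec-style sketch on invented records might look like:

# Generic illustration of embedding medical codes from co-occurrence: each
# patient's diagnosis codes form one "sentence" for a word2vec-style model.
# The records below are invented; the thesis's models differ.
from gensim.models import Word2Vec

patients = [
    ["sepsis", "pneumonia", "aki"],
    ["diabetes", "ckd", "hypertension"],
    ["sepsis", "aki", "shock"],
]
model = Word2Vec(sentences=patients, vector_size=32, window=5,
                 min_count=1, epochs=50, seed=0)
print(model.wv.most_similar("sepsis", topn=2))   # nearby concepts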
APA, Harvard, Vancouver, ISO, and other styles
7

Murphy, Killian. "Predictive maintenance of network equipment using machine learning methods." Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAS013.

Full text
Abstract:
With the improvement of the computation power necessary for advanced applications of Machine Learning (ML), Network Fault Prediction (NFP) is experiencing renewed scientific interest. The ability to predict network equipment failure is increasingly identified as an effective means of improving network reliability. This predictive capability can be used to mitigate incoming network failures or to enact predictive maintenance ahead of them. This could contribute to establishing zero-failure networks and allow safety-critical applications to run over higher-dimension, heterogeneous networks. In this PhD thesis, we propose to contribute to the NFP field by focusing on network alarm prediction. First, we present a comprehensive survey of NFP using Machine Learning (ML) methods entirely dedicated to telecommunication networks, and determine new directions for research in the field.
Second, we propose and study a set of Machine Learning performance metrics (maintenance cost reduction and Quality of Service improvement) adapted to NFP in the context of network maintenance. Third, we describe the complete data processing architecture, including the network and software infrastructure and the necessary data preprocessing pipeline, implemented at SPIE ICS, a networks and systems integrator. We also precisely describe the alarm and failure prediction problem model. Fourth, we establish a benchmark of the different ML solutions applied to our dataset. We consider Decision Tree-based methods, Multi-Layer Perceptrons and Support Vector Machines. We test the generalization of prediction performance across equipment types, as well as the normal ML generalization of the proposed models and parameters. Then, we successfully apply sequential-input ML architectures (Convolutional Neural Networks and Long Short-Term Memory networks) to our sequential SNMP dataset. Finally, we study the impact of the definition of the prediction horizon (and the associated arbitrary timeframes) on ML model prediction performance.
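The exact definition of the thesis's metric pair is not given in the abstract; the following is a minimal sketch of a maintenance-cost-reduction metric of that general kind, with invented cost figures:

# Sketch of a maintenance-cost metric for failure prediction: compare the cost
# of acting on predictions against doing nothing. All cost figures are invented;
# the thesis defines its own metric pair.
import numpy as np

def maintenance_cost(y_true, y_pred, c_planned=1.0, c_unplanned=10.0,
                     c_false_alarm=0.5):
    tp = np.sum((y_true == 1) & (y_pred == 1))   # failures caught in advance
    fn = np.sum((y_true == 1) & (y_pred == 0))   # missed failures
    fp = np.sum((y_true == 0) & (y_pred == 1))   # needless interventions
    return tp * c_planned + fn * c_unplanned + fp * c_false_alarm

def cost_reduction(y_true, y_pred, **kw):
    baseline = maintenance_cost(y_true, np.zeros_like(y_true), **kw)
    return 1.0 - maintenance_cost(y_true, y_pred, **kw) / baseline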
APA, Harvard, Vancouver, ISO, and other styles
8

PROVEZZA, LUCA. "Predictive diagnostics through machine learning on the injection group of a diecasting machine." Doctoral thesis, Università degli studi di Brescia, 2022. http://hdl.handle.net/11379/559976.

Full text
Abstract:
In recent decades, data analysis has become relevant in the industrial scenario. The data lake represents a new frontier in data science: the point is no longer just data storage, but the possibility of analysing historical data in order to optimize production, finding bottlenecks in the production chain and applying corrective procedures in time to increase the productivity of a company. Every field in the production chain matters for productivity, and maintenance is one of the most important tasks to take into account. Indeed, maintenance costs are a major part of the total operating costs of all manufacturing or production plants, and they must be reduced by applying different strategies. The new frontier in maintenance strategy is represented by Predictive Maintenance (PM), also called Predictive Health Management (PHM). PHM is a maintenance strategy in which statistical or machine learning algorithms are applied to obtain the Remaining Useful Life (RUL) of a component. This project focuses on the application of PHM to the injection group of a die casting machine. By its own nature, the High Pressure Die Casting (HPDC) process presents several aspects that can affect the analysis. For instance, the fault of a component is a rare event, and the analysis cannot be performed by investigating large datasets or fault data based on maintenance records; this makes it very difficult to detect component faults with traditional machine learning algorithms. A further problem linked with the HPDC process is the frequent change in production, which leads to changes in the process parameters. Moreover, small companies sometimes do not correctly update the production identifiers (die changes, injection recipe changes). To solve these problems, a new method is proposed to detect the fault of components in a die casting machine. The proposed method automatically detects a production change and each time resets the dataset used for training. The method is based on the peculiarity of the die casting process of having distinct phases that are the same for every machine and production considered: the slow motion of the piston, to avoid air bubbles inside the injection chamber; the fast stroke, with the complete filling of the die; and the multiplication phase, which applies extra pressure to compensate for the shrinkage of the material as it cools after injection. Each phase is interpolated to extract parameters that are sensitive for fault prediction. For each parameter, an uncertainty estimator is computed and combined with the uncertainty of the instrumentation to obtain an expanded uncertainty that accounts for both contributions. The core of the method is the computation of the final metric for monitoring the health of a component: classical statistical analysis is combined with weighting matrices given by experts in the maintenance of this type of machine, with the weights determined in a series of formal interviews for each phase and recorded quantity. The result is a Health Index (HI) representing the probability of different types of faults in the die casting machine.
Each weighting matrix combined with the extracted parameters yields an HI for that component, so as many HIs as needed can be created by constructing the appropriate weighting matrix through expert interviews.
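The abstract gives the shape of the Health Index computation but not its formulas; a toy sketch of combining an expert weighting matrix with per-parameter deviations might look like the following, with all numbers invented:

# Sketch of a Health Index as a weighted combination of per-parameter deviation
# scores. The weight matrix and deviations are invented; the thesis derives its
# weights from formal expert interviews.
import numpy as np

# rows: injection phases (slow shot, fast shot, multiplication)
# cols: extracted parameters (e.g. slope, plateau level, overshoot)
weights = np.array([[0.5, 0.3, 0.2],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.2, 0.7]])

# standardized deviations of each parameter from its in-control behaviour
deviation = np.abs(np.random.randn(3, 3))

health_index = float(np.sum(weights * deviation) / np.sum(weights))
print(f"HI = {health_index:.2f}  (higher = more likely fault)")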
APA, Harvard, Vancouver, ISO, and other styles
9

Cahill, Jaspar. "Machine learning techniques to improve software quality." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/41730/1/Jaspar_Cahill_Thesis.pdf.

Full text
Abstract:
A significant proportion of the cost of software development is due to software testing and maintenance. This is in part the result of the inevitable imperfections due to human error, lack of quality during the design and coding of software, and the increasing need to reduce faults to improve customer satisfaction in a competitive marketplace. Given the cost and importance of removing errors, improvements in fault detection and removal can be of significant benefit. The earlier in the development process faults can be found, the less it costs to correct them and the less likely other faults are to develop. This research aims to make the testing process more efficient and effective by identifying those software modules most likely to contain faults, allowing testing efforts to be carefully targeted. This is done with machine learning algorithms which use examples of fault-prone and non-fault-prone modules to develop predictive models of quality. In order to learn the numerical mapping between module and classification, a module is represented in terms of software metrics. A difficulty in this sort of problem is sourcing software engineering data of adequate quality. In this work, data is obtained from two sources: the NASA Metrics Data Program and the open source Eclipse project. Feature selection is applied before learning, and a number of different feature selection methods are compared to find which work best. Two machine learning algorithms are applied to the data - Naive Bayes and the Support Vector Machine - and predictive results are compared to those of previous efforts and found to be superior on selected data sets and comparable on others. In addition, a new classification method is proposed, Rank Sum, in which a ranking abstraction is laid over bin densities for each class, and a classification is determined based on the sum of ranks over features. A novel extension of this method is also described, based on an observed polarising of points by class when Rank Sum is applied to training data to convert it into 2D rank sum space. SVM is applied to this transformed data to produce models whose parameters can be set according to trade-off curves to obtain a particular performance trade-off.
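One possible reading of the Rank Sum description above: per feature, histogram each class, rank the classes by the density of the test sample's bin, and predict the class with the best total rank. The sketch below implements that interpretation and should not be taken as the dissertation's actual algorithm:

# Sketch of one interpretation of Rank Sum classification over bin densities.
import numpy as np
from scipy.stats import rankdata

def fit_rank_sum(X, y, n_bins=10):
    classes = np.unique(y)
    edges = [np.histogram_bin_edges(X[:, j], bins=n_bins) for j in range(X.shape[1])]
    dens = np.zeros((len(classes), X.shape[1], n_bins))
    for c_i, c in enumerate(classes):
        for j in range(X.shape[1]):
            h, _ = np.histogram(X[y == c, j], bins=edges[j], density=True)
            dens[c_i, j] = h
    return classes, edges, dens

def predict_rank_sum(x, classes, edges, dens):
    total = np.zeros(len(classes))
    for j in range(len(edges)):
        b = np.clip(np.searchsorted(edges[j], x[j]) - 1, 0, dens.shape[2] - 1)
        total += rankdata(-dens[:, j, b])     # rank 1 = densest class
    return classes[np.argmin(total)]          # best (lowest) rank sum wins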
APA, Harvard, Vancouver, ISO, and other styles
10

Swedish, Tristan Breaden. "Expert-free eye alignment and machine learning for predictive health." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/112543.

Full text
Abstract:
This thesis documents the development of an "expert-free" device in order to realize a system for scalable screening of the eye fundus. The goal of this work is to demonstrate enabling technologies that remove dependence on expert operators and to explore the usefulness of this approach in the context of scalable health screening. I will present a system that includes a novel method for eye self-alignment and automatic image analysis, and evaluate its effectiveness when applied to a case study of a diabetic retinopathy screening program. This work is inspired by advances in machine learning that make accessible interactions previously confined to specialized environments and trained users. I will also suggest some new directions for future work based on this expert-free paradigm.
APA, Harvard, Vancouver, ISO, and other styles
11

Lindström, Johan. "Predictive maintenance for a wood chipper using supervised machine learning." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-149304.

Full text
Abstract:
With a predictive model that can predict failures of a manufacturing machine, many benefits can be obtained: unnecessary downtime and accidents can be avoided. In this study a wood chipper with 12 replaceable knives was examined. The specific task was to create a predictive model that can predict whether a knife change is needed or not. To create the predictive model, supervised machine learning was used, with decision forest as the algorithm. Data samples were collected from vibration measurements, and each sample was labeled with the help of ocular inspections of the knives. Microsoft Azure Machine Learning Studio was the workspace used to train all models. The data set acquired consists of 106 samples, of which only 9 belong to the minority class. Two strategies for training a model were used, with and without oversampling. The best model without oversampling obtained 87.5% precision and 77.8% recall; the best model with oversampling achieved 79% precision and 86.7% recall. This result indicates that the trained models can be useful. However, the validity of the result is limited by the small data set and by many uncertainties in how it was acquired.
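A minimal sketch of the two training strategies described above (with and without oversampling of the minority class), using scikit-learn in place of the Azure workspace and synthetic data of the same shape as the thesis's:

# Sketch: compare a classifier trained with and without minority oversampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=106, weights=[0.92], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# oversample the minority class in the training split only
X_min, y_min = X_tr[y_tr == 1], y_tr[y_tr == 1]
X_os, y_os = resample(X_min, y_min, n_samples=(y_tr == 0).sum(), random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], X_os])
y_bal = np.concatenate([y_tr[y_tr == 0], y_os])

for name, (Xf, yf) in {"plain": (X_tr, y_tr), "oversampled": (X_bal, y_bal)}.items():
    clf = RandomForestClassifier(random_state=0).fit(Xf, yf)
    p = clf.predict(X_te)
    print(name, precision_score(y_te, p, zero_division=0), recall_score(y_te, p))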
APA, Harvard, Vancouver, ISO, and other styles
12

Wang, Katherine(Katherine Yuchen). "A machine learning framework for predictive maintenance of wind turbines." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129927.

Full text
Abstract:
Wind energy is one of the fastest growing energy sources in the world. However, the failure to detect the breakdown of turbine parts can be very costly. Wind energy companies have increasingly turned to machine learning to improve wind turbine reliability. Thus, the goal of this thesis is to create a flexible and extensible machine learning framework that enables wind energy experts to define and build models for the predictive maintenance of wind turbines. We contribute two libraries that provide experts with the necessary tools to solve prediction problems in the wind energy industry. The first is GPE, which translates and uses the desired prediction problem to generate machine learning training examples from turbine operations data. The other library, CMS-ML, provides the architecture for building machine learning models using vibration data generated by turbine sensors within the Condition Monitoring System (CMS). With this architecture, we can easily create modular feature engineering and machine learning pipelines for the CMS signal data. Finally, we demonstrate the application of these two libraries on proprietary wind turbine data and analyze the effects of their parameters.
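The GPE API itself is not shown in the abstract; the hypothetical function below merely illustrates the kind of translation it describes, turning a prediction problem ("will this turbine fail within 48 hours of the cutoff time?") into labeled training examples:

# Illustrative sketch of generating labeled examples from operations data.
import pandas as pd

def make_examples(ops: pd.DataFrame, failures: pd.DataFrame,
                  horizon=pd.Timedelta("48h"), step=pd.Timedelta("24h")):
    """Slide a cutoff over each turbine's log; label 1 if the turbine
    fails within `horizon` after the cutoff."""
    examples = []
    for turbine, log in ops.groupby("turbine_id"):
        t = log["timestamp"].min()
        fails = failures.loc[failures["turbine_id"] == turbine, "timestamp"]
        while t <= log["timestamp"].max() - horizon:
            label = int(((fails > t) & (fails <= t + horizon)).any())
            examples.append({"turbine_id": turbine, "cutoff": t, "label": label})
            t += step
    return pd.DataFrame(examples)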
APA, Harvard, Vancouver, ISO, and other styles
13

Le, Nguyen Minh Huong. "Online machine learning-based predictive maintenance for the railway industry." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT027.

Full text
Abstract:
Being an effective long-distance mass transit, the railway will continue to flourish for its limited carbon footprint in the environment. Ensuring the equipment's reliability and passenger safety brings forth the need for efficient maintenance. Apart from the prevalence of corrective and periodic maintenance, predictive maintenance has come into prominence lately. Recent advances in machine learning and the abundance of data drive practitioners toward data-driven predictive maintenance. The common practice is to collect data to train a machine learning model, then deploy the model for production and keep it unchanged afterward. We argue that such a practice is suboptimal on a data stream. The unboundedness of the stream makes the model prone to incomplete learning. Dynamic changes on the stream introduce novel concepts unseen by the model and decrease its accuracy. The velocity of the stream makes manual labeling infeasible and disables supervised learning algorithms. Therefore, switching from a static, offline learning paradigm to an adaptive, online one is necessary, especially when new generations of connected trains continuously generating sensor data are already a reality.
We investigate the applicability of online machine learning for predictive maintenance on typical complex systems in the railway industry. First, we develop InterCE, an active learning-based framework that extracts cycles from an unlabeled stream by interacting with a human expert. Then, we implement a long short-term memory autoencoder to transform the extracted cycles into feature vectors that are more compact yet remain representative. Finally, we design CheMoc, a framework that continuously monitors the condition of the systems using online adaptive clustering. Our methods are evaluated on the passenger access systems of two fleets of passenger trains managed by SNCF, the national railway company of France.
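As a rough illustration of the feature-compression step, a minimal long short-term memory autoencoder in PyTorch might look like the sketch below; the cycle length, number of sensor channels, and latent size are illustrative assumptions, not the configuration used in the thesis.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Compress a multivariate cycle (seq_len, n_features) into a latent vector."""
    def __init__(self, n_features=3, latent_dim=16):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, latent_dim, batch_first=True)
        self.output = nn.Linear(latent_dim, n_features)

    def forward(self, x):
        _, (h, _) = self.encoder(x)            # h: (1, batch, latent_dim)
        z = h.squeeze(0)                       # compact feature vector per cycle
        # Repeat the latent vector at every time step and decode it back
        repeated = z.unsqueeze(1).repeat(1, x.size(1), 1)
        decoded, _ = self.decoder(repeated)
        return self.output(decoded), z

# Training minimizes reconstruction error; z is used by downstream monitoring.
model = LSTMAutoencoder()
cycles = torch.randn(8, 120, 3)                # 8 toy cycles, 120 steps, 3 sensors
recon, features = model(cycles)
loss = nn.functional.mse_loss(recon, cycles)
```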
APA, Harvard, Vancouver, ISO, and other styles
16

Löfström, Tuwe. "On Effectively Creating Ensembles of Classifiers : Studies on Creation Strategies, Diversity and Predicting with Confidence." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-116683.

Full text
Abstract:
An ensemble is a composite model, combining the predictions from several other models. Ensembles are known to be more accurate than single models. Diversity has been identified as an important factor in explaining the success of ensembles. In the context of classification, diversity has not been well defined, and several heuristic diversity measures have been proposed. The focus of this thesis is on how to create effective ensembles in the context of classification. Even though several effective ensemble algorithms have been proposed, there are still open questions regarding the role diversity plays when creating an effective ensemble. Open questions relating to creating effective ensembles that are addressed include: what to optimize when trying to find a sub-ensemble, built from a subset of the models in the original ensemble, that is more effective than the original ensemble; how effective it is to search for such a sub-ensemble; and how the neural networks used in an ensemble should be trained for the ensemble to be effective. The contributions of the thesis include several studies evaluating different ways to optimize which sub-ensemble would be most effective, including a novel approach using combinations of performance and diversity measures. The initial studies presented in the thesis eventually led to an investigation of the underlying assumption motivating the search for more effective sub-ensembles. The evaluation concluded that even if several more effective sub-ensembles exist, it may not be possible to identify which sub-ensembles would be the most effective using any of the evaluated optimization measures. An investigation of the most effective ways to train neural networks to be used in ensembles was also performed. The conclusion is that effective ensembles can be obtained by training neural networks in a number of different ways, and that either high average individual accuracy or much diversity would generate effective ensembles. Several findings regarding diversity and effective ensembles presented in the literature in recent years are also discussed and related to the results of the included studies. When creating confidence-based predictors using conformal prediction, there are several open questions regarding how data should be utilized effectively when using ensembles. Open questions related to predicting with confidence that are addressed include: how data can be utilized effectively to achieve more efficient confidence-based predictions using ensembles; and how problems with class imbalance affect the confidence-based predictions when using conformal prediction. Contributions include two studies: the first shows that the use of out-of-bag estimates when using bagging ensembles results in more effective conformal predictors, and the second shows that a conformal predictor conditioned on the class labels, to avoid a strong bias towards the majority class, is more effective on problems with class imbalance. The research method used is mainly inspired by the design science paradigm, which is manifested by the development and evaluation of artifacts.
At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 8: In press.
Dataanalys för detektion av läkemedelseffekter (DADEL)
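To make the out-of-bag idea concrete, here is a minimal sketch of a conformal predictor built on a bagging-style ensemble, where each calibration score comes only from trees that did not see that sample. The nonconformity measure (one minus the probability assigned to the true class) is a common choice assumed here for illustration, not necessarily the one used in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)

# Out-of-bag nonconformity scores: 1 - probability assigned to the true class
oob_proba = rf.oob_decision_function_
alphas = 1.0 - oob_proba[np.arange(len(y)), y]

def conformal_predict(x, epsilon=0.1):
    """Return the set of labels whose p-value exceeds the significance level."""
    proba = rf.predict_proba(x.reshape(1, -1))[0]
    region = []
    for label, p in enumerate(proba):
        score = 1.0 - p
        p_value = (np.sum(alphas >= score) + 1) / (len(alphas) + 1)
        if p_value > epsilon:
            region.append(label)
    return region

print(conformal_predict(X[0]))   # prediction region at 90% confidence
```

A class-conditional (Mondrian) variant, as studied for imbalanced problems, would simply compare each candidate label's score only against the calibration scores of that same class.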
APA, Harvard, Vancouver, ISO, and other styles
17

Pardos, Zachary Alexander. "Predictive Models of Student Learning." Digital WPI, 2012. https://digitalcommons.wpi.edu/etd-dissertations/185.

Full text
Abstract:
In this dissertation, several approaches I have taken to build upon the student learning model are described. There are two focuses of this dissertation. The first focus is on improving the accuracy with which future student knowledge and performance can be predicted by individualizing the model to each student. The second focus is to predict how different educational content and tutorial strategies will influence student learning. The two focuses are complementary but are approached from slightly different directions. I have found that Bayesian Networks, based on belief propagation, are strong at achieving the goals of both focuses. In prediction, they excel at capturing the temporal nature of data produced where student knowledge is changing over time. This concept of state change over time is very difficult to capture with classical machine learning approaches. Interpretability is also hard to come by with classical machine learning approaches; however, it is one of the strengths of Bayesian models and aids in studying the direct influence of various factors on learning. The domain in which these models are being studied is the domain of computer tutoring systems, software which uses artificial intelligence to enhance computer-based tutorial instruction. These systems are growing in relevance. At their best they have been shown to achieve the same educational gain as one-on-one human interaction. Computer tutors have also received the attention of the White House, which mentioned a tutoring platform called ASSISTments in its National Educational Technology Plan. With the fast-paced adoption of these data-driven systems, it is important to learn how to improve their educational effectiveness by making sense of the data they generate. The studies in this dissertation use data from these educational systems, which primarily teach topics of Geometry and Algebra but can be applied to any domain with clearly defined sub-skills and dichotomous student response data. One of the intended impacts of this work is for these knowledge modeling contributions to facilitate the move towards computer adaptive learning in much the same way that Item Response Theory models facilitated the move towards computer adaptive testing.
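The temporal model underlying this line of work is commonly formulated as Bayesian Knowledge Tracing, a simple dynamic Bayesian network over a hidden "knows the skill" state. The sketch below shows its standard forward update; the four parameters (prior, learn, guess, slip) are set to illustrative values rather than fitted ones.

```python
def bkt_update(responses, p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """Bayesian Knowledge Tracing: track P(student knows the skill) over time."""
    p_know = p_init
    trajectory = []
    for correct in responses:
        # Posterior after observing the response (Bayes rule)
        if correct:
            evidence = p_know * (1 - p_slip) + (1 - p_know) * p_guess
            posterior = p_know * (1 - p_slip) / evidence
        else:
            evidence = p_know * p_slip + (1 - p_know) * (1 - p_guess)
            posterior = p_know * p_slip / evidence
        # Transition: the student may learn between opportunities
        p_know = posterior + (1 - posterior) * p_learn
        trajectory.append(p_know)
    return trajectory

print(bkt_update([1, 0, 1, 1, 1]))  # rising estimate of mastery
```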
APA, Harvard, Vancouver, ISO, and other styles
18

Ye, Chen S. M. Massachusetts Institute of Technology. "A system approach to implementation of predictive maintenance with machine learning." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118502.

Full text
Abstract:
Thesis: S.M. in Engineering and Management, Massachusetts Institute of Technology, System Design and Management Program, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 87-91).
Digital technology is changing the industrial sector, yet how to make rational use of these technologies and create considerable value in a variety of industrial scenarios remains an open issue. Many digital industrial companies claim to have helped clients with their digital transformation and created much value, but the real effects have not been shown in public. Venture capital firms have made huge investments in potential digital industrial startups. Numerous industrial IoT platforms are emerging in the market, but a number of them fade soon after. Many people have heard about industrial maintenance technology, but they have difficulty differentiating concepts such as reactive maintenance, planned maintenance, proactive maintenance, and predictive maintenance. Many people know that big data and AI are essential in the industrial sector, but they do not know how to process, analyze, and extract value from industrial data, or how to use AI algorithms and tools to implement a research project. This thesis analyzes the entire digital industrial ecosystem along various dimensions such as initiatives, technologies in related domains, stakeholders, markets, and strategies. It also analyzes the predictive maintenance solution along various dimensions such as background, importance, suitable scenarios, market, business model, and technology. The author plans an experiment for the predictive maintenance solution, including goal, data source and description, methods and steps, and flow and tools. The author then uses a baseline approach and an optimal approach to implement the experiment, including data preparation, selection and evaluation of both regression and classification models, and deep learning practice through neural network building and optimization. Finally, contributions and expectations, and limitations and future research, are discussed. This work uses a system approach, including system architecting, system engineering, and project management, to complete the process of analysis, design, and implementation.
by Chen Ye. S.M. in Engineering and Management
APA, Harvard, Vancouver, ISO, and other styles
19

Wu, Jinlong. "Predictive Turbulence Modeling with Bayesian Inference and Physics-Informed Machine Learning." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/85129.

Full text
Abstract:
Reynolds-Averaged Navier-Stokes (RANS) simulations are widely used for engineering design and analysis involving turbulent flows. In RANS simulations, the Reynolds stress needs closure models and the existing models have large model-form uncertainties. Therefore, the RANS simulations are known to be unreliable in many flows of engineering relevance, including flows with three-dimensional structures, swirl, pressure gradients, or curvature. This lack of accuracy in complex flows has diminished the utility of RANS simulations as a predictive tool for engineering design, analysis, optimization, and reliability assessments. Recently, data-driven methods have emerged as a promising alternative to develop the model of Reynolds stress for RANS simulations. In this dissertation I explore two physics-informed, data-driven frameworks to improve RANS modeled Reynolds stresses. First, a Bayesian inference framework is proposed to quantify and reduce the model-form uncertainty of RANS modeled Reynolds stress by leveraging online sparse measurement data with empirical prior knowledge. Second, a machine-learning-assisted framework is proposed to utilize offline high-fidelity simulation databases. Numerical results show that the data-driven RANS models have better prediction of Reynolds stress and other quantities of interest for several canonical flows. Two metrics are also presented for an a priori assessment of the prediction confidence for the machine-learning-assisted RANS model. The proposed data-driven methods are also applicable to the computational study of other physical systems whose governing equations have some unresolved physics to be modeled.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
20

Kantedal, Simon. "Evaluating Segmentation of MR Volumes Using Predictive Models and Machine Learning." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-171102.

Full text
Abstract:
A reliable evaluation system is essential for every automatic process. While techniques for automatic segmentation of images have been extensively researched in recent years, evaluation of the same has not received an equal amount of attention. Amra Medical AB has developed a system for automatic segmentation of magnetic resonance (MR) images of human bodies using an atlas-based approach. Through their software, Amra is able to derive body composition measurements, such as muscle and fat volumes, from the segmented MR images. As of now, the automatic segmentations are quality controlled by clinical experts to ensure their correctness. This thesis investigates the possibility of leveraging predictive modelling to reduce the need for a manual quality control (QC) step in an otherwise automatic process. Two different regression approaches have been implemented as part of this study: body composition measurement prediction (BCMP) and manual correction prediction (MCP). BCMP aims at predicting the derived body composition measurements and comparing the predictions to actual measurements; the theory is that large deviations between the predictions and the measurements signify an erroneously segmented sample. MCP instead tries to directly predict the amount of manual correction needed for each sample. Several regression models have been implemented and evaluated for the two approaches. Comparison of the regression models shows that local linear regression (LLR) is the most performant model for both BCMP and MCP. The results show that the inaccuracies in the BCMP models, in practice, render this approach useless. MCP proved to be a far more viable approach; using MCP together with LLR achieves a high true positive rate with a reasonably low false positive rate for several body composition measurements. These results suggest that the type of system developed in this thesis has the potential to reduce the need for manual inspections of the automatic segmentation masks.
APA, Harvard, Vancouver, ISO, and other styles
21

Mirzaikamrani, Sonya. "Predictive modeling and classification for Stroke using the machine learning methods." Thesis, Örebro universitet, Handelshögskolan vid Örebro Universitet, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-81837.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Hedkvist, Adam. "Predictive maintenance with machine learning on weld joint analysed by ultrasound." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-396059.

Full text
Abstract:
Ever since the first industrial revolution, industries have had the goal of increasing their production. With new technology such as CPS, AI and IoT, industries today are going through the fourth industrial revolution, denoted Industry 4.0. The new technology not only revolutionises production but also maintenance, making predictive maintenance possible. Predictive maintenance seeks to predict when failure will occur, instead of relying on scheduled maintenance or maintenance after failure has already occurred. In this report, a convolutional neural network (CNN) analyses data from an ultrasound machine scanning a weld joint. The data from the ultrasound machine is transformed by the short-time Fourier transform in order to create an image for the CNN. Since the data from the ultrasound is not complete, simulated data is also created and investigated as another option for training the network. The results are promising; however, the lack of data makes it hard to show any concrete proof.
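To illustrate the preprocessing step, the sketch below turns a one-dimensional ultrasound-like signal into a time-frequency image via the short-time Fourier transform, shaped for a CNN input layer; the sampling rate, window length, and synthetic signal are placeholder assumptions.

```python
import numpy as np
from scipy.signal import stft

fs = 1_000_000                      # assumed 1 MHz sampling rate
t = np.arange(0, 0.001, 1 / fs)
signal = np.sin(2 * np.pi * 50_000 * t) + 0.1 * np.random.randn(t.size)

# Short-time Fourier transform -> 2-D magnitude "image" for the CNN
f, tau, Z = stft(signal, fs=fs, nperseg=128, noverlap=64)
spectrogram = np.abs(Z)             # shape: (n_freq_bins, n_time_frames)

# Normalise and add channel/batch axes so it matches a CNN input layer
image = (spectrogram - spectrogram.mean()) / (spectrogram.std() + 1e-8)
cnn_input = image[np.newaxis, np.newaxis, :, :]   # (batch, channel, H, W)
print(cnn_input.shape)
```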
APA, Harvard, Vancouver, ISO, and other styles
23

BERNARDINI, MICHELE. "Machine Learning approaches in Predictive Medicine using Electronic Health Records data." Doctoral thesis, Università Politecnica delle Marche, 2021. http://hdl.handle.net/11566/289622.

Full text
Abstract:
Traditional approaches in medicine to manage diseases can be briefly reduced to the "one-size-fits-all" concept (i.e., the effect of a treatment reflects the whole sample). By contrast, precision medicine may represent the extension and evolution of traditional medicine because it is mainly preventive and proactive rather than reactive. This evolution may lead to predictive, personalized, preventive, participatory, and psycho-cognitive healthcare. Among all these characteristics, predictive medicine (PM), used to forecast disease onset, diagnosis, and prognosis, is the one this thesis emphasizes.
Thus, it is possible to introduce a new emerging healthcare area, named predictive precision medicine (PPM), which may benefit from the huge amount of medical information stored in Electronic Health Records (EHRs) and from Machine Learning (ML) techniques. The thesis ecosystem, which consists of the three inter-connected key points above (i.e., PPM, EHR, ML), contributes to biomedical and health informatics by proposing meaningful ML methodologies to face and overcome the state-of-the-art challenges that emerge from real-world EHR datasets, such as high-dimensional and heterogeneous data; unbalanced settings; sparse labeling; temporal ambiguity; interpretability/explainability; and generalization capability. The following ML methodologies, designed for specific clinical objectives in the PM scenario, are suitable to constitute the main core of novel clinical Decision Support Systems usable by physicians for prevention, screening, diagnosis, and treatment purposes: i) a sparse-balanced Support Vector Machine (SB-SVM) approach aimed to discover type 2 diabetes (T2D) using features extracted from a novel EHR dataset of a general practitioner (GP); ii) a highly interpretable ensemble Regression Forest (TyG-er) approach aimed to identify non-trivial clinical factors in EHR data and determine where the insulin-resistance condition is encoded; iii) a Multiple Instance Learning boosting (MIL-Boost) approach applied to EHR data aimed to early predict an insulin-resistance worsening (low vs high T2D risk) in terms of the TyG index; iv) a novel Semi-Supervised Multi-task Learning (SS-MTL) approach aimed to predict short-term kidney disease evolution (i.e., the patient's risk profile) on multiple GPs' EHR data; v) an XGBoost approach aimed to predict the Sequential Organ Failure Assessment (SOFA) score at day 5, utilising only EHR data from the admission day in the Intensive Care Unit (ICU). The SOFA score describes the COVID-19 patient's complications in the ICU and helps clinicians create COVID-19 patients' risk profiles. The thesis also contributed to the publication of novel publicly available EHR datasets (i.e., the FIMMG, FIMMG_obs, FIMMG_pred, and mFIMMG datasets).
APA, Harvard, Vancouver, ISO, and other styles
24

LAKSHMANAN, KAYALVIZHI. "Predictive Maintenance of an External Gear Pump using Machine Learning Algorithms." Doctoral thesis, Università degli studi di Pavia, 2021. http://hdl.handle.net/11571/1447613.

Full text
Abstract:
The thesis describes a novel computational strategy for Predictive Maintenance (fault diagnosis and fault prognosis) with ML and Deep Learning applications for an FG304 series external gear pump, also known as a domino pump. Because a sufficient amount of experimental data was unavailable, a novel approach is adopted: generating a high-fidelity in-silico dataset via a Computational Fluid Dynamics model of the gear pump in healthy and various faulty working conditions (e.g., clogging, radial gap variations, viscosity variations, etc.). A synthetic data generation technique is implemented by perturbing the frequency content of the time series to recreate other environmental conditions. These synthetically generated datasets are used to train the underlying ML metamodel. In addition, various feature extraction methods are considered to extract the most discriminatory information from the data. For fault diagnosis, three ML classification algorithms are employed, namely Multilayer Perceptron (MLP), Support Vector Machine (SVM) and Naive Bayes. For prognosis, ML regression algorithms such as MLP and SVM are utilised. Hyper-parameters of the ML algorithms are optimised with a staggered approach. In addition, a real case study of fault diagnosis and fault prognosis of an external gear pump with noisy measurements is presented, to understand the sensitivity of the employed ML algorithms by adding noise to the training and test datasets. A series of numerical examples is presented, enabling us to conclude that for fault diagnosis the use of wavelet features with an MLP algorithm provides the best accuracy, and for fault prognosis the MLP algorithm provides the best prediction results.
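To illustrate the wavelet-feature/MLP pairing the thesis concludes in favour of, the sketch below extracts simple per-band energy statistics from a discrete wavelet decomposition of a vibration-like signal and feeds them to an MLP classifier. The wavelet family, decomposition level, features, and toy fault signature are illustrative assumptions.

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def wavelet_features(signal, wavelet="db4", level=4):
    """Energy of each approximation/detail band from a discrete wavelet transform."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) / len(c) for c in coeffs])

rng = np.random.default_rng(0)
# Toy dataset: 'healthy' (low-frequency) vs 'faulty' (added high-frequency burst)
t = np.linspace(0, 1, 1024)
healthy = [np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(t.size) for _ in range(50)]
faulty = [h + 0.5 * np.sin(2 * np.pi * 120 * t) for h in healthy]

X = np.array([wavelet_features(s) for s in healthy + faulty])
y = np.array([0] * 50 + [1] * 50)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)
print(clf.score(X, y))
```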
APA, Harvard, Vancouver, ISO, and other styles
25

Zhao, Jing. "Learning Predictive Models from Electronic Health Records." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-137936.

Full text
Abstract:
The ongoing digitization of healthcare, which has been much accelerated by the widespread adoption of electronic health records, generates unprecedented amounts of clinical data in a readily computable form. This, in turn, affords great opportunities for making meaningful secondary use of clinical data in the endeavor to improve healthcare, as well as to support epidemiology and medical research. To that end, there is a need for techniques capable of effectively and efficiently analyzing large amounts of clinical data. While machine learning provides the necessary tools, learning effective predictive models from electronic health records comes with many challenges due to the complexity of the data. Electronic health records contain heterogeneous and longitudinal data that jointly provides a rich perspective of patient trajectories in the healthcare process. The diverse characteristics of the data need to be properly accounted for when learning predictive models from clinical data. However, how best to represent healthcare data for predictive modeling has been insufficiently studied. This thesis addresses several of the technical challenges involved in learning effective predictive models from electronic health records. Methods are developed to address the challenges of (i) representing heterogeneous types of data, (ii) leveraging the concept hierarchy of clinical codes, and (iii) modeling the temporality of clinical events. The proposed methods are evaluated empirically in the context of detecting adverse drug events in electronic health records. Various representations of each type of data that account for its unique characteristics are investigated and it is shown that combining multiple representations yields improved predictive performance. It is also demonstrated how the information embedded in the concept hierarchy of clinical codes can be exploited, both for creating enriched feature spaces and for decomposing the predictive task. Moreover, incorporating temporal information leads to more effective predictive models by distinguishing between event occurrences in the patient history. Both single-point representations, using pre-assigned or learned temporal weights, and multivariate time series representations are shown to be more informative than representations in which temporality is ignored. Effective methods for representing heterogeneous and longitudinal data are key for enhancing and truly enabling meaningful secondary use of electronic health records through large-scale analysis of clinical data.
APA, Harvard, Vancouver, ISO, and other styles
26

Lubbock, Alexander Lyulph Robert. "Network biology and machine learning approaches to metastasis and treatment response." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/17856.

Full text
Abstract:
Cancer causes 13% of human deaths worldwide, 90% of which involve metastasis. The reactivation of embryonic processes in epithelial cancers—and the epithelial-mesenchymal transition (EMT) in particular—results in increased cell motility and invasiveness, and is a known mechanism for initiating metastasis. The reverse process, the mesenchymal-epithelial transition (MET), is implicated in the process of cells colonising pre-metastatic niches. Understanding the relationships between EMT, MET and metastasis is therefore highly relevant to cancer research and treatment. Key challenges include deciphering the large, uncharted space of gene function, mapping the complex signalling networks involved and understanding how the EMT and MET programmes function in vivo within specific environments and disease contexts. Inference and analysis of small-scale networks from human tumour tissue samples, scored for protein expression, provides insight into pleiotropy, complex interactions and context-specific behaviour. Small sets of proteins (10–50, representative of key biological processes) are scored using quantitative antibody-based technologies (e.g. immunofluorescence) to give static expression values. A novel inference algorithm specifically for these data, Gabi, is presented, which produces signed, directed networks. On synthetic data, inferred networks often recapitulate the information flow between proteins in ground truth connectivity. Directionality predictions are highly accurate (90% correct) if the input network structure is itself accurate. The Gabi algorithm was applied to study multiple carcinomas (renal, breast, ovarian), providing novel insights into the relationships between EMT players and fundamental processes dysregulated in cancers (e.g. apoptosis, proliferation). Survival analysis on these cohorts shows further evidence for association of EMT with poor outcome. A patent-pending method is presented for stratifying response to sunitinib in metastatic renal cancer patients. The method is based on a proportional hazards model with predictive features selected automatically using regularisation (Bayesian information criterion). The final algorithm includes N-cadherin expression, a determinant of mesenchymal properties, and shows significant predictive power (p = 7.6 × 10^-7, log-rank test). A separate method stratifies response to tamoxifen in estrogen-receptor positive, node-negative breast cancer patients using a cross-validated support vector machine (SVM). The algorithm was predictive on blind-test data (p = 4.92 × 10^-6, log-rank test). Methods developed have been made available within a web application (TMA Navigator) and an R package (rTMA). TMA Navigator produces visual data summaries, networks and survival analysis for uploaded tissue microarray (TMA) scores. rTMA expands on TMA Navigator capabilities for advanced workflows within a programming environment.
APA, Harvard, Vancouver, ISO, and other styles
27

Zhao, Xiaochuang. "Ensemble Learning Method on Machine Maintenance Data." Scholar Commons, 2015. http://scholarcommons.usf.edu/etd/6056.

Full text
Abstract:
In the industry, a lot of companies are facing the explosion of big data. With this much information stored, companies want to make sense of the data and use it to help them make better decisions, especially for future prediction. A lot of money can be saved and huge revenue can be generated with the power of big data. When building statistical learning models for prediction, companies in the industry aim to build models with efficiency and high accuracy. After the learning models have been deployed to production, new data will be generated. With the updated data, the models have to be updated as well. Due to this, the model that performs best today will not necessarily perform the same tomorrow. Thus, it is very hard to decide which algorithm should be used to build the learning model. This paper introduces a new method that ensembles the information generated by two different classification statistical learning algorithms as inputs for another learning model to increase the final prediction power. The dataset used in this paper is NASA's Turbofan Engine Degradation data. There are 49 numeric features (X), and the response Y is binary, with 0 indicating the engine is working properly and 1 indicating engine failure. The model's purpose is to predict whether the engine is going to pass or fail. The dataset is divided into a training set and a testing set. First, the training set is used twice to build support vector machine (SVM) and neural network models. Second, the trained SVM and neural network models take X of the training set as input to predict Y1 and Y2. Then, Y1 and Y2 are taken as inputs to build the Penalized Logistic Regression model, which is the ensemble model here. Finally, the testing set follows the same steps to get the final prediction result. The model accuracy is calculated using overall classification accuracy. The result shows that the ensemble model has 92% accuracy. The prediction accuracies of the SVM, neural network and ensemble models are compared to prove that the ensemble model successfully captured the power of the two individual learning models.
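The procedure described here is a form of stacking. A minimal scikit-learn sketch with an L2-penalized logistic regression as the combiner, and synthetic data standing in for the NASA turbofan set, is shown below; note that scikit-learn's StackingClassifier builds the meta-features with cross-validation rather than by re-predicting on the training set as the thesis does.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the 49-feature pass/fail engine data
X, y = make_classification(n_samples=1000, n_features=49, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("nn", MLPClassifier(max_iter=1000))],
    final_estimator=LogisticRegression(penalty="l2", C=1.0),  # penalized combiner
)
stack.fit(X_train, y_train)
print("ensemble accuracy:", stack.score(X_test, y_test))
```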
APA, Harvard, Vancouver, ISO, and other styles
28

Knoll, Byron. "A machine learning perspective on predictive coding using PAQ8 and new applications." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/35846.

Full text
Abstract:
The goal of this thesis is to describe a state-of-the-art compression method called PAQ8 from the perspective of machine learning. We show both how PAQ8 makes use of several simple, well known machine learning models and algorithms, and how it can be improved by exchanging these components for more sophisticated models and algorithms. We also present a broad range of new applications of PAQ8 to machine learning tasks including language modeling and adaptive text prediction, adaptive game playing, classification, and lossy compression using features from the field of deep learning.
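For a flavour of PAQ8's machine-learning core, the sketch below implements the logistic mixing step used by the PAQ family: each model's probability is "stretched" to the logit domain, combined with learned weights, and "squashed" back, with the weights updated online toward each observed bit. The learning rate and toy model outputs are illustrative assumptions.

```python
import numpy as np

def stretch(p):
    return np.log(p / (1 - p))       # logit

def squash(x):
    return 1 / (1 + np.exp(-x))     # logistic

class LogisticMixer:
    """Combine several model probabilities into one online-updated prediction."""
    def __init__(self, n_models, lr=0.02):
        self.w = np.zeros(n_models)
        self.lr = lr

    def predict(self, probs):
        self.x = stretch(np.asarray(probs))
        self.p = squash(self.w @ self.x)
        return self.p

    def update(self, bit):
        # Gradient step on coding loss: w += lr * error * stretched inputs
        self.w += self.lr * (bit - self.p) * self.x

mixer = LogisticMixer(n_models=3)
for bit in [1, 1, 0, 1, 1, 1]:            # toy bit stream
    p = mixer.predict([0.6, 0.7, 0.4])    # fixed toy model outputs
    mixer.update(bit)
print(mixer.w)                            # weights drift toward better models
```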
APA, Harvard, Vancouver, ISO, and other styles
29

AZEVEDO, THAIS TUYANE DE. "MACHINE LEARNING METHODS APPLIED TO PREDICTIVE MODELS OF CHURN FOR LIFE INSURANCE." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2018. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=35235@1.

Full text
Abstract:
The purpose of this study was to explore the churn problem in life insurance, in the sense of predicting whether the client will cancel the product in the next 6 months. Currently, machine learning methods are becoming popular for this type of analysis, making them an alternative to the traditional method of modeling the probability of cancellation through logistic regression. In general, one of the challenges found in this type of modeling is that the proportion of clients who cancel the service is relatively small. For this reason, the study resorted to balancing techniques to treat the naturally unbalanced base: undersampling and oversampling techniques, and different combinations of the two, were used and compared with each other. The balanced bases were used to train Bagging, Random Forest and Boosting models, and their results were compared with each other and with the results obtained through the logistic regression model. We observed that a modified SMOTE technique for balancing the base, applied to the Bagging model, was the combination that presented the best results among those explored.
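A minimal sketch of the balancing-plus-ensemble idea, using the standard SMOTE from imbalanced-learn (the thesis uses a modified variant) ahead of a bagging classifier on a synthetic imbalanced churn-like dataset:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in: roughly 5% of clients churn
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Balance only the training data, never the test data
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

model = BaggingClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
print(classification_report(y_test, model.predict(X_test)))
```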
APA, Harvard, Vancouver, ISO, and other styles
30

Faraj, Dina. "Using Machine Learning for Predictive Maintenance in Modern Ground-Based Radar Systems." Thesis, KTH, Matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299634.

Full text
Abstract:
Military systems are often part of critical operations where unplanned downtime should be avoided at all costs. Using modern machine learning algorithms, it could be possible to predict when and where a fault is likely to occur, which leaves time for ordering replacement parts and scheduling maintenance. This thesis is a proof-of-concept study of anomaly detection in monitoring data, i.e., sensor data from a ground-based radar system, as an initial experiment to showcase predictive maintenance. The data in this thesis was generated by a Giraffe 4A during normal operation, i.e., no anomalous data with known failures was provided. The problem setting is originally an unsupervised machine learning problem, since the data is unlabeled. Speculative binary labels are introduced (start-up state and steady state) to approximate a classification accuracy. The system is functioning correctly in both phases, but the monitoring data looks different. By showing that the two phases can be distinguished, it is reasonable to assume that anomalous data during a breakdown can be detected as well. Three different machine learning classifiers, i.e., two unsupervised classifiers, K-means clustering and isolation forest, and one supervised classifier, logistic regression, are evaluated on their ability to detect the start-up phase each time the system is turned on. The classifiers are evaluated graphically and based on their accuracy score. All three classifiers recognize a start-up phase for at least four out of seven subsystems. Judging by accuracy score alone, logistic regression outperforms the other models. The collected results demonstrate the possibility of distinguishing between start-up and steady state in both a supervised and an unsupervised setting. To select the most suitable classifier, further experiments on larger data sets are necessary.
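A rough sketch of the three-way comparison on synthetic two-phase data, where start-up samples are drawn from a different distribution than steady-state samples; the distributions, contamination rate, and threshold choices are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
startup = rng.normal(loc=3.0, scale=1.5, size=(200, 4))   # transient phase
steady = rng.normal(loc=0.0, scale=0.5, size=(800, 4))    # stable phase
X = np.vstack([startup, steady])
y = np.array([1] * 200 + [0] * 800)                       # speculative labels

# Unsupervised: cluster into two groups, then map clusters to phases
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
km_labels = km_labels if accuracy_score(y, km_labels) >= 0.5 else 1 - km_labels

# Unsupervised: treat the rarer start-up phase as anomalous
iso = IsolationForest(contamination=0.2, random_state=0).fit(X)
iso_labels = (iso.predict(X) == -1).astype(int)

# Supervised baseline using the speculative labels directly
lr_labels = LogisticRegression().fit(X, y).predict(X)

for name, pred in [("k-means", km_labels), ("isolation forest", iso_labels),
                   ("logistic regression", lr_labels)]:
    print(name, accuracy_score(y, pred))
```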
APA, Harvard, Vancouver, ISO, and other styles
31

Jorge, Inès. "Machine-learning-based predictive maintenance for lithium-ion batteries in electric vehicles." Electronic Thesis or Diss., Strasbourg, 2023. http://www.theses.fr/2023STRAD056.

Full text
Abstract:
The battery is a central component of electric vehicles and is subject to numerous challenges in terms of performance, safety and cost. Battery lifetime in particular is the subject of a great deal of attention, as it needs to be aligned with the life of the vehicle. In this context, predictive maintenance aims to reliably predict the remaining useful life (RUL) and the evolution of the state of health (SOH) of a Lithium-Ion (Li-Ion) battery using past and present operating data, so as to anticipate maintenance operations. The objective of this thesis is to take advantage of the information contained in the time series of current, voltage and temperature via machine learning algorithms. Several predictive models have been developed from public datasets in order to predict the RUL of a battery or the evolution of its SOH over the shorter or longer term.
APA, Harvard, Vancouver, ISO, and other styles
32

Heidkamp, William. "Predicting the concentration of residual methanol in industrial formalin using machine learning." Thesis, Karlstads universitet, Institutionen för ingenjörsvetenskap och fysik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-46997.

Full text
Abstract:
In this thesis, a machine learning approach was used to develop a predictive model for residual methanol concentration in industrial formalin produced at the Akzo Nobel factory in Kristinehamn, Sweden. The MATLAB™ computational environment, supplemented with the Statistics and Machine Learning Toolbox™ from MathWorks, was used to test various machine learning algorithms on the formalin production data from Akzo Nobel. As a result, the Gaussian Process Regression algorithm was found to provide the best results and was used to create the predictive model. The model was compiled to a stand-alone application with a graphical user interface using the MATLAB Compiler™.
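The thesis works in MATLAB, but the modelling step translates directly; a hedged scikit-learn sketch of Gaussian Process Regression on synthetic process data follows, where the kernel choice and the three input variables are illustrative stand-ins rather than the factory's actual inputs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Toy process variables (e.g., temperatures, flows) -> residual methanol (%)
X = rng.uniform(0, 1, size=(200, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)

kernel = 1.0 * RBF(length_scale=[1.0] * 3) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# GPR yields both a predictive mean and an uncertainty estimate per sample
mean, std = gpr.predict(X[:5], return_std=True)
print(np.c_[mean, std])
```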
APA, Harvard, Vancouver, ISO, and other styles
33

Karunaratne, Thashmee M. "Learning predictive models from graph data using pattern mining." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-100713.

Full text
Abstract:
Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks etc. However, standard graph learning approaches are often challenged by the computational cost involved in the learning process, due to the richness of the representation. Attempts made to improve their efficiency are often associated with the risk of degrading the performance of the predictive models, creating tradeoffs between the efficiency and effectiveness of the learning. Such a situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, where improving one objective without the other objective being worse off is a better solution, called a Pareto improvement. In this thesis, it is investigated how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set where one concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learning models, and the other objective concerns how to improve predictive performance without increasing the complexity of pattern mining. The employed research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which look for Pareto improvements in efficiency and effectiveness are conducted using sets of small graphs. Summarizing the findings, some of the proposed methods, namely maximal frequent itemset mining and constraint based itemset mining, result in a dramatically increased efficiency of learning, without decreasing the predictive performance of the resulting models. It is also shown that additional background knowledge can be used to enhance the performance of the predictive models, without increasing the complexity of the graphs.
APA, Harvard, Vancouver, ISO, and other styles
34

Shaikhina, Torgyn. "Machine learning with limited information : risk stratification and predictive modelling for clinical applications." Thesis, University of Warwick, 2017. http://wrap.warwick.ac.uk/99640/.

Full text
Abstract:
The high cost, complexity and multimodality of clinical data collection restrain the datasets available for predictive modelling using machine learning (ML), thus necessitating new data-efficient approaches specifically for limited datasets. This interdisciplinary thesis focuses on clinical outcome modelling using a range of ML techniques, including artificial neural networks (NNs) and their ensembles, decision trees (DTs) and random forests (RFs), as well as classical logistic regression (LR) and Cox proportional hazards (Cox PH) models. The utility of ML for data-efficient regression, classification and survival analyses was investigated in three clinical applications, thereby exposing the common limitations inherent in patient data, such as class imbalance, incomplete samples, and, in particular, limited dataset size. The latter problem was addressed by developing a methodological framework for learning from datasets with fewer than 10 observations per predictor variable. A novel method of multiple runs overcame the volatility of NN and DT models due to limited training samples, while a surrogate data test allowed for regression model evaluation in the presence of noise due to limited dataset size. When applied to hard tissue engineering for predicting femoral fracture risk, the framework resulted in a 98.3% accurate regression NN. The framework was used to detect early rejection in antibody-incompatible kidney transplantation, achieving an 85% accurate classification DT. The third clinical task – that of predicting 10-year incidence of type 2 diabetes in the UK population – resulted in 70-85% accurate classification and survival models, whilst highlighting the challenges of learning with the limited information characteristic of routinely collected data. By discovering unintuitive patterns, supporting existing hypotheses and generating novel insight, the ML models developed in this research contributed meaningfully to clinical research and paved the way for data-efficient applications of ML in engineering and clinical practice.
APA, Harvard, Vancouver, ISO, and other styles
35

Kalmár, Marcus, and Joel Nilsson. "The art of forecasting – an analysis of predictive precision of machine learning models." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280675.

Full text
Abstract:
Forecasting is used for decision making, and unreliable predictions can instill a false sense of confidence. Traditional time series modelling is a statistical art form rather than a science, and errors can occur due to limitations of human judgment. In minimizing the risk of falsely specifying a process, the practitioner can make use of machine learning models. In an effort to find out if there is a benefit in using models that require less human judgment, the machine learning models Random Forest and Neural Network have been used to model a VAR(1) time series. In addition, the classical time series models AR(1), AR(2), VAR(1) and VAR(2) have been used as a comparative foundation. The Random Forest and Neural Network are trained, and ultimately the models are used to make predictions evaluated by RMSE. All models yield scattered forecast results except for the Random Forest, which steadily yields comparatively precise predictions. The study shows that there is a definitive benefit in using Random Forests to eliminate the risk of falsely specifying a process, and they do in fact provide better results than a correctly specified model.
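A sketch of the Random Forest side of the comparison: simulate a bivariate VAR(1) process, build one-step lag features, and score one-step-ahead forecasts by RMSE. The coefficient matrix, noise level, and train/test split are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.2, 0.4]])      # stable VAR(1) coefficient matrix
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A @ y[t - 1] + 0.1 * rng.standard_normal(2)

# One-step lag features: predict y[t] from y[t-1]
X, target = y[:-1], y[1:]
X_train, X_test = X[:400], X[400:]
y_train, y_test = target[:400], target[400:]

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, rf.predict(X_test)))
print("one-step RMSE:", rmse)
```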
APA, Harvard, Vancouver, ISO, and other styles
36

Dahlberg, Emil, Mattias Mineur, Linus Shoravi, and Holger Swartling. "Replacing Setpoint Control with Machine Learning : Model Predictive Control Using Artificial Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413003.

Full text
Abstract:
Indoor climate control is responsible for a substantial amount of the world's total energy expenditure. In a time of climate crisis, where a reduction of energy consumption is crucial to avoid climate disaster, indoor climate control is a ripe target for eliminating energy waste. The conventional method of adjusting the indoor climate with setpoint curves, based solely on outdoor temperature, may lead to notable inefficiencies. This project evaluates the possibility of replacing this method of regulation with a system based on model predictive control (MPC) in one of Uppsala University Hospital's office buildings. A prototype of an MPC controller using Artificial Neural Networks (ANNs) as its system model was developed. The system takes several data sources into account, including indoor and outdoor temperatures, radiator flowline and return temperatures, current solar radiation, and forecasts for both solar radiation and outdoor temperature. The system was not put into production, but the controller's predicted values correspond well to the building's observed thermal behaviour and weather data. These theoretical results attest to the viability of using the method to regulate the indoor climate in buildings in place of setpoint curves.
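A minimal sketch of the control loop: a network trained on transition data predicts the next indoor temperature from the current state and a candidate radiator setting, and the controller searches a small action grid for the constant-action sequence minimizing a comfort-plus-energy cost over a short horizon. The transition data, dynamics, and cost weights below are fabricated stand-ins, and real MPC would optimize over full action sequences.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Fabricated transition data: [indoor T, outdoor T, radiator u] -> next indoor T
X = np.c_[rng.uniform(15, 25, 2000), rng.uniform(-10, 10, 2000), rng.uniform(0, 1, 2000)]
y = X[:, 0] + 0.1 * (X[:, 1] - X[:, 0]) + 2.0 * X[:, 2] + 0.05 * rng.standard_normal(2000)
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0).fit(X, y)

def mpc_step(indoor, outdoor_forecast, horizon=3, setpoint=21.0, energy_weight=0.5):
    """Pick the first control of the best constant-action sequence over the horizon."""
    best_u, best_cost = None, np.inf
    for u in np.linspace(0, 1, 11):           # candidate radiator settings
        temp, cost = indoor, 0.0
        for k in range(horizon):
            temp = model.predict([[temp, outdoor_forecast[k], u]])[0]
            cost += (temp - setpoint) ** 2 + energy_weight * u
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

print(mpc_step(indoor=18.0, outdoor_forecast=[-5.0, -4.0, -3.0]))
```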
APA, Harvard, Vancouver, ISO, and other styles
37

Mundru, Nishanth. "Predictive and prescriptive methods in operations research and machine learning : an optimization approach." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122099.

Full text
Abstract:
This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 213-221).
The availability and prevalence of data have provided a substantial opportunity for decision makers to improve decisions and outcomes by effectively using this data. In this thesis, we propose approaches that start from data and lead to high-quality decisions and predictions in various application areas. In the first chapter, we consider problems with observational data and propose variants of machine learning (ML) algorithms that are trained by taking decision quality into account. The traditional approach to such a task has often focused on two steps, separating the estimation task from the subsequent optimization task which uses the estimated models; consequently, this approach can miss out on improvements in decision quality available from considering the two tasks jointly. Crucially, the joint approach leads to stronger prescriptive performance, particularly for smaller training set sizes, and improves decision quality by 3-5% over other state-of-the-art methods. We introduce the idea of uncertainty penalization to control the optimism of these methods, which improves their performance, and propose finite-sample regret bounds. Through experiments on real and synthetic data sets, we demonstrate the value of this approach. In the second chapter, we consider observational data with decision-dependent uncertainty; in particular, we focus on problems with a finite number of possible decisions (treatments). We present our method of prescriptive trees, which prescribes the best treatment option by learning from observational data while simultaneously predicting counterfactuals. We demonstrate the effectiveness of such an approach using real data for the problem of personalized diabetes management. In the third chapter, we consider stochastic optimization problems where the sample average approximation approach is computationally expensive. We introduce a novel measure, called the prescriptive divergence, that takes into account the decision quality of the scenarios, and we consider scenario reduction in this context. We demonstrate the power of this optimization-based approach on various examples. In the fourth chapter, we present our work on a problem in predictive analytics where we approach ML problems from a modern optimization perspective. For sparse shape-constrained regression problems, we propose modern optimization-based algorithms that are scalable and recover the true support with high accuracy and low false positive rates.
by Nishanth Mundru. Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
38

Calabrese, Francesca <1992&gt. "Integrating Machine Learning Paradigms for Predictive Maintenance in the Fourth Industrial Revolution era." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/10133/1/Tesi_CalabreseFrancesca.pdf.

Full text
Abstract:
In the last decade, manufacturing companies have been facing two significant challenges. First, digitalization requires the adoption of Industry 4.0 technologies and enables the creation of smart, connected, self-aware, and self-predictive factories. Second, the focus on sustainability requires evaluating and reducing the impact of the implemented solutions from economic and social points of view. In manufacturing companies, the maintenance of physical assets assumes a critical role. Increasing the reliability and availability of production systems leads to the minimization of system downtime; in addition, proper system functioning avoids production waste and potentially catastrophic accidents. Digitalization and new ICT technologies have assumed a relevant role in maintenance strategies. They allow assessing the health condition of machinery at any point in time. Moreover, they allow predicting the future behavior of machinery so that maintenance interventions can be planned and the useful life of components can be exploited until just before their fault. This dissertation provides insights on Predictive Maintenance goals and tools in Industry 4.0 and proposes a novel data acquisition, processing, sharing, and storage framework that addresses typical issues encountered by machine producers and users. The research elaborates on two research questions that narrow down the potential approaches to data acquisition, processing, and analysis for fault diagnostics in evolving environments. The research activity is developed according to a research framework where the research questions are addressed by research levers that are explored according to research topics. Each topic requires a specific set of methods and approaches; however, the overarching methodological approach presented in this dissertation includes three fundamental aspects: the maximization of the quality level of input data, the use of Machine Learning methods for data analysis, and the use of case studies deriving from both controlled environments (laboratory) and real-world instances.
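As an illustration of the kind of data analysis such a framework builds on, the sketch below extracts classic condition-monitoring features (RMS, crest factor, kurtosis) from synthetic vibration windows and trains a classifier to separate healthy from faulty behaviour. The signal shapes, fault model, and feature set are assumptions for illustration, not the dissertation's framework.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(sig):
    """Classic condition-monitoring features for one vibration window."""
    rms = np.sqrt(np.mean(sig ** 2))
    return [sig.mean(), sig.std(), rms,
            np.max(np.abs(sig)) / (rms + 1e-12),                       # crest factor
            np.mean(((sig - sig.mean()) / (sig.std() + 1e-12)) ** 4)]  # kurtosis

rng = np.random.default_rng(2)
healthy = rng.normal(0, 1.0, size=(200, 1024))
# Faulty windows carry rare high-amplitude impulses, as bearing defects often do.
faulty = rng.normal(0, 1.0, size=(200, 1024)) + \
         rng.choice([0.0, 3.0], p=[0.99, 0.01], size=(200, 1024))

X = np.array([window_features(s) for s in np.vstack([healthy, faulty])])
y = np.array([0] * 200 + [1] * 200)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```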
APA, Harvard, Vancouver, ISO, and other styles
39

GUPTA, NITIN. "MACHINE LEARNING PREDICTIVE ANALYTIC MODEL TO REDUCE COST OF QUALITY FOR SOFTWARE PRODUCTS." Thesis, DELHI TECHNOLOGICAL UNIVERSITY, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18484.

Full text
Abstract:
In today's world, high-quality products are the need of the hour, and low-quality products result in high costs. This can be explained with the cost-of-quality graph: 1) prevention cost covers the issues/bugs found before deployment or delivery to the customer; this cost is initially very low but grows over the longer run. 2) Failure cost includes the cost of losing customers, root-cause analysis, and rectification; this cost is decidedly very large. [Figure 11: Cost of Quality. Source: https://www.researchgate.net/] If some mechanism can help identify the expected issues at the prevention stage, the overall cost of quality can be reduced, as shown in the modified graph. [Figure 12: Modified Cost of Quality, showing the modified prevention cost and modified TCQ. Source: https://www.researchgate.net/] The Electronic Design Automation (EDA) industry is the backbone of the semiconductor industry, as it provides the software tools that aid the development of semiconductor chips, from specification all the way to foundry input. [Figure 13: Tools offered by the EDA industry. Source: https://en.wikipedia.org/wiki/Electronic_design_automation] The term "tape-out" means the chip is out of the foundry and ready for use in an electronic circuit; "re-spin" means an incident where, post tape-out, the chip does not function as required and a rebuild is needed. A tape-out costs a minimum of 5 million dollars, and the major reason for re-spins is functionality issues; therefore, the functional verification tools delivered by EDA vendors must always be of high quality. A major problem faced by a functional verification tool R&D team is predicting the number of bugs that might have been introduced during the design phase, in order to sign off on completeness and quality. If these bugs can be predicted, the cost of quality can be reduced, saving millions of dollars for the company and the customer. Machine learning, an emerging discipline, is the scientific study of algorithms that, using computing power, develop prediction models so that the uncertainty of a task can be managed. In this project, a prediction model for the bugs expected during software development is designed to help the product manager gain confidence in quality. For the data, exploratory research and interviews were conducted within Synopsys. The project has been successfully adopted within the Verification IP group of this EDA leader and is in the process of being implemented across all business units.
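A bug-count predictor of the sort described can be prototyped as a Poisson regression from development-phase metrics to the expected number of bugs. The sketch below uses invented feature names and synthetic data; the abstract does not disclose Synopsys's actual model or inputs.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(3)
# Assumed per-release metrics: [lines_changed_k, new_features, test_hours, reviewers]
X = rng.uniform(0, 1, size=(300, 4))
# Synthetic ground truth: more churn and features raise bugs, more testing lowers them.
y = rng.poisson(np.exp(1.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] - 1.5 * X[:, 2] - 0.5 * X[:, 3]))

model = PoissonRegressor(alpha=1e-3, max_iter=1000).fit(X, y)
next_release = np.array([[0.7, 0.5, 0.2, 0.3]])
print("expected bugs for the next release:", model.predict(next_release)[0])
```

A count model like this gives the product manager an expected-bug figure to weigh against the bugs actually found before sign-off.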
APA, Harvard, Vancouver, ISO, and other styles
40

Etminan, Ali. "Prediction of Lead Conversion With Imbalanced Data : A method based on Predictive Lead Scoring." Thesis, Linköpings universitet, Statistik och maskininlärning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176433.

Full text
Abstract:
An ongoing challenge for most businesses is to filter out potential customers from their audience. This thesis proposes a method that takes advantage of user data to distinguish potential customers from random visitors to a website. The method is based on the Predictive Lead Scoring method, which segments customers based on their likelihood of purchasing a product. Our method, however, aims to predict user conversion, that is, whether a user has the potential to become a customer or not. Six supervised machine learning models have been used to carry out the classification task. To account for the high imbalance in the input data, multiple resampling methods have been applied to the training data. The combination of classifier and resampling method with the highest average precision score has been selected as the best model. In addition, this thesis tries to quantify the effect of feature weights by evaluating some feature ranking and weighting schemes. Using the schemes, several sets of weights have been produced and evaluated by training a KNN classifier on the weighted features. The change in average precision relative to the original KNN (without weighting) is used as the reference for measuring the performance of the ranking and weighting schemes.
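The two main ingredients of the method (resampling the imbalanced training data and selecting models by average precision, plus feature weighting for a KNN classifier) can be sketched as follows. The data, the oversampling choice, and the weight vector are illustrative assumptions; the thesis evaluates several resampling methods and weighting schemes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 2.5).astype(int)  # rare positives
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling of the minority class, applied to the training split only.
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
X_bal, y_bal = np.vstack([X_tr, X_tr[extra]]), np.concatenate([y_tr, y_tr[extra]])

# Feature weighting for KNN via column rescaling (weights are an assumption).
w = np.array([1.0, 0.7, 0.1, 0.1, 0.1])
knn = KNeighborsClassifier(n_neighbors=15).fit(X_bal * w, y_bal)
scores = knn.predict_proba(X_te * w)[:, 1]
print("average precision:", average_precision_score(y_te, scores))
```

Average precision is a sensible selection metric here because, unlike accuracy, it is not dominated by the abundant negative class.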
APA, Harvard, Vancouver, ISO, and other styles
41

Evans, Daniel T. "A SNP Microarray Analysis Pipeline Using Machine Learning Techniques." Ohio University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1289950347.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Legkiy, Yaroslav, Ярослав Ярославович Легкий, Bohdan Dubchak, and Богдан Мстиславович Дубчак. "Machine learning & artificial intelligence in aerospace industry." Thesis, National Aviation University, 2021. https://er.nau.edu.ua/handle/NAU/50479.

Full text
Abstract:
Machine Learning & Artificial Intelligence in Aerospace Industry. URL: https://www.axiscades.com/blog-resources/whitepaper/Aerospacewhitepaper.pdf

The whitepaper depicts artificial intelligence as playing a significant role in cutting costs, reducing design cycle time, simulation, prototyping, optimization, maintenance, manufacturing, and updating products.
APA, Harvard, Vancouver, ISO, and other styles
43

Prytz, Rune. "Machine learning methods for vehicle predictive maintenance using off-board and on-board data." Licentiate thesis, Högskolan i Halmstad, CAISR Centrum för tillämpade intelligenta system (IS-lab), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-27869.

Full text
Abstract:
Vehicle uptime is getting increasingly important as transport solutions become more complex and the transport industry seeks new ways of being competitive. Traditional Fleet Management Systems are gradually extended with new features to improve reliability, such as better maintenance planning. Typical diagnostic and predictive maintenance methods require extensive experimentation and modelling during development. This is unfeasible if the complete vehicle is addressed, as it would require too much engineering effort. This thesis investigates unsupervised and supervised methods for predicting vehicle maintenance. The methods are data driven and use extensive amounts of data, either streamed on-board data or historic and aggregated data from off-board databases. The methods rely on a telematics gateway that enables vehicles to communicate with a back-office system. Data representations, either aggregations or models, are sent wirelessly to an off-board system which analyses the data for deviations. These are later associated with the repair history and form a knowledge base that can be used to predict upcoming failures on other vehicles that show the same deviations. The thesis further investigates different ways of doing data representations and deviation detection. The first one presented, COSMO, is an unsupervised and self-organised approach demonstrated on a fleet of city buses. It automatically comes up with the most interesting on-board data representations and uses a consensus-based approach to isolate the deviating vehicle. The second approach outlined is a supervised classification based on earlier collected and aggregated vehicle statistics, in which the repair history is used to label the usage statistics. A classifier is trained to learn patterns in the usage data that precede specific repairs and thus can be used to predict vehicle maintenance. This method is demonstrated for failures of the vehicle air compressor and is based on AB Volvo's database of vehicle usage statistics.
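The consensus idea behind COSMO can be sketched in a few lines: represent each vehicle by a histogram of an on-board signal, form a fleet consensus, and flag the vehicle farthest from it. The data and the distance measure below are illustrative assumptions, not the actual COSMO implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
fleet = [rng.normal(0.0, 1.0, 5000) for _ in range(19)]   # healthy buses
fleet.append(rng.normal(0.8, 1.3, 5000))                  # one deviating bus

# Per-vehicle data representation: a normalised histogram of the signal.
bins = np.linspace(-5, 5, 41)
reps = np.array([np.histogram(s, bins=bins, density=True)[0] for s in fleet])

consensus = np.median(reps, axis=0)                       # fleet consensus model
dist = np.linalg.norm(reps - consensus, axis=1)           # distance to consensus
print("most deviating vehicle:", int(np.argmax(dist)))    # expected: index 19
```

In the full approach the representations themselves are selected automatically on board, and persistent deviations are matched against repair records to build the predictive knowledge base.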
APA, Harvard, Vancouver, ISO, and other styles
44

Tout, Hicham Refaat. "Measuring the Impact of email Headers on the Predictive Accuracy of Machine Learning Techniques." NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/325.

Full text
Abstract:
The majority of documented phishing attacks have been carried out by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that including a limited subset of email headers as features when training machine learning algorithms to detect phishing attacks did increase the predictive accuracy of these learning algorithms. The same research also recommended further investigation of the impact of including an expanded set of email headers on the predictive accuracy of machine learning algorithms. In addition, research has shown that the cost of misclassifying legitimate emails as phishing attacks (false positives) was far higher than that of misclassifying phishing emails as legitimate (false negatives), while the opposite was true in the case of fraud detection. Consequently, it was recommended that cost-sensitive measures be taken in order to further improve the weighted predictive accuracy of machine learning algorithms. Motivated by the potentially high impact of the inclusion of email headers on the predictive accuracy of machine learning algorithms and the significance of enabling cost-sensitive measures as part of the learning process, the goal of this research was to quantify the impact of including an extended set of email headers and to investigate the impact of imposing a penalty, as part of the learning process, on the number of false positives. It was believed that if email headers were included and cost-sensitive measures were taken as part of the learning process, then the overall weighted predictive accuracy of the machine learning algorithm would be improved. The results showed that adding email headers as features did improve the overall predictive accuracy of machine learning algorithms and that cost-sensitive measures taken as part of the learning process did result in fewer false positives.
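One standard way to impose such a penalty during learning is through class weights, as in this sketch: the legitimate class is weighted more heavily, so the classifier trades false positives for false negatives. The synthetic features merely stand in for header-derived attributes; this is an illustration of cost-sensitive learning, not the dissertation's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(6)
X = rng.normal(size=(3000, 8))                                    # stand-in header features
y = (X[:, 0] - X[:, 1] + rng.normal(size=3000) > 0).astype(int)   # 1 = phishing

# Class 0 (legitimate) weighted 10x: flagging legitimate mail as phishing costs more.
clf = LogisticRegression(class_weight={0: 10.0, 1: 1.0}).fit(X, y)
tn, fp, fn, tp = confusion_matrix(y, clf.predict(X)).ravel()
print(f"false positives: {fp}, false negatives: {fn}")            # fp pushed down
```

Refitting with equal class weights and comparing the two confusion matrices shows the trade directly: the weighted model accepts more missed phishing emails to protect legitimate mail.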
APA, Harvard, Vancouver, ISO, and other styles
45

Hossain, Md Ekramul. "Predictive Modelling of the Comorbidity of Chronic Diseases: A Network and Machine Learning Approach." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/24229.

Full text
Abstract:
Chronic diseases have become increasingly common and cause most of the burden of ill health in many countries. They are associated with adverse health outcomes in terms of mobility and quality of life, as well as an increased financial burden. Chronic diseases pose several health risks for patients suffering from more than one chronic disease (known as comorbidity of chronic diseases). The prevalence of chronic disease comorbidity has increased globally. Understanding the progression of comorbidities and predicting the risks can provide valuable insights into the prevention and better management of chronic diseases. The availability of administrative datasets provides an opportunity to apply predictive models to improve the healthcare system. Most studies in this field focus on understanding the progression of one chronic disease rather than multiple chronic diseases. Analysis of administrative data using a network approach and machine learning techniques can help predict the risk of comorbidity of chronic diseases. In this thesis, we propose a risk prediction model using administrative data that uses network-based features and machine learning techniques to assess the risk of chronic disease comorbidities. This study has two broad goals: (1) to understand and represent the progression of comorbidity of chronic diseases, and (2) to develop a risk prediction model based on the disease progression to predict the comorbidity of chronic diseases for chronic disease patients. Specifically, it focuses on the comorbidity progression of cardiovascular disease (CVD) in patients with type 2 diabetes (T2D), as a high proportion of older adults with T2D often develop CVD. We used administrative data and network analytics to implement the first part of this study, and machine learning techniques for the second part. For this, two cohorts (i.e. patients with both T2D and CVD, and patients with only T2D) were identified from an administrative dataset collected from private healthcare funds based in Australia. Two baseline disease networks were generated from the two study cohorts. A final disease network was then generated from the two baseline disease networks through normalisation. We extracted social-network-based features (i.e. the prevalence of comorbidities, transition patterns and clustering membership) from the final disease network and some demographic characteristics directly from the dataset. These risk factors were then used to develop six machine learning prediction models (logistic regression, support vector machine, decision tree, random forest, Naïve Bayes and k-nearest neighbour) to assess the risk of CVD in patients with T2D. The results showed that the prevalence of renal failure, fluid and electrolyte disorders, hypertension and obesity was significantly higher in patients with both CVD and T2D than in patients with only T2D. This indicated that these chronic diseases occurred frequently during the progression of CVD in patients with T2D. The study measured performance in terms of accuracy, precision, recall, F1 score and area under the curve (AUC). The model based on random forest showed the highest accuracy (87.50%) and an AUC of 0.83. Overall, the accuracy of the classifiers ranged from 79% to 88%, which shows the potential of the network-based and machine learning–based risk prediction model using administrative data. The proposed model may help healthcare providers to understand high-risk chronic diseases and the progression patterns between the recurrence of multiple chronic diseases. Further, the comorbid risk prediction model could be useful for medical practice and stakeholders (including government and private health insurers) to develop health management programs for patients at high risk of developing multiple chronic diseases.
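A toy version of the pipeline (a comorbidity network built from records, network features read off it, then a classifier) might look as follows; the diagnosis codes, features, and tiny cohort are invented for illustration and are far simpler than the thesis's normalised two-cohort networks.

```python
import itertools
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical patient records as sets of diagnosis codes.
patients = [{"T2D", "HTN", "CVD"}, {"T2D", "OBES"}, {"T2D", "RENAL", "CVD"},
            {"T2D", "HTN"}, {"T2D", "OBES", "CVD"}]

# Comorbidity network: edge weight = number of patients with both diagnoses.
G = nx.Graph()
for record in patients:
    for a, b in itertools.combinations(sorted(record), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

clustering = nx.clustering(G)                  # one network feature per diagnosis

def features(record):
    """Patient features: comorbidity count and summed clustering, CVD excluded."""
    other = record - {"CVD"}
    return [len(other), sum(clustering.get(d, 0.0) for d in other)]

X = np.array([features(r) for r in patients])
y = np.array([1 if "CVD" in r else 0 for r in patients])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict_proba(X)[:, 1])              # per-patient CVD risk scores
```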
APA, Harvard, Vancouver, ISO, and other styles
46

Boonen, Dries. "The impact of bias on the predictive value of EHR driven machine learning models." Thesis, Högskolan i Halmstad, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-39960.

Full text
Abstract:
The rapid digitization in the health care sector leads to an increase of data. This routinely collected data, in the form of electronic health records (EHR), is not only used by medical professionals but also has a secondary purpose: health care research. It can be opportune to use this EHR data for predictive modeling in order to support medical professionals in their decisions. However, using routinely collected data (RCD) often comes with subtle biases that might hinder efficient learning of predictive models. In this thesis the effects of RCD on prediction performance are reviewed. In particular, we thoroughly investigate and reason about whether the performance of particular prediction models is consistent over a range of hand-crafted sub-populations within the data. Evidence is presented that the overall prediction score of the algorithms trained on EHR data differs significantly for some groups of patients in the data. A method is presented to give more insight into why these groups of patients have different scores.
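The sub-population analysis described can be reproduced in miniature: train one model, then score it separately on hand-crafted subgroups. The grouping variable and the planted bias below are assumptions for illustration, not the thesis's data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
df = pd.DataFrame({"age_group": rng.choice(["<40", "40-65", ">65"], 3000),
                   "x1": rng.normal(size=3000), "x2": rng.normal(size=3000)})
# Planted bias: the outcome depends on x1 much more weakly in one subgroup.
slope = np.where(df["age_group"] == ">65", 0.3, 1.5)
df["y"] = (slope * df["x1"] + df["x2"] + rng.normal(size=3000) > 0).astype(int)

model = LogisticRegression().fit(df[["x1", "x2"]], df["y"])
df["score"] = model.predict_proba(df[["x1", "x2"]])[:, 1]

# A single overall AUC would hide the gap that the per-group scores expose.
for group, g in df.groupby("age_group"):
    print(group, round(roc_auc_score(g["y"], g["score"]), 3))
```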
APA, Harvard, Vancouver, ISO, and other styles
47

Staberg, Pontus, Emil Häglund, and Jakob Claesson. "Injury Prediction in Elite Ice Hockey using Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235959.

Full text
Abstract:
Sports clubs are always searching for innovative ways to improve performance and obtain a competitive edge. Sports analytics today is focused primarily on evaluating metrics thought to be directly tied to performance. Injuries indirectly decrease performance and cost substantially in terms of wasted salaries. Existing sports injury research mainly focuses on correlating one specific feature at a time to the risk of injury. This paper provides a multidimensional approach to non-contact injury prediction in Swedish professional ice hockey by applying machine learning to historical data. Several features are correlated simultaneously with injury probability. The project's aim is to create an injury prediction algorithm which ranks the different features based on how they affect the risk of injury. The paper also discusses the business potential and strategy of a start-up aiming to provide a solution for predicting injury risk through statistical analysis.

Sports clubs are constantly looking for innovative ways to improve performance and gain competitive advantages. Today, data analysis in sports focuses mainly on evaluating metrics believed to be directly correlated with performance. Injuries indirectly lower performance and cost significantly in wasted player salaries. Earlier studies of injuries in sports mainly focus on correlating one metric at a time to an injury. This report takes a multidimensional approach to predicting injuries in Swedish elite ice hockey by applying machine learning to historical data. Several attributes are correlated simultaneously to obtain an injury probability. The goal of this report is to create an algorithm that predicts injuries and also ranks the different attributes based on how they affect injury risk. The report also discusses the business opportunities for such a solution and how a potential start-up should position itself in the market.
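A minimal sketch of the ranking step, under invented workload features and synthetic labels, could use a random forest's feature importances:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
cols = ["ice_time", "games_played_7d", "age", "prior_injuries", "travel_km"]
X = pd.DataFrame(rng.normal(size=(1000, 5)), columns=cols)
# Synthetic ground truth: congested schedules and injury history drive risk.
injury = (0.8 * X["games_played_7d"] + 0.6 * X["prior_injuries"]
          + rng.normal(size=1000) > 1.5).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, injury)
ranking = pd.Series(clf.feature_importances_, index=cols).sort_values(ascending=False)
print(ranking)   # features ordered by their contribution to predicted injury risk
```

On real data the ranking, rather than the raw probabilities, is often the actionable output, since it tells staff which workload factors to manage first.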
APA, Harvard, Vancouver, ISO, and other styles
48

König, Rikard. "Enhancing genetic programming for predictive modeling." Doctoral thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-3689.

Full text
Abstract:
Dissertation for the degree of Doctor of Technology in Computer Science, to be publicly defended on Tuesday 11 March 2014 at 13:15 in room M404, University of Borås. Opponent: Associate Professor Niklas Lavesson, Blekinge Institute of Technology, Karlskrona.
APA, Harvard, Vancouver, ISO, and other styles
49

Borg, Anton. "On Descriptive and Predictive Models for Serial Crime Analysis." Doctoral thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-00597.

Full text
Abstract:
Law enforcement agencies regularly collect crime scene information. There exists, however, no detailed, systematic procedure for doing so. The data collected are affected by the experience or current condition of law enforcement officers. Consequently, the data collected might differ vastly between crime scenes. This is especially problematic when investigating volume crimes. Law enforcement officers regularly compare crimes manually based on the collected data. This is a time-consuming process, especially as the collected crime scene information might not always be comparable. The structuring of data and the introduction of automatic comparison systems could benefit the investigation process. This thesis investigates descriptive and predictive models for automatic comparison of crime scene data with the purpose of aiding law enforcement investigations. The thesis first investigates predictive and descriptive methods, with a focus on data structuring, comparison, and evaluation of methods. This knowledge is then applied to the domain of crime scene analysis, with a focus on detecting serial residential burglaries. The thesis introduces a procedure for systematic collection of crime scene information. It also investigates the impact of and relationships between crime scene characteristics, and how to evaluate the descriptive model results. The results suggest that the use of descriptive and predictive models can provide feedback for crime scene analysis that allows a more effective use of law enforcement resources. Using descriptive models based on crime characteristics, including Modus Operandi, allows law enforcement agents to filter cases intelligently. Further, by estimating the link probability between cases, law enforcement agents can focus on cases with a higher link likelihood. This would allow a more effective use of law enforcement resources, potentially increasing clear-up rates.
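The link-probability estimation mentioned above can be sketched as pairwise classification: represent each case by binary Modus Operandi flags, build per-flag agreement features for every case pair, and fit a classifier whose probability output is the link likelihood. The flags, the hidden series structure, and the noise level are illustrative assumptions.

```python
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n_cases, n_mo = 60, 12
series_id = rng.integers(0, 10, n_cases)            # hidden offender per case
profiles = rng.random((10, n_mo)) < 0.5             # each offender's MO profile
# Observed cases: the offender's profile with 10% of flags flipped (noise).
cases = np.array([profiles[s] ^ (rng.random(n_mo) < 0.1) for s in series_id])

pairs = list(itertools.combinations(range(n_cases), 2))
X = np.array([(cases[i] == cases[j]).astype(float) for i, j in pairs])  # flag agreement
y = np.array([int(series_id[i] == series_id[j]) for i, j in pairs])     # truly linked?

linker = LogisticRegression(max_iter=1000).fit(X, y)
print("P(linked) for the first pair:", linker.predict_proba(X[:1])[0, 1])
```

Ranking open case pairs by this probability is what lets investigators spend manual comparison time on the pairs most likely to belong to the same series.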
APA, Harvard, Vancouver, ISO, and other styles
50

Flannery, Nicholas Martin. "Investigating the Convergent, Discriminant, and Predictive Validity of the Mental Toughness Situational Judgment Test." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99062.

Full text
Abstract:
This study investigated the validity of scores on a workplace-based measure of mental toughness, the Mental Toughness Situational Judgment Test (MTSJT). The goal of the study was to determine if MTSJT scores predicted supervisor ratings 1) differentially compared to other measures of mental toughness, grit, and resilience, and 2) incrementally beyond cognitive ability and conscientiousness. Further, two machine learning algorithms, elastic nets and random forests, were used to model predictions at both the item and scale level. MTSJT scores provided the most accurate predictions overall when modeled at the item level via a random forest approach. The MTSJT was the only measure to consistently provide incremental validity when predicting supervisor ratings. The results further emphasize the growing importance of both mental toughness and machine learning algorithms to industrial/organizational psychologists.

Doctor of Philosophy

The study investigated whether the Mental Toughness Situational Judgment Test (MTSJT), a measure of mental toughness directly in the workplace, could predict employees' supervisor ratings. Further, the study aimed to understand if the MTSJT was a better predictor than other measures of mental toughness, grit, resilience, intelligence, and conscientiousness. The study used machine learning algorithms to generate predictive models using both question-level scores and scale-level scores. The results suggested that MTSJT scores predicted supervisor ratings at both the question and scale level using a random forest model. Further, the MTSJT was a better predictor than most other measures included in the study. The results emphasize the growing importance of both mental toughness and machine learning algorithms to industrial/organizational psychologists.
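The item-level versus scale-level contrast can be sketched with one of the study's model families, an elastic net, under assumed data shapes (30 item scores versus a single averaged scale score); this illustrates the comparison only, not the study's analysis.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
items = rng.normal(size=(400, 30))                 # 30 hypothetical SJT item scores
# Synthetic criterion: only a handful of items actually carry signal.
supervisor = items[:, :5].mean(axis=1) + 0.5 * rng.normal(size=400)

scale = items.mean(axis=1, keepdims=True)          # aggregated scale score

for name, X in [("item level", items), ("scale level", scale)]:
    r2 = cross_val_score(ElasticNet(alpha=0.05), X, supervisor, cv=5).mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```

When the predictive signal is concentrated in a few items, averaging them into one scale dilutes it, which is one reason item-level models can win, as they did in this study.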
APA, Harvard, Vancouver, ISO, and other styles