
Dissertations / Theses on the topic 'Conformal predictions'



Consult the top 26 dissertations / theses for your research on the topic 'Conformal predictions.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Carrión Brännström, Robin. "Aggregating predictions using Non-Disclosed Conformal Prediction." Thesis, Uppsala universitet, Statistiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385098.

Full text
Abstract:
When data are stored in different locations and pooling of such data is not allowed, there is a loss of information when doing predictive modeling. In this thesis, a new method called Non-Disclosed Conformal Prediction (NDCP) is adapted to a regression setting, such that predictions and prediction intervals can be aggregated from different data sources without exchanging any data. The method is built upon the Conformal Prediction framework, which produces predictions with confidence measures on top of any machine learning method. The method is evaluated on regression benchmark data sets using Support Vector Regression, with different sizes and settings for the data sources, to simulate real-life scenarios. The results show that the method produces conservatively valid prediction intervals even though, in some settings, the individual data sources do not manage to create valid intervals. NDCP also creates more stable intervals than the individual data sources. Thanks to its straightforward implementation, data owners who cannot share data but would like to contribute to predictive modeling would benefit from using this method.
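The abstract builds on the standard split (inductive) conformal regression construction, in which residuals on a held-out calibration set are turned into prediction intervals with a coverage guarantee. The sketch below illustrates only that baseline construction, not the NDCP aggregation scheme itself; the choice of scikit-learn's SVR as the underlying model and the 25% calibration split are assumptions made for illustration.

```python
# Minimal split-conformal regression sketch (illustrative baseline only,
# not the NDCP aggregation scheme described in the thesis).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

def split_conformal_interval(X_train, y_train, X_test, alpha=0.1):
    # Split the data into a proper training set and a calibration set.
    X_fit, X_cal, y_fit, y_cal = train_test_split(
        X_train, y_train, test_size=0.25, random_state=0)
    model = SVR().fit(X_fit, y_fit)
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(np.asarray(y_cal) - model.predict(X_cal))
    # Conservative finite-sample quantile of the calibration scores.
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    preds = model.predict(X_test)
    # Under exchangeability the interval covers with probability >= 1 - alpha.
    return preds - q, preds + q
```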
2

van Miltenburg, Jelle. "Conformal survival predictions at a user-controlled time point : The introduction of time point specialized Conformal Random Survival Forests." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232060.

Full text
Abstract:
The goal of this research is to expand the field of conformal predictions using Random Survival Forests. The standard Conformal Random Survival Forest can predict with a fixed certainty whether something will survive up until a certain time point. This research is the first to show that there is little practical use in the standard Conformal Random Survival Forest algorithm. It turns out that the confidence guarantees of the conformal prediction framework are violated if the standard algorithm makes predictions for a user-controlled fixed time point. To solve this challenge, this thesis proposes two algorithms that specialize in conformal predictions for a fixed point in time: a Fixed Time algorithm and a Hybrid algorithm. Both algorithms transform the survival data that is used by the split evaluation metric in the Random Survival Forest algorithm. The algorithms are evaluated and compared along six different set prediction evaluation criteria. The prediction performance of the Hybrid algorithm outperforms that of the Fixed Time algorithm in most cases. Furthermore, the Hybrid algorithm is more stable than the Fixed Time algorithm when the prediction task extends to various time points. The hybrid Conformal Random Survival Forest should thus be considered by anyone who wants to make conformal survival predictions at user-controlled time points.
3

Ivina, Olga. "Conformal prediction of air pollution concentrations for the Barcelona Metropolitan Region." Doctoral thesis, Universitat de Girona, 2012. http://hdl.handle.net/10803/108341.

Full text
Abstract:
This thesis is aimed at introducing a newly developed machine learning method, conformal predictors, for air pollution assessment. For the given area of study, the Barcelona Metropolitan Region (BMR), several conformal prediction models have been developed. These models use the specification called the ridge regression confidence machine (RRCM). The conformal predictors that have been developed for the purposes of the present study are ridge regression models, and they always provide valid predictions. Instead of a point prediction, a conformal predictor outputs a prediction set, which is usually an interval. It is desired that these sets be as small as possible. The underlying algorithm for the conformal predictors derived in this thesis is ordinary kriging. A kriging-based conformal predictor can capture the spatial distribution of the data with the use of the so-called "kernel trick".
4

Laxhammar, Rikard. "Conformal anomaly detection : Detecting abnormal trajectories in surveillance applications." Doctoral thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-8762.

Full text
Abstract:
Human operators of modern surveillance systems are confronted with an increasing amount of trajectory data from moving objects, such as people, vehicles, vessels, and aircraft. A large majority of these trajectories reflect routine traffic and are uninteresting. Nevertheless, some objects are engaged in dangerous, illegal or otherwise interesting activities, which may manifest themselves as unusual and abnormal trajectories. These anomalous trajectories can be difficult to detect by human operators due to cognitive limitations. In this thesis, we study algorithms for the automated detection of anomalous trajectories in surveillance applications. The main results and contributions of the thesis are two-fold. Firstly, we propose and discuss a novel approach for anomaly detection, called conformal anomaly detection, which is based on conformal prediction (Vovk et al.). In particular, we propose two general algorithms for anomaly detection: the conformal anomaly detector (CAD) and the computationally more efficient inductive conformal anomaly detector (ICAD). A key property of conformal anomaly detection, in contrast to previous methods, is that it provides a well-founded approach for the tuning of the anomaly threshold that can be directly related to the expected or desired alarm rate. Secondly, we propose and analyse two parameter-light algorithms for unsupervised online learning and sequential detection of anomalous trajectories based on CAD and ICAD: the sequential Hausdorff nearest neighbours conformal anomaly detector (SHNN-CAD) and the sequential sub-trajectory local outlier inductive conformal anomaly detector (SSTLO-ICAD), which is more sensitive to local anomalous sub-trajectories. We implement the proposed algorithms and investigate their classification performance on a number of real and synthetic datasets from the video and maritime surveillance domains. The results show that SHNN-CAD achieves competitive classification performance with minimum parameter tuning on video trajectories. Moreover, we demonstrate that SSTLO-ICAD is able to accurately discriminate realistic anomalous vessel trajectories from normal background traffic.
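As a rough illustration of the mechanism described above, the sketch below computes a conformal p-value for a new observation and raises an alarm when that p-value falls below a threshold epsilon, so that epsilon directly bounds the expected alarm rate on normal data. The k-nearest-neighbour nonconformity measure, the toy data and all parameter values are assumptions for illustration; this is not the CAD/ICAD implementation from the thesis.

```python
# Illustrative conformal-anomaly-detection sketch: the threshold epsilon
# directly bounds the long-run alarm rate on exchangeable normal data.
import numpy as np

def knn_nonconformity(x, others, k=3):
    # Nonconformity = sum of distances to the k nearest neighbours.
    d = np.sort(np.linalg.norm(others - x, axis=1))
    return d[:k].sum()

def conformal_anomaly_p_value(training, x_new, k=3):
    data = np.vstack([training, x_new])
    n = len(data)
    scores = np.array([
        knn_nonconformity(data[i], np.delete(data, i, axis=0), k)
        for i in range(n)
    ])
    # p-value: fraction of examples at least as nonconforming as x_new.
    return np.sum(scores >= scores[-1]) / n

# Usage: raise an alarm when p <= epsilon.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))
p = conformal_anomaly_p_value(train, rng.normal(size=(1, 2))[0])
alarm = p <= 0.05
```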
5

Löfström, Tuwe. "On Effectively Creating Ensembles of Classifiers : Studies on Creation Strategies, Diversity and Predicting with Confidence." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-116683.

Full text
Abstract:
An ensemble is a composite model, combining the predictions from several other models. Ensembles are known to be more accurate than single models. Diversity has been identified as an important factor in explaining the success of ensembles. In the context of classification, diversity has not been well defined, and several heuristic diversity measures have been proposed. The focus of this thesis is on how to create effective ensembles in the context of classification. Even though several effective ensemble algorithms have been proposed, there are still several open questions regarding the role diversity plays when creating an effective ensemble. Open questions relating to creating effective ensembles that are addressed include: what to optimize when trying to find a sub-ensemble (a subset of the models used by the original ensemble) that is more effective than the original ensemble; how effective it is to search for such a sub-ensemble; and how the neural networks used in an ensemble should be trained for the ensemble to be effective. The contributions of the thesis include several studies evaluating different ways to optimize which sub-ensemble would be most effective, including a novel approach using combinations of performance and diversity measures. The contributions of the initial studies presented in the thesis eventually resulted in an investigation of the underlying assumption motivating the search for more effective sub-ensembles. The evaluation concluded that even if several more effective sub-ensembles exist, it may not be possible to identify which sub-ensembles would be the most effective using any of the evaluated optimization measures. An investigation of the most effective ways to train neural networks to be used in ensembles was also performed. The conclusions are that effective ensembles can be obtained by training neural networks in a number of different ways, and that either high average individual accuracy or high diversity can generate effective ensembles. Several findings regarding diversity and effective ensembles presented in the literature in recent years are also discussed and related to the results of the included studies. When creating confidence-based predictors using conformal prediction, there are several open questions regarding how data should be utilized effectively when using ensembles. Open questions related to predicting with confidence that are addressed include: how data can be utilized effectively to achieve more efficient confidence-based predictions using ensembles; and how problems with class imbalance affect the confidence-based predictions when using conformal prediction. Contributions include two studies: the first shows that the use of out-of-bag estimates when using bagging ensembles results in more effective conformal predictors, and the second shows that a conformal predictor conditioned on the class labels, to avoid a strong bias towards the majority class, is more effective on problems with class imbalance. The research method used is mainly inspired by the design science paradigm, which is manifested by the development and evaluation of artifacts.
At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 8: In press. Dataanalys för detektion av läkemedelseffekter (DADEL).
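The class-conditional construction mentioned in the abstract, often called a Mondrian inductive conformal predictor, can be sketched roughly as below: calibration nonconformity scores are kept per class, so the validity guarantee holds for each class separately rather than only on average, which is what counteracts the bias towards the majority class. The Random Forest underlying model, the split sizes and the nonconformity measure are illustrative assumptions, not the setup used in the thesis.

```python
# Sketch of a class-conditional (Mondrian) inductive conformal classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mondrian_icp(X, y, X_test, significance=0.1):
    X_fit, X_cal, y_fit, y_cal = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    y_cal = np.asarray(y_cal)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_fit, y_fit)
    classes = clf.classes_
    cal_prob = clf.predict_proba(X_cal)
    # Nonconformity: 1 - probability assigned to the true class, stored per class.
    cal_scores = {c: 1 - cal_prob[y_cal == c, i] for i, c in enumerate(classes)}
    prediction_sets = []
    for probs in clf.predict_proba(X_test):
        region = set()
        for i, c in enumerate(classes):
            score = 1 - probs[i]
            # p-value computed against calibration scores of class c only.
            p = (np.sum(cal_scores[c] >= score) + 1) / (len(cal_scores[c]) + 1)
            if p > significance:
                region.add(c)
        prediction_sets.append(region)
    return prediction_sets
```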
6

Chavarria, Pablo C. "Reduction of Confidence Interval Length for Small-Normal Data Sets Utilizing Bootstrap and Conformal Prediction Methods." Thesis, California State University, Long Beach, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10840988.

Full text
Abstract:
<p> It is of common practice to evoke a t-confidence interval for estimating the mean of a small data set with an assumed Normal distribution. These t-intervals are known to be wide to account for the lack of information. This thesis will focus on exploring ways to reduce the length of the interval, while preserving the level of confidence. Simulated small normal data sets will be used to analyze a combination of Bootstrapping and Conformal Prediction methods, while investigating measures of spread, such as standard deviation, kurtosis, excess CS kurtosis, skewness, etc. to create a criterion for when this combination of methodologies will greatly reduce the interval length. The goal is to be able to use the insight simulated data have to offer in order to apply to real world data. If time permits, a further look into the theory behind the results will be explored.</p><p>
7

Öhrn, Håkan. "General image classifier for fluorescence microscopy using transfer learning." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-388633.

Full text
Abstract:
Modern microscopy and automation technologies enable experiments which can produce millions of images each day. The valuable information is often sparse, and requires clever methods to find useful data. In this thesis a general image classification tool for fluorescence microscopy images was developed using features extracted from a general Convolutional Neural Network (CNN) trained on natural images. The user selects interesting regions in a microscopy image and then, through an iterative process using active learning, continually builds a training data set to train a classifier that finds similar regions in other images. The classifier uses conformal prediction to find samples that, if labeled, would most improve the learned model, as well as specifying the frequency of errors the classifier commits. The results show that with the appropriate choice of significance one can reach a high confidence in true positives. The active learning approach increased the precision, with the downside of finding fewer examples.
8

Geylan, Gökçe. "Training Machine Learning-based QSAR models with Conformal Prediction on Experimental Data from DNA-Encoded Chemical Libraries." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447354.

Full text
Abstract:
DNA-encoded chemical libraries (DELs) allow an exhaustive chemical space sampling with large-scale data consisting of compounds produced through combinatorial synthesis. This novel technology was utilized in the early drug discovery stages for robust hit identification and lead optimization. In this project, the aim was to build a Machine Learning-based QSAR model with conformal prediction for hit identification on two different target proteins on which the DEL was assayed. An initial investigation was conducted on a pilot project with 1000 compounds, and the analyses and the conclusions drawn from this part were later applied to a larger dataset with 1.2 million compounds. With this classification model, the aim was to analyze the prediction of compound activity in the DEL as well as in an external dataset, with identification of the top hits to evaluate the model’s performance and applicability. Support Vector Machine (SVM) and Random Forest (RF) models were built on both the pilot and the main datasets with different descriptor sets of Signature Fingerprints, RDKIT and CDK. In addition, an Autoencoder was used to supply data-driven descriptors on the pilot data as well. The Libsvm and the Liblinear implementations were explored and compared based on the models’ performances. The comparisons were made by considering the key concepts of conformal prediction such as the trade-off between validity and efficiency, observed fuzziness and the calibration against a range of significance levels. The top hits were determined by two sorting methods: credibility and p-value differences between the binary classes. The assignment of correct single labels to the true actives over a wide range of significance levels, regardless of the similarity of the test compounds to the training set, was confirmed for the models. Furthermore, an accumulation of these true actives in the models’ top hit selections was observed according to the latter sorting method, and additional investigations of the similarity and the building block enrichments in the top 50 and 100 compounds were conducted. The Tanimoto similarity demonstrated the model’s predictive power in selecting structurally dissimilar compounds, while the building block enrichment analysis showed the selectivity of the binding pocket, where target protein B was determined to be more selective. All of these comparison methods enabled an extensive study of model evaluation and performance. In conclusion, the Liblinear model with the Signature Fingerprints was found to give the best model performance for both the pilot and the main datasets, considering the model performances and the computational power requirements. However, an external set prediction was not successful due to the low structural diversity in the DEL which the model was trained on.
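Two of the evaluation quantities named above, observed fuzziness and efficiency, can be computed directly from a matrix of conformal p-values. The sketch below uses common definitions from the conformal prediction literature (mean sum of p-values of the false labels, and the fraction of singleton prediction sets at a given significance level); the toy p-values and the 0.2 significance level are assumptions for illustration and need not match the thesis's exact criteria.

```python
# Sketch of two conformal-prediction evaluation quantities, assuming a
# matrix of p-values (rows: test objects, columns: classes) and true labels.
import numpy as np

def observed_fuzziness(p_values, y_true):
    # Mean sum of p-values of the false labels; lower is better.
    n, _ = p_values.shape
    mask = np.ones_like(p_values, dtype=bool)
    mask[np.arange(n), y_true] = False
    return p_values[mask].reshape(n, -1).sum(axis=1).mean()

def efficiency(p_values, significance=0.2):
    # Fraction of objects whose prediction set {c : p_c > significance}
    # contains exactly one label.
    set_sizes = (p_values > significance).sum(axis=1)
    return np.mean(set_sizes == 1)

# Toy usage with a hypothetical binary problem.
p = np.array([[0.70, 0.10],
              [0.05, 0.55],
              [0.30, 0.25]])
y = np.array([0, 1, 0])
print(observed_fuzziness(p, y), efficiency(p, significance=0.2))
```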
9

Lindh, Martin. "Computational Modelling in Drug Discovery : Application of Structure-Based Drug Design, Conformal Prediction and Evaluation of Virtual Screening." Doctoral thesis, Uppsala universitet, Avdelningen för organisk farmaceutisk kemi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-328505.

Full text
Abstract:
Structure-based drug design and virtual screening are areas of computational medicinal chemistry that use 3D models of target proteins. It is important to develop better methods in this field with the aim of increasing the speed and quality of early stage drug discovery. The first part of this thesis focuses on the application of structure-based drug design in the search for inhibitors of the protein 1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXR), one of the enzymes in the DOXP/MEP synthetic pathway. This pathway is found in many bacteria (such as Mycobacterium tuberculosis) and in the parasite Plasmodium falciparum. In order to evaluate and improve current virtual screening methods, a benchmarking data set was constructed using publicly available high-throughput screening data. The exercise highlighted a number of problems with current data sets as well as with the use of publicly available high-throughput screening data. We hope this work will help guide further development of well designed benchmarking data sets for virtual screening methods. Conformal prediction is a new method in the computer-aided drug design toolbox that gives the prediction range at a specified level of confidence for each compound. To demonstrate the versatility and applicability of this method we derived models of skin permeability using two different machine learning methods: random forest and support vector machines.
10

Pradhan, Manoj Kumar. "Conformal Thermal Models for Optimal Loading and Elapsed Life Estimation of Power Transformers." Thesis, Indian Institute of Science, 2004. http://hdl.handle.net/2005/97.

Full text
Abstract:
Power and Generator Transformers are important and expensive elements of a power system. Inadvertent failure of Power Transformers would cause long interruptions in power supply with consequent loss of reliability and revenue to the supply utilities. The mineral oil impregnated paper, OIP, is an insulation of choice in large power transformers in view of its excellent dielectric and other properties, besides being relatively inexpensive. During the normal working regime of the transformer, the insulation thereof is subjected to various stresses, the more important among them being electrical, thermal, mechanical and chemical. Each of these stresses, appearing singly or in combination, would lead to a time-variant deterioration in the properties of insulation, called Ageing. This normal and inevitable process of degradation in the several essential properties of the insulation is irreversible, and is a non-Markov physico-chemical reaction kinetic process. The speed or rapidity of insulation deterioration is a very strong function of the magnitude of the stresses and the duration over which they acted. This is further compounded if the stresses are in synergy. During the processes of ageing, some or all of the vital properties undergo subtle changes, more often than not out of step with the duration of time over which the damage has been accumulated. Often, these changes are non-monotonic, thus presenting a random or a chaotic picture, and understanding the processes leading to eventual failure becomes difficult. But there is some order in this chaos, in that the time average of the changes over short intervals of time seems to indicate some degree of predictability. The status of insulation at any given point in time is assessed by measuring such of those properties as are sensitive to the amount of ageing and comparing them with earlier measurements. This procedure, called Diagnostic or non-destructive Testing, has been in vogue for some time now. Of the many parameters used as sensitive indices of the dynamics of insulation degradation, temporal changes in temperatures at different locations in the body of the transformer, more precisely the winding hot spots (HST) and top oil temperature (TOT), are believed to give a fairly accurate indication of the rate of degradation. Further, an accurate estimation of the temperatures would make it possible to determine the loading limit (loadability) of a power transformer. To estimate the temperature rise reasonably accurately, one has to resort to classical mathematical techniques involving the formulation and solution of a boundary value problem of heat conduction under carefully prescribed boundary conditions. Several complications are encountered in the development of the governing equations for the emergent heat transfer problems. The more important among them are the inhomogeneous composition of the insulation structure and of the conductor, divergent flow patterns of the oil phase and inordinately varying thermal properties of conductor and insulation. Validation and reconfirmation of the findings of the thermal models can be made using state-of-the-art methods, such as Artificial Intelligence (AI) techniques, Artificial Neural Networks (ANN) and Genetic Algorithms (GA). Over the years, different criteria have been prescribed for the prediction of the terminal or end of life (EOL) of equipment from the standpoint of its insulation. But, thus far, no straightforward and unequivocal criterion is forthcoming.
Calculation of elapsed life in line with the existing methodology given by IEEE and IEC introduces unacceptable degrees of uncertainty. It is needless to say that any conformal procedure proposed for the accurate prediction of EOL has to be based on technically feasible and economically viable considerations. A systematic study for understanding the dynamical nature of ageing in transformers in actual service is precluded for reasons very well known. Laboratory experiments on prototypes or pro-rated units, fabricated based on similarity studies, are performed under controlled conditions and at accelerated stress levels to reduce experimental time. The results thereof can then be judiciously extrapolated to normal operating conditions and to full-size equipment. The terms of reference of the present work are as follows: 1. computation of TOT and HST, using a theoretical model based on the boundary value problem of heat conduction and the application of AI techniques; 2. experimental investigation for estimating the elapsed life of transformers. Based on the experimental investigation, a semi-empirical expression has been developed to estimate the loss of life of power and station transformers by analyzing the gas content and furfural dissolved in oil, without performing off-line and destructive tests.
11

Tibell, Rasmus. "Training a Multilayer Perceptron to predict the final selling price of an apartment in co-operative housing society sold in Stockholm city with features stemming from open data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-159754.

Full text
Abstract:
The need for a robust model for predicting the value of condominiums and houses is becoming more apparent as further evidence of systematic errors in existing models is presented. Traditional valuation methods fail to produce good predictions of condominium sales prices, and systematic patterns in the errors linked to, for example, the repeat sales methodology and the hedonic pricing model have been pointed out by papers referenced in this thesis. This inability can lead to monetary problems for individuals and, in the worst case, economic crises for whole societies. In this master thesis we present how a predictive model constructed from a multilayer perceptron can predict the price of a condominium in the centre of Stockholm using objective data from publicly available sources. The value produced by the model is enriched with a predictive interval using the Inductive Conformal Prediction algorithm to give a clear view of the quality of the prediction. In addition, the Multilayer Perceptron is compared with the commonly used Support Vector Regression algorithm to underline the hallmark of neural networks: handling of a broad spectrum of features. The features used to construct the Multilayer Perceptron model are gathered from multiple “Open Data” sources and include data such as: 5,990 apartment sales prices from 2011-2013, interest rates for condominium loans from two major banks, national election results from 2010, geographic information and nineteen local features. Several well-known techniques for improving the performance of Multilayer Perceptrons are applied and evaluated. A Genetic Algorithm is deployed to facilitate the process of determining appropriate parameters used by the backpropagation algorithm. Finally, we conclude that the model created as a Multilayer Perceptron using backpropagation can produce good predictions and outperforms the results from the Support Vector Regression models and the studies in the referenced papers.
12

Omran, Abir. "Improving ligand-based modelling by combining various features." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-448769.

Full text
Abstract:
Background: In drug discovery, morphological profiles can be used to identify and establish a drug's biological activity or mechanism of action. Quantitative structure-activity relationship (QSAR) is an approach that uses chemical structures to predict properties, e.g., biological activity. Support Vector Machine (SVM) is a machine learning algorithm that can be used for classification. Confidence measures such as conformal predictions can be implemented on top of machine learning algorithms. There are several methods that can be applied to improve a model’s predictive performance. Aim: The aim of this project is to evaluate whether ligand-based modelling can be improved by combining features from chemical structures, target predictions and morphological profiles. Method: The project was divided into three experiments. In experiment 1, five bioassay datasets were used. In experiments 2 and 3, a cell painting dataset was used that contained morphological profiles from three different classes of kinase inhibitors, and the classes were used as endpoints. Support Vector Machine (liblinear) models were built in all three experiments. A significance level of 0.2 was set to calculate the efficiency. The mean observed fuzziness and efficiency were used as measurements to evaluate model performance. Results: Similar trends were observed for all datasets in experiment 1. Signatures+CDK13+TP, which is the most complex model, obtained the lowest mean observed fuzziness in four out of five cases. With a confidence level of 0.8, TP+Signatures obtained the highest efficiency. Signatures+Morphological Profiles+TP obtained the lowest mean observed fuzziness in experiments 2 and 3. Signatures obtained the highest number of correct single-label predictions with a confidence of 80%. Discussion: Fewer correct single-label predictions were observed for the active class in comparison to the inactive class. This could have been due to them being harder to predict. The morphological profiles did not contribute an improvement to the models' predictive performance compared to Signatures. This could be due to the lack of information obtained from the dataset. Conclusion: A combination of features from chemical structures and target predictions improved ligand-based modelling compared to models built on only one of the features. The combination of features from chemical structures and morphological profiles did not improve the ligand-based models compared to the model built only on chemical structures. By adding features from target predictions to a model built with features from chemical structures and morphological profiles, a decrease in mean observed fuzziness was obtained.
13

Laxhammar, Rikard. "Anomaly detection in trajectory data for surveillance applications." Licentiate thesis, Örebro universitet, Akademin för naturvetenskap och teknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-17235.

Full text
Abstract:
Abnormal behaviour may indicate important objects and events in a wide variety of domains. One such domain is intelligence and surveillance, where there is a clear trend towards more and more advanced sensor systems producing huge amounts of trajectory data from moving objects, such as people, vehicles, vessels and aircraft. In the maritime domain, for example, abnormal vessel behaviour, such as unexpected stops, deviations from standard routes, speeding, traffic direction violations etc., may indicate threats and dangers related to smuggling, sea drunkenness, collisions, grounding, hijacking, piracy etc. Timely detection of these relatively infrequent events, which is critical for enabling proactive measures, requires constant analysis of all trajectories; this is typically a great challenge to human analysts due to information overload, fatigue and inattention. In the Baltic Sea, for example, there are typically 3000–4000 commercial vessels present that are monitored by only a few human analysts. Thus, there is a need for automated detection of abnormal trajectory patterns. In this thesis, we investigate algorithms appropriate for automated detection of anomalous trajectories in surveillance applications. We identify and discuss some key theoretical properties of such algorithms, which have not been fully addressed in previous work: sequential anomaly detection in incomplete trajectories, continuous learning based on new data requiring no or limited human feedback, a minimum of parameters and a low and well-calibrated false alarm rate. A number of algorithms based on statistical methods and nearest neighbour methods are proposed that address some or all of these key properties. In particular, a novel algorithm known as the Similarity-based Nearest Neighbour Conformal Anomaly Detector (SNN-CAD) is proposed. This algorithm is based on the theory of Conformal prediction and is unique in the sense that it addresses all of the key properties above. The proposed algorithms are evaluated on real world trajectory data sets, including vessel traffic data, which have been complemented with simulated anomalous data. The experiments demonstrate the type of anomalous behaviour that can be detected at a low overall alarm rate. Quantitative results for learning and classification performance of the algorithms are compared. In particular, results from reproduced experiments on public data sets show that SNN-CAD, combined with Hausdorff distance  for measuring dissimilarity between trajectories, achieves excellent classification performance without any parameter tuning. It is concluded that SNN-CAD, due to its general and parameter-light design, is applicable in virtually any anomaly detection application. Directions for future work include investigating sensitivity to noisy data, and investigating long-term learning strategies, which address issues related to changing behaviour patterns and increasing size and complexity of training data.
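To make the SNN-CAD idea concrete, the sketch below computes a conformal p-value for a candidate trajectory using the Hausdorff distance to its nearest neighbouring trajectory as the nonconformity measure. The SciPy-based distance, the 1-nearest-neighbour choice and the data layout are assumptions for illustration, not the thesis implementation.

```python
# Hausdorff-distance nonconformity for trajectories, in the spirit of SNN-CAD.
# Trajectories are 2-D arrays of (x, y) points.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a, b):
    # Symmetric Hausdorff distance between two point sets.
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def trajectory_p_value(normal_trajectories, candidate):
    all_trajs = normal_trajectories + [candidate]
    scores = []
    for i, t in enumerate(all_trajs):
        others = [s for j, s in enumerate(all_trajs) if j != i]
        # Nonconformity: distance to the nearest neighbouring trajectory.
        scores.append(min(hausdorff(t, o) for o in others))
    scores = np.array(scores)
    # Conformal p-value of the candidate trajectory (last score).
    return np.mean(scores >= scores[-1])
```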
14

Harwood, Adrian Roy George. "Numerical evaluation of acoustic Green's functions." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/numerical-evaluation-of-acoustic-greens-functions(809386ea-59cb-453b-9770-5e3250b35e98).html.

Full text
Abstract:
The reduction of noise generated by new and existing engineering products is of increasing importance commercially, socially and environmentally. Commercially, the noise emission of vehicles, such as cars and aircraft, may often be considered a selling point, and the effects of noise pollution on human health and the environment have led to legislation restricting the noise emissions of many engineering products. Noise prediction schemes are important tools to help us understand and develop a means of controlling noise. Acoustic problems present numerous challenges to traditional CFD-type numerical methods, rendering all but the most trivial problems unsuitable. Difficulties relate to the length scale discrepancies which arise due to the relatively tiny pressure and density fluctuations of an acoustic wave propagating over large distances to the point of interest; the result being large computational domains to capture wave behaviour accurately between source and observer. Noise prediction may be performed using a hybrid Computational Aero-Acoustics (CAA) scheme, an approach to noise prediction which alleviates many issues associated with exclusively numerical or analytical approaches. Hybrid schemes often rely on knowledge of a Green's function, representing the scattering of the geometry, to propagate source fluctuations to the far-field. Presently, these functions only exist in analytical form for relatively simple geometries. This research develops principles for the robust calculation of Green's functions for general situations. In order to achieve this, three techniques to compute Green's functions for the Helmholtz equation within an extended class of 2D geometries are developed, evaluated and compared. Where appropriate, their extension to 3D is described. Guidance is provided on the selection of a suitable numerical method in practice given knowledge of the geometry of interest. Through inclusion of the numerical methods for the construction of Green's functions presented here, the applicability of existing hybrid schemes will be significantly extended. Thus, it is expected that noise predictions may be performed on a more general range of geometries while exploiting the computational efficiency of hybrid prediction schemes.
15

Dabbs, Russell Edward. "Do Predictions of Professional Business Economists Conform to the Rational Expectations Hypothesis?: Tests on a Set of Survey Data." Thesis, University of North Texas, 1989. https://digital.library.unt.edu/ark:/67531/metadc501259/.

Full text
Abstract:
A set of forecast survey data is analyzed in this paper for properties consistent with the Rational Expectations Hypothesis. Standard statistical tests for "rational expectations" are employed utilizing consensus forecasts generated by an interest rate newsletter. Four selected variables (Fed Funds rate, M1 rate of growth, rate of change in CPI, and real GNP growth rate) are analyzed over multiple time horizons. Results tend to reject "rational expectations" for most variables and time horizons. Forecasts are more likely to meet "rationality" criteria the shorter the forecast horizon, with the notable exception of forecasts of real GNP growth.
16

Zouhri, Wahb. "Quality prediction/classification of a production system under uncertainty based on Support Vector Machine." Thesis, Paris, HESAM, 2020. http://www.theses.fr/2020HESAE058.

Full text
Abstract:
With the emergence of the IoT paradigm, manufacturing industries are opting for new technologies for data collection and analysis to evaluate the quality of their manufacturing systems. Machine learning and classification methods provide various solutions to quality management such as defect detection and conformity prediction. However, manufacturing data are affected by uncertainties, which affect the performances of classification techniques. Accordingly, the thesis aims to study and manage the impact of measurement uncertainties on the predictive performances of support vector machine (SVM). Two groups of approaches are thus proposed: the former aiming to quantify the impact of measurement uncertainties on the prediction accuracy of SVM using several propagation techniques and data mining techniques, and the latter aiming to improve the robustness of SVM to uncertainties using robust optimization techniques. The various approaches provide a better understanding of the SVM robustness and how to improve it. The proposed approaches are evaluated through case studies with industrial partners.
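One simple, non-intrusive way to propagate measurement uncertainty through a trained classifier, in the spirit of the quantification approaches described above, is Monte Carlo perturbation of the inputs: perturb each measured feature with an assumed noise model and observe how often the predicted label changes. The sketch below is only an illustration of that idea; the Gaussian noise model, its magnitude and the toy data are assumptions and do not reproduce the thesis's methods.

```python
# Monte Carlo propagation of measurement uncertainty through an SVM
# (illustrative sketch with an assumed Gaussian noise model).
import numpy as np
from sklearn.svm import SVC

def prediction_stability(clf, X, noise_std, n_draws=200, rng=None):
    rng = rng or np.random.default_rng(0)
    base = clf.predict(X)
    agree = np.zeros(len(X))
    for _ in range(n_draws):
        X_noisy = X + rng.normal(scale=noise_std, size=X.shape)
        agree += clf.predict(X_noisy) == base
    # Per-sample fraction of perturbed draws that keep the nominal label.
    return agree / n_draws

# Toy usage with synthetic data and an assumed measurement noise level.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
stability = prediction_stability(clf, X[:20], noise_std=0.1)
```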
17

Platini, Marc. "Apprentissage machine appliqué à l'analyse et à la prédiction des défaillances dans les systèmes HPC." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM041.

Full text
Abstract:
As supercomputers increase in size, the number of failures or abnormal events also increases. This increase in the number of failures reduces the availability of these systems. To manage these failures and be able to reduce their impact on HPC systems, it is important to implement solutions to understand the failures and to predict them. HPC systems produce a large amount of monitoring data that contains useful information about the status of these systems. However, the analysis of these data is difficult and can be very tedious because these data reflect the complexity and the size of HPC systems. The work presented in this thesis proposes to use machine-learning-based solutions to analyse these data in an automated way. More precisely, this thesis presents two main contributions: the first one focuses on the prediction of processor overheating events in HPC systems, and the second one focuses on the analysis and the highlighting of the relationships between the events present in the system logs. Both contributions are evaluated on real data from a large HPC system used in production. To predict CPU overheating events, we propose a solution that uses only the temperature of the CPUs. It is based on the analysis of the general shape of the temperature prior to an overheating event and on the automated learning of the correlations between this shape and overheating events using a supervised learning model. The use of the general curve shape and a supervised learning model allows learning using temperature data with low accuracy and using a limited number of overheating events. The evaluation of the solution shows that it is able to predict overheating events several minutes in advance with high accuracy and recall. Furthermore, the evaluation of these results shows that it is possible to use preventive actions based on the predictions made by the solution to reduce the impact of overheating events on the system. To analyze and to extract in an automated way the causal relations between the events described in the HPC system logs, we propose an unconventional use of a deep machine learning model. Indeed, this type of model is classically used for prediction tasks. Thanks to the addition of a new layer proposed by state-of-the-art contributions of the machine learning community, it is possible to determine the weight of the algorithm inputs associated with its prediction. Using this information, we are able to detect the causal relations between the different events. The evaluation of the solution shows that it is able to extract the causal relations of the vast majority of events occurring in an HPC system. Moreover, its evaluation by administrators validates the highlighted correlations. Both contributions and their evaluations show the benefit of using machine learning solutions for understanding and predicting failures in HPC systems by automating the analysis of supervision data.
18

Grenet, Ingrid. "De l’utilisation des données publiques pour la prédiction de la toxicité des produits chimiques." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4050.

Full text
Abstract:
Currently, chemical safety assessment mostly relies on results obtained in in vivo studies performed in laboratory animals. However, these studies are costly in terms of time, money and animals used, and therefore not adapted to the evaluation of thousands of compounds. In order to rapidly screen compounds for their potential toxicity and prioritize them for further testing, alternative solutions are envisioned, such as in vitro assays and computational predictive models. The objective of this thesis is to evaluate how the public data from ToxCast and ToxRefDB can allow the construction of this type of model in order to predict in vivo effects induced by compounds, based only on their chemical structure. To do so, after data pre-processing, we first focus on the prediction of in vitro bioactivity from chemical structure and then on the prediction of in vivo effects from in vitro bioactivity data. For the in vitro bioactivity prediction, we build and test various models based on compounds' chemical structure descriptors. Since the learning data are highly imbalanced in favor of non-toxic compounds, we test a data augmentation technique and show that it improves the models' performances. We also perform a large-scale study to predict hundreds of in vitro assays from ToxCast and show that the stacked generalization ensemble method leads to reliable models when used on their applicability domain. For the in vivo effects prediction, we evaluate the link between results from in vitro assays targeting pathways known to induce endocrine effects and in vivo effects observed in endocrine organs during long-term studies. We highlight that, unexpectedly, these assays are not predictive of the in vivo effects, which raises the crucial question of the relevance of in vitro assays. We thus hypothesize that the selection of assays able to predict in vivo effects should be based on complementary information such as, in particular, mechanistic data.
APA, Harvard, Vancouver, ISO, and other styles
19

Saverimoutou, Antoine. "Métrologie de l'internet du futur : caractérisation, quantification et prédiction de la qualité de la navigation web." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2020. http://www.theses.fr/2020IMTA0186.

Full text
Abstract:
Web browsing is one of the main Internet services, involving a wide and constantly evolving set of actors. New Internet protocols, Content Delivery Networks (CDN) and web browsers' evolutions are meant to improve web pages' loading times. In order to better understand the Quality of Experience (QoE) perceived by end users, it is of prime importance to identify how web pages' content is composed and delivered, as well as to provide a relevant QoE metric. In this thesis, we have designed a new tool, Web View, meant to perform automated Web browsing sessions and measure several aspects of the Web browsing ecosystem. We have also introduced a new web metric, the Time for Full Visual Rendering (TFVR). From more than 18 trillion measurements performed over 2.5 years on the top 10,000 Alexa websites, we have used statistical techniques to identify the key parameters qualifying and quantifying web browsing quality. This set of factors has been confirmed by a machine learning process, which gives as output a set of rules to predict web pages' loading times. For websites whose loading times fluctuate regularly, we have used the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) to enrich our rules-based models and increase the rate of correct predictions. The evaluation of our decision tree-based model on never-assessed websites shows that we can correctly predict web browsing quality. This work aims at helping network operators and service providers to increase the Quality of Service (QoS) offered to their customers.
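As a purely illustrative sketch of how learned rules can predict loading times (the feature names, values and model below are invented, not those of the thesis), a decision tree can be turned into readable prediction rules like this:

    # Hypothetical sketch: a small decision tree predicting page loading time (s)
    # from invented page features, printed as human-readable rules.
    from sklearn.tree import DecisionTreeRegressor, export_text

    X = [[35, 4, 1200], [80, 9, 4500], [20, 2, 600], [120, 12, 8000]]  # [objects, domains, page_kB]
    y = [1.8, 4.2, 1.1, 6.5]                                           # observed loading times
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["objects", "domains", "page_kB"]))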
APA, Harvard, Vancouver, ISO, and other styles
20

Ndenga, Malanga Kennedy. "Predicting post-release software faults in open source software as a means of measuring intrinsic software product quality." Electronic Thesis or Diss., Paris 8, 2017. http://www.theses.fr/2017PA080099.

Full text
Abstract:
Faulty software has expensive consequences. To mitigate these consequences, software developers have to identify and fix faulty software components before releasing their products. Similarly, users have to gauge the delivered quality of software before adopting it. However, the abstract nature and multiple dimensions of software quality impede organizations from measuring it. Software quality metrics can be used as proxies of software quality, but there is a need for a software process metric that can guarantee consistently superior fault prediction performance across different contexts. This research sought to determine a predictor for software faults that exhibits the best prediction performance, requires the least effort to detect software faults, and has a minimum cost of misclassifying components. It also investigated the effect of combining predictors on the performance of software fault prediction models. Experimental data were derived from four OSS projects. Logistic Regression was used to predict bug status, while Linear Regression was used to predict the number of bugs per file. Models built with Change Burst metrics registered overall better performance than those built with Change, Code Churn, Developer Networks and Source Code software metrics. Change Burst metrics recorded the highest values for numerical performance measures, exhibited the highest fault detection probabilities, and had the least cost of misclassification of components. The study found that Change Burst metrics could effectively predict software faults.
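For illustration only, a logistic regression fault-proneness model of the kind described (the metric names and values below are invented, not the study's data) can be fitted as follows:

    # Hedged sketch: logistic regression on invented change-burst-style metrics
    # to estimate the probability that a file contains a post-release fault.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[1, 0.2], [5, 1.5], [2, 0.4], [8, 2.1], [0, 0.0], [6, 1.8]])  # [burst_count, churn_rate]
    y = np.array([0, 1, 0, 1, 0, 1])                                            # fault observed after release?
    clf = LogisticRegression().fit(X, y)
    print(clf.predict_proba([[4, 1.0]])[0, 1])                                  # predicted fault probability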
APA, Harvard, Vancouver, ISO, and other styles
21

Salaün, Achille. "Prédiction d'alarmes dans les réseaux via la recherche de motifs spatio-temporels et l'apprentissage automatique." Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAS010.

Full text
Abstract:
Nowadays, telecommunication networks occupy a central position in our world, as they allow huge amounts of information to be shared worldwide. Networks are, however, complex systems, both in size and in technological diversity, which makes their management and repair more difficult. In order to limit the negative impact of failures, tools have to be developed to detect a failure whenever it occurs, analyse its root causes in order to solve it efficiently, or even predict the failure, since prevention is better than cure. In this thesis, we mainly focus on these last two problems. To do so, we use files, called alarm logs, storing all the alarms emitted by the system. However, these files are generally noisy and verbose: an operator managing a network needs tools able to extract and handle, in an interpretable manner, the causal relationships inside a log. In this thesis, we followed two directions. First, we drew inspiration from pattern matching techniques: similarly to Ukkonen's algorithm, we build online a structure, called DIG-DAG, that stores all the potential causal relationships between the events of a log. Moreover, we introduce a query system to exploit the DIG-DAG structure. Finally, we show how our solution can be used for root cause analysis. The second approach is a generative approach for the prediction of time series. In particular, we compare two well-known models for this task, each regarded as state of the art in its respective community: recurrent neural networks on the one hand and hidden Markov models on the other. We compare their expressivity analytically by encompassing both into a probabilistic model called GUM.
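As a naive illustration of the kind of causal information an alarm log contains (this is not the DIG-DAG construction described in the thesis, only a toy count of candidate precedence relationships), consider:

    # Naive illustration only (not the DIG-DAG algorithm): count how often
    # one alarm type is directly followed by another in a toy alarm log.
    from collections import Counter

    log = ["LINK_DOWN", "BGP_FLAP", "LINK_DOWN", "BGP_FLAP", "FAN_FAIL", "TEMP_HIGH"]
    pairs = Counter(zip(log, log[1:]))        # candidate "A precedes B" relationships
    for (a, b), n in pairs.most_common():
        print(f"{a} -> {b}: {n}")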
APA, Harvard, Vancouver, ISO, and other styles
22

Elloumi, Zied. "Prédiction de performances des systèmes de Reconnaissance Automatique de la Parole." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM005/document.

Full text
Abstract:
In this thesis, we focus on performance prediction of automatic speech recognition (ASR) systems. This is a useful task for measuring the reliability of transcription hypotheses on a new data collection, when the reference transcription is unavailable and the ASR system used is unknown (black box). Our contribution covers several areas. First, we propose a heterogeneous French corpus to train and evaluate ASR performance prediction systems as well as ASR systems. We then compare two prediction approaches: a state-of-the-art (SOTA) approach based on explicitly engineered features and a new strategy based on features learnt implicitly with convolutional neural networks (CNNs). While the joint use of textual and signal features brings no gain for the SOTA system, combining these inputs for the CNNs leads to the best WER prediction performance. We also show that the CNN prediction closely matches the shape of the WER distribution over a collection of speech recordings, whereas the SOTA approach generates a distribution far from reality. Then, we analyze factors impacting both prediction approaches. We also assess the impact of the amount of training data for the prediction systems, as well as the robustness of systems trained with the outputs of a particular ASR system and used to predict performance on a new data collection. Our experimental results show that both prediction approaches are robust and that the prediction task is more difficult on short speech turns and on spontaneous speech. Finally, we try to understand which information is captured by our neural model and how it relates to different factors. Our experiments show that intermediate representations in the network implicitly encode information on the speech style, the speaker's accent and the broadcast program type. To take advantage of this analysis, we propose a multi-task system that is slightly more effective on the performance prediction task.
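For context, the quantity being predicted is the word error rate, conventionally defined as $\mathrm{WER} = (S + D + I)/N$, where $S$, $D$ and $I$ are the numbers of substituted, deleted and inserted words with respect to the reference transcription and $N$ is the number of reference words; the thesis predicts this value without access to the reference.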
APA, Harvard, Vancouver, ISO, and other styles
23

"Conformal Predictions in Multimedia Pattern Recognition." Doctoral diss., 2010. http://hdl.handle.net/2286/R.I.8604.

Full text
Abstract:
The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about a decision, or when they are not. Machine learning techniques today are not yet fully equipped to be trusted with this critical task. This work seeks to address this fundamental knowledge gap. Existing approaches that provide a measure of confidence on a prediction, such as learning algorithms based on Bayesian theory or the Probably Approximately Correct theory, require strong assumptions or often produce results that are not practical or reliable. The recently developed Conformal Predictions (CP) framework - which is based on the principles of hypothesis testing, transductive inference and algorithmic randomness - provides a game-theoretic approach to the estimation of confidence with several desirable properties such as online calibration and generalizability to all classification and regression methods. This dissertation builds on the CP theory to compute reliable confidence measures that aid decision-making in real-world problems through: (i) development of a methodology for learning a kernel function (or distance metric) for optimal and accurate conformal predictors; (ii) validation of the calibration properties of the CP framework when applied to multi-classifier (or multi-regressor) fusion; and (iii) development of a methodology to extend the CP framework to continuous learning, by using the framework for online active learning. These contributions are validated on four real-world problems from the domains of healthcare and assistive technologies: two classification-based applications (risk prediction in cardiac decision support and multimodal person recognition), and two regression-based applications (head pose estimation and saliency prediction in images). The results obtained show that: (i) multiple kernel learning can effectively increase efficiency in the CP framework; (ii) quantile p-value combination methods provide a viable solution for fusion in the CP framework; and (iii) eigendecomposition of p-value difference matrices can serve as an effective measure for online active learning, demonstrating promise and potential in using these contributions in multimedia pattern recognition problems in real-world settings.
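To give a concrete flavour of the framework (this is a generic textbook-style sketch, not code from the dissertation), a transductive conformal p-value for a tentative label can be computed from nonconformity scores as follows:

    # Minimal sketch of a transductive conformal p-value using a
    # nearest-same-label-neighbour nonconformity score; data are toy values.
    import numpy as np

    def conformal_p_value(train_x, train_y, x_new, y_new):
        xs = np.append(train_x, x_new)
        ys = np.append(train_y, y_new)
        def score(i):  # distance to the nearest other example with the same label
            same = [abs(xs[i] - xs[j]) for j in range(len(xs)) if j != i and ys[j] == ys[i]]
            return min(same) if same else np.inf
        alphas = np.array([score(i) for i in range(len(xs))])
        return np.mean(alphas >= alphas[-1])  # fraction at least as nonconforming as the new point

    print(conformal_p_value(np.array([1.0, 1.1, 5.0, 5.2]), np.array([0, 0, 1, 1]), 1.05, 0))

A high p-value for a tentative label means the new example conforms well to the training data under that label; labels whose p-value falls below the chosen significance level are excluded from the prediction set.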
APA, Harvard, Vancouver, ISO, and other styles
24

KWON, YONG JUNG. "CONFORMAL SOLUTION METHOD WITH THE HARD CONVEX BODY EXPANSION THEORY FOR PREDICTING VAPOR-LIQUID EQUILIBRIA." Thesis, 1986. http://hdl.handle.net/1911/16079.

Full text
Abstract:
Like the hard sphere expansion (HSE) theory, the hard convex body expansion (HCBE) theory separates any residual thermodynamic property into a contribution from molecular repulsion, which is calculated directly from a hard convex body (HCB) equation of state, and contributions from molecular attraction, which are obtained by the corresponding states principle (CSP) using pure reference fluids. The HSE theory yields good agreement with experimental thermodynamic data for light hydrocarbon mixture systems. However, there is a limit to the molecular size and shape difference in mixtures for which the intermolecular repulsion can be represented by a hard sphere mixture. A HCB equation of state developed by Naumann and Leland (1984) is applicable to pure components and their mixtures. The HCB equation of state for a pure component is characterized by two dimensionless geometrical parameters, $\alpha$ and $\tau^{-1}$, which are combinations of three molecular dimensions of a convex body: volume (V), surface area (S), and mean radius (R). The two dimensionless geometrical parameters are determined directly from Pitzer's acentric factor. The molecular volume is evaluated by equating the HCB equation of state to the optimal repulsion evaluated by the expansion method. The surface area and the mean radius are then obtained from the known dimensionless geometrical parameters and the molecular volume. Four kinds of convex bodies are considered in this work: prolate spherocylinders, oblate spherocylinders, prolate ellipsoids, and oblate ellipsoids. Better results for the vapor-liquid equilibrium constants (K-values) are obtained with this method than with ordinary equations of state using empirical mixing rules, for mixtures containing molecules as nonspherical as n-decane in the prolate models and cyclohexane in the oblate models. The HCBE theory can also be applied to predict thermodynamic properties of pure components using two reference fluids, as in the Lee-Kesler method (1975).
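As a point of reference (the thesis's exact convention is not given in this abstract), the nonsphericity parameter of a hard convex body is commonly defined in the literature as $\alpha = RS/(3V)$, which equals 1 for a sphere and grows as the body becomes more elongated or flattened.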
APA, Harvard, Vancouver, ISO, and other styles
25

Park, Ruth Jean. "Development and verification of a short-range ensemble numerical weather prediction system for Southern Africa." Diss., 2014. http://hdl.handle.net/2263/41185.

Full text
Abstract:
This research was conducted in order to develop a short-range ensemble numerical weather prediction system over southern Africa using the Conformal-Cubic Atmospheric Model (CCAM). An ensemble prediction system (EPS) combines several individual weather model setups into an average forecast system where each member contributes to the final weather forecast. Four different EPSs were configured, and rainfall forecasts were simulated seven days ahead for the summer months of January and February, 2009 and 2010, at high (15 km) and low (50 km) resolution over the southern African domain. Statistical analysis was performed on the forecasts to determine which EPS was the most skilful at simulating rainfall. The measures used to determine the skill of the EPSs were reliability diagrams, relative operating characteristics, the Brier skill score and the root mean square error. The results show that the largest ensemble is consistently the most skilful for all forecasts in both the high- and low-resolution cases. The higher-resolution forecasts were also more skilful than the forecasts made at low resolution. These findings lead to the conclusion that the largest ensemble at high resolution is the best system to predict rainfall over southern Africa using the CCAM.
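For reference, the Brier score and Brier skill score used in such verification are standardly defined as $\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}(p_i - o_i)^2$ and $\mathrm{BSS} = 1 - \mathrm{BS}/\mathrm{BS}_{\mathrm{ref}}$, where $p_i$ is the forecast probability of rainfall (for an EPS, typically the fraction of members predicting the event), $o_i$ is 1 if rain was observed and 0 otherwise, and $\mathrm{BS}_{\mathrm{ref}}$ is the score of a reference forecast such as climatology; the dissertation's exact reference forecast is not stated in this abstract.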
APA, Harvard, Vancouver, ISO, and other styles
26

Grout, Ioannis. "Evaluation of normal tissue complication probability (NTCP) dose-response models predicting acute Pneumonitis in patients treated with conformal radiation therapy for non-small cell lung cancer, and development of a NTCP calculation software tool." Thesis, 2007. http://nemertes.lis.upatras.gr/jspui/handle/10889/654.

Full text
Abstract:
A set of mathematical models, known as radiobiological dose-response models, has been developed to model the biological effects and complications that arise following irradiation. The overall objective is to be able to apply these models in clinical practice with confidence and to ensure that more successful treatments are given to patients. This investigation assesses these models and their power to predict NTCP following irradiation of the lung. Clinical data from patients treated for inoperable stage III non-small cell lung cancer are obtained, and the consequent biological effect (severity of pneumonitis) observed as a result of this radiation treatment is assessed by the models. By gaining more knowledge about the 3D dose distribution and the incidence of radiation pneumonitis through the evaluation of the models, the main treatment goal, which is to maximise TCP and minimise NTCP, can be achieved. Post-treatment data are obtained regarding the clinical outcome, or clinical endpoint, for each patient, considered here to be radiation pneumonitis. The clinical endpoint is a specific biological effect that may or may not have occurred, after a certain period, following irradiation. The models are assessed on their ability to predict an NTCP value that corresponds to the resulting clinical endpoint following treatment. Furthermore, a software tool for the calculation of NTCPs by the models is developed, in an attempt to provide an important tool for the optimization of radiotherapy treatment planning. With the findings from this study, the aim is to further strengthen, support and challenge the existing literature on dose-response modelling.
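The abstract does not name the specific dose-response models evaluated; a widely used example of the class is the Lyman-Kutcher-Burman (LKB) model, in which $\mathrm{NTCP} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-x^{2}/2}\,dx$ with $t = (D_{\mathrm{eff}} - TD_{50})/(m\,TD_{50})$ and $D_{\mathrm{eff}} = \bigl(\sum_i v_i D_i^{1/n}\bigr)^{n}$, where $v_i$ is the fractional organ volume receiving dose $D_i$ and $TD_{50}$, $m$ and $n$ are fitted parameters.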
APA, Harvard, Vancouver, ISO, and other styles
