Dissertations / Theses: 'Data Driven Inference'

1

Park, June Young. "Data-driven Building Metadata Inference." Research Showcase @ CMU, 2016. http://repository.cmu.edu/theses/127.

Abstract:

Advances in information technology have driven the development of building technology. In particular, building operation can now be monitored and controlled through a large number of sensors and actuators installed on every element in a building. The resulting stream of building data enables both quantitative and qualitative improvements. However, mapping between physical building elements and the cyber system remains limited. To address this mapping problem, a text mining methodology was developed last summer as part of a project conducted by the Consortium for Building Energy Innovation. Building data were extracted from Building 661 in Philadelphia, PA, and the ground truth (each data point with its semantic information) was labeled by manual inspection. A Support Vector Machine was then trained to learn the relationship between data point names and semantic information; it achieves 93% accuracy on unseen Building 661 data points. The techniques and lessons from this project were used to develop a framework for analyzing building data from the Gates Hillman Center (GHC) building in Pittsburgh, PA. The framework consists of two stages. In the first stage, we initially tried to cluster data points with similar semantic information using hierarchical clustering, but the effectiveness and accuracy of clustering were not adequate for this framework. A filtering and classification model was therefore developed to identify the semantic information of the data points. Using daily statistical features, it identifies damper position and supply air duct pressure data points with 90% accuracy. Given the semantic information from the first stage, the second stage infers the relationship between Variable Air Volume (VAV) terminal units and Air Handling Units (AHUs).
We first investigated the intuitive thermal and flow relationships between VAVs and AHUs, applying statistical-feature clustering to the VAV discharge temperature data. However, the control strategy of this building makes this relationship invisible. As an alternative, we compared damper position at the VAVs with supply air duct pressure at the AHUs by calculating their cross-correlation. This similarity scoring method mapped VAVs to AHUs with 80% accuracy. The proposed data-driven framework guides the user in extracting desired information, such as the VAV-to-AHU relationship, from large, heterogeneous sensor networks.
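The cross-correlation scoring described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the lag window, the use of the absolute correlation, and the synthetic series are all assumptions made for illustration.

```python
from math import sqrt

def normalized_xcorr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlapping samples."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    if n < 2:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_a * var_b)

def similarity(x, y, max_lag=3):
    """Best absolute correlation over a small window of lags (assumed scoring rule)."""
    return max(abs(normalized_xcorr(x, y, lag)) for lag in range(-max_lag, max_lag + 1))

def map_vavs_to_ahus(vav_damper, ahu_pressure, max_lag=3):
    """Assign each VAV to the AHU whose pressure series it resembles most."""
    return {
        vav: max(ahu_pressure, key=lambda ahu: similarity(series, ahu_pressure[ahu], max_lag))
        for vav, series in vav_damper.items()
    }

# Synthetic, hypothetical data: each VAV damper series is a linear function of one AHU's pressure.
ahu_pressure = {
    "AHU-1": [1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2],
    "AHU-2": [4, 1, 4, 1, 4, 1, 4, 1, 4, 1, 4, 1],
}
vav_damper = {
    "VAV-101": [10 + 5 * p for p in ahu_pressure["AHU-1"]],
    "VAV-201": [50 - 10 * p for p in ahu_pressure["AHU-2"]],
}
mapping = map_vavs_to_ahus(vav_damper, ahu_pressure)
print(mapping)
```

Taking the absolute value treats a damper that closes as pressure rises the same as one that opens; whether that suits a given building's control strategy would need to be checked against real data.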

2

Spoon, Steven Alexander. "Demand-Driven Type Inference with Subgoal Pruning." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/7486.

Abstract:

Highly dynamic languages like Smalltalk do not have much static type information immediately available before the program runs. Static types can still be inferred by analysis tools, but historically, such analysis is only effective on smaller programs of at most a few tens of thousands of lines of code. This dissertation presents a new type inference algorithm, DDP, that is effective on larger programs with hundreds of thousands of lines of code. The approach of the algorithm borrows from the field of knowledge-based systems: it is a demand-driven algorithm that sometimes prunes subgoals. The algorithm is formally described, proven correct, and implemented. Experimental results show that the inferred types are usefully precise. A complete program understanding application, Chuck, has been developed that uses DDP type inferences. This work contributes the DDP algorithm itself, the most thorough semantics of Smalltalk to date, a new general approach for analysis algorithms, and experimental analysis of DDP including determination of useful parameter settings. It also contributes an implementation of DDP, a general analysis framework for Smalltalk, and a complete end-user application that uses DDP.

3

Michelen, Strofer Carlos Alejandro. "Machine Learning and Field Inversion approaches to Data-Driven Turbulence Modeling." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103155.

Abstract:

There still is a practical need for improved closure models for the Reynolds-averaged Navier-Stokes (RANS) equations. This dissertation explores two different approaches for using experimental data to provide improved closure for the Reynolds stress tensor field. The first approach uses machine learning to learn a general closure model from data. A novel framework is developed to train deep neural networks using experimental velocity and pressure measurements. The sensitivity of the RANS equations to the Reynolds stress, required for gradient-based training, is obtained by means of both variational and ensemble methods. The second approach is to infer the Reynolds stress field for a flow of interest from limited velocity or pressure measurements of the same flow. Here, this field inversion is done using a Monte Carlo Bayesian procedure and the focus is on improving the inference by enforcing known physical constraints on the inferred Reynolds stress field. To this end, a method for enforcing boundary conditions on the inferred field is presented. The two data-driven approaches explored and improved upon here demonstrate the potential for improved practical RANS predictions.
Doctor of Philosophy
The Reynolds-averaged Navier-Stokes (RANS) equations are widely used to simulate fluid flows in engineering applications despite their known inaccuracy in many flows of practical interest. The uncertainty in the RANS equations is known to stem from the Reynolds stress tensor for which no universally applicable turbulence model exists. The computational cost of more accurate methods for fluid flow simulation, however, means RANS simulations will likely continue to be a major tool in engineering applications and there is still a need for improved RANS turbulence modeling. This dissertation explores two different approaches to use available experimental data to improve RANS predictions by improving the uncertain Reynolds stress tensor field. The first approach is using machine learning to learn a data-driven turbulence model from a set of training data. This model can then be applied to predict new flows in place of traditional turbulence models. To this end, this dissertation presents a novel framework for training deep neural networks using experimental measurements of velocity and pressure. When using velocity and pressure data, gradient-based training of the neural network requires the sensitivity of the RANS equations to the learned Reynolds stress. Two different methods, the continuous adjoint and ensemble approximation, are used to obtain the required sensitivity. The second approach explored in this dissertation is field inversion, whereby available data for a flow of interest is used to infer a Reynolds stress field that leads to improved RANS solutions for that same flow. Here, the field inversion is done via the ensemble Kalman inversion (EKI), a Monte Carlo Bayesian procedure, and the focus is on improving the inference by enforcing known physical constraints on the inferred Reynolds stress field. To this end, a method for enforcing boundary conditions on the inferred field is presented. 
While further development is needed, the two data-driven approaches explored and improved upon here demonstrate the potential for improved practical RANS predictions.
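As a rough illustration of the ensemble Kalman inversion (EKI) mentioned in this abstract, the sketch below infers a single scalar parameter from one noisy observation. It is a toy under stated assumptions, not the dissertation's method: the linear forward model stands in for a RANS solver, the parameter is a scalar rather than a Reynolds stress field, and the names `eki_step` and `run_eki` are hypothetical. It uses the perturbed-observation variant of the update.

```python
import random

def eki_step(thetas, forward, y_obs, noise_var, rng):
    """One EKI update of a scalar-parameter ensemble with perturbed observations."""
    n = len(thetas)
    preds = [forward(t) for t in thetas]
    t_mean = sum(thetas) / n
    g_mean = sum(preds) / n
    # Sample cross-covariance (parameter, prediction) and prediction variance.
    c_tg = sum((t - t_mean) * (g - g_mean) for t, g in zip(thetas, preds)) / (n - 1)
    c_gg = sum((g - g_mean) ** 2 for g in preds) / (n - 1)
    gain = c_tg / (c_gg + noise_var)
    # Each member is nudged toward its own noisy copy of the observation.
    return [
        t + gain * (y_obs + rng.gauss(0.0, noise_var ** 0.5) - g)
        for t, g in zip(thetas, preds)
    ]

def run_eki(forward, y_obs, noise_var=0.01, n_members=50, n_iters=5, seed=0):
    """Draw a prior ensemble, iterate the update, and return the ensemble mean."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 2.0) for _ in range(n_members)]  # assumed prior N(0, 2^2)
    for _ in range(n_iters):
        thetas = eki_step(thetas, forward, y_obs, noise_var, rng)
    return sum(thetas) / len(thetas)

# Toy forward model g(theta) = 2 * theta; the observation 6.0 corresponds to theta = 3.
estimate = run_eki(lambda t: 2.0 * t, 6.0)
print(estimate)
```

Because the update needs only forward evaluations of the ensemble, no derivative of the forward model is required, which is the usual reason for choosing ensemble methods when the solver is treated as a black box.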

4

Marcou, Quentin. "Probabilistic approaches to the adaptive immune repertoire : a data-driven approach." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB029/document.

Abstract:

Each individual's immune system must withstand repeated attacks from a constantly changing environment, which poses a virtually infinite number of threats. To fulfil this role, the adaptive immune system relies on an enormous diversity of T and B lymphocytes. Each of these cells displays on its surface a unique receptor, created at random through the process of V(D)J recombination and specific to only a small number of pathogens. The initial diversity generated by this recombination process is then reduced by a functional selection step based on the folding properties of the receptor and its ability to interact with self proteins. For B cells, this diversity can be expanded again after an encounter with a pathogen, during the affinity maturation process in which the receptor undergoes successive cycles of hypermutation and selection. This work presents probabilistic approaches for inferring the probability distributions underlying the recombination and hypermutation processes from high-throughput sequencing data. These approaches gave rise to IGoR, a versatile software tool whose performance exceeds that of existing tools. Building on the resulting models, I show how they can be used to study the aging and evolution of the immune repertoire, to test for parental imprinting during V(D)J recombination, and to demonstrate that twins exchange lymphocytes during fetal life.
An individual's adaptive immune system needs to face repeated challenges from a constantly evolving environment with a virtually infinite number of threats. To achieve this task, the adaptive immune system relies on a large diversity of B-cells and T-cells, each carrying a unique receptor specific to a small number of pathogens. These receptors are initially built at random through the process of V(D)J recombination. This initial diversity is then narrowed down by a step of functional selection based on the receptors' folding properties and their ability to recognize self antigens. Upon recognition of a pathogen, the B-cell divides and its offspring undergo several successive rounds of somatic hypermutation and selection in an evolutionary process called affinity maturation. This work presents principled probabilistic approaches to infer the probability distributions underlying the recombination and somatic hypermutation processes from high-throughput sequencing data using IGoR, a flexible software tool developed over the course of this PhD. IGoR has been developed as a versatile research tool and can encode a variety of models of different biological complexity, allowing researchers in the field to characterize immune receptor repertoires ever more precisely. To motivate this data-driven approach, we demonstrate that IGoR outperforms existing tools in accuracy and estimate the sample sizes needed for reliable repertoire characterization. Finally, using the obtained model predictions, we show potential applications of these methods by demonstrating that homozygous twins share T-cells through cord blood and that the public core of the T-cell repertoire is formed in the pre-natal period, and by estimating naive T-cell clone lifetimes in humans.

5

Das, Debasish. "Bayesian Sparse Regression with Application to Data-driven Understanding of Climate." Diss., Temple University Libraries, 2015. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/313587.

Abstract:

Computer and Information Science
Ph.D.
Sparse regressions based on constraining the L1-norm of the coefficients have become popular because they can handle high-dimensional data, unlike ordinary regressions, which suffer from overfitting and model identifiability issues, especially when the sample size is small. They are often the method of choice in many fields of science and engineering for simultaneously selecting covariates and fitting parsimonious linear models that generalize better and are easily interpretable. However, significant challenges arise from the need to accommodate extremes and other domain constraints, such as dynamical relations among variables, spatial and temporal constraints, feature correlations, and the need to provide uncertainty estimates. We adopted a hierarchical Bayesian version of the sparse regression framework and exploited its inherent flexibility to accommodate these constraints. We applied sparse regression to the feature selection problem in statistical downscaling of climate variables, with particular focus on their extremes. This is important for many impact studies, where climate change information is required at a spatial scale much finer than that provided by global or regional climate models. Characterizing the dependence of extremes on covariates can help identify plausible causal drivers and inform the downscaling of extremes. We propose a general-purpose sparse Bayesian framework for covariate discovery that accommodates the non-Gaussian distribution of extremes within a hierarchical Bayesian sparse regression model. Using a variational Bayes approximation, we obtain posteriors over regression coefficients, which indicate the dependence of extremes on the corresponding covariates and provide uncertainty estimates.
The method is applied to select informative atmospheric covariates at multiple spatial scales, as well as indices of large-scale circulation and global warming, related to the frequency of precipitation extremes over the continental United States. Our results confirm the dependence relations expected from known precipitation physics and generate novel insights that can inform physical understanding. We plan to extend our model to discover covariates for extreme intensity in the future. We further extend our framework to handle dynamic relationships among climate variables using a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP). The extended model achieves simultaneous clustering and discovery of covariates within each cluster. Moreover, a priori knowledge about associations between pairs of data points is incorporated into the model through must-link constraints on a Markov Random Field (MRF) prior. A scalable and efficient variational Bayes approach is developed to infer posteriors on regression coefficients and cluster variables.
Temple University--Theses
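To make the L1-based covariate selection in this abstract concrete, here is a minimal sketch of ordinary (non-Bayesian) lasso regression solved by coordinate descent with soft-thresholding. It is a swapped-in, simpler technique: the dissertation uses a hierarchical Bayesian formulation with variational inference, which this sketch does not attempt. The function names and the tiny orthogonal design matrix are assumptions for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iters=200):
    """Minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta

# Hypothetical data: three mutually orthogonal covariates; only the first drives y.
X = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
y = [2, 2, -2, -2]  # y = 2 * (first covariate)
beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

The irrelevant coefficients are set exactly to zero, which is the selection behavior the abstract refers to, while the retained coefficient is shrunk below its true value of 2. A point estimate like this carries no uncertainty, which is one motivation the abstract gives for the Bayesian treatment.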

6

Wu, Jinlong. "Predictive Turbulence Modeling with Bayesian Inference and Physics-Informed Machine Learning." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/85129.

Abstract:

Reynolds-Averaged Navier-Stokes (RANS) simulations are widely used for engineering design and analysis involving turbulent flows. In RANS simulations, the Reynolds stress needs closure models and the existing models have large model-form uncertainties. Therefore, the RANS simulations are known to be unreliable in many flows of engineering relevance, including flows with three-dimensional structures, swirl, pressure gradients, or curvature. This lack of accuracy in complex flows has diminished the utility of RANS simulations as a predictive tool for engineering design, analysis, optimization, and reliability assessments. Recently, data-driven methods have emerged as a promising alternative to develop the model of Reynolds stress for RANS simulations. In this dissertation I explore two physics-informed, data-driven frameworks to improve RANS modeled Reynolds stresses. First, a Bayesian inference framework is proposed to quantify and reduce the model-form uncertainty of RANS modeled Reynolds stress by leveraging online sparse measurement data with empirical prior knowledge. Second, a machine-learning-assisted framework is proposed to utilize offline high-fidelity simulation databases. Numerical results show that the data-driven RANS models have better prediction of Reynolds stress and other quantities of interest for several canonical flows. Two metrics are also presented for an a priori assessment of the prediction confidence for the machine-learning-assisted RANS model. The proposed data-driven methods are also applicable to the computational study of other physical systems whose governing equations have some unresolved physics to be modeled.
Ph. D.

7

Koseler, Kaan Tamer. "Realization of Model-Driven Engineering for Big Data: A Baseball Analytics Use Case." Miami University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=miami1524832924255132.

8

Sušak, Hana 1985. "The Hunt of cancer genes : statistical inference of cancer risk and driver genes using next generation sequencing data." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/668447.

Abstract:

International cancer sequencing projects have generated comprehensive catalogs of alterations found in tumor genomes, as well as germline variant data for thousands of individuals. In this thesis, we describe two statistical methods that exploit these rich datasets to better understand tumor initiation, tumor progression, and the contribution of genetic variants to the lifetime risk of developing cancer. The first method, a Bayesian inference model named cDriver, uses multiple signatures of positive selection acting on tumor genomes to predict cancer driver genes. Cancer cell fraction is introduced as a novel signature of positive selection at the cellular level, based on the hypothesis that cells acquiring additional advantageous driver mutations will undergo rapid proliferation and clonal expansion. We benchmarked cDriver against state-of-the-art driver prediction methods on three cancer datasets, demonstrating performance equal to or better than the best competing tool. The second method, termed REWAS, is a comprehensive framework for rare-variant association studies (RVAS) aimed at improving the identification of cancer predisposition genes. Nonetheless, REWAS is readily applicable to any case-control study of complex diseases. Besides integrating well-established RVAS methods, we developed a novel Bayesian inference RVAS method (BATI) based on Integrated Nested Laplace Approximation (INLA). We demonstrate that BATI outperforms other methods on realistic simulated datasets, especially when meaningful biological context (e.g. functional impact of variants) is available or when risk variants in sum explain low phenotypic variance. Both methods developed during my thesis have the potential to facilitate personalized medicine and oncology through the identification of novel therapeutic targets and of genetic predisposition, supporting prevention and early diagnosis of cancer.

9

Sušak, Hana 1985. "The Hunt of cancer genes : statistical inference of cancer risk and driver genes using next generation sequencing data." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/664504.

Abstract:

International cancer sequencing projects have generated comprehensive catalogs of alterations found in tumor genomes, as well as germline variant data for thousands of individuals. In this thesis, we describe two statistical methods that exploit these rich datasets to better understand tumor initiation, tumor progression, and the contribution of genetic variants to the lifetime risk of developing cancer. The first method, a Bayesian inference model named cDriver, uses multiple signatures of positive selection acting on tumor genomes to predict cancer driver genes. Cancer cell fraction is introduced as a novel signature of positive selection at the cellular level, based on the hypothesis that cells acquiring additional advantageous driver mutations will undergo rapid proliferation and clonal expansion. We benchmarked cDriver against state-of-the-art driver prediction methods on three cancer datasets, demonstrating performance equal to or better than the best competing tool. The second method, termed REWAS, is a comprehensive framework for rare-variant association studies (RVAS) aimed at improving the identification of cancer predisposition genes. Nonetheless, REWAS is readily applicable to any case-control study of complex diseases. Besides integrating well-established RVAS methods, we developed a novel Bayesian inference RVAS method (BATI) based on Integrated Nested Laplace Approximation (INLA). We demonstrate that BATI outperforms other methods on realistic simulated datasets, especially when meaningful biological context (e.g. functional impact of variants) is available or when risk variants in sum explain low phenotypic variance. Both methods developed during my thesis have the potential to facilitate personalized medicine and oncology through the identification of novel therapeutic targets and of genetic predisposition, supporting prevention and early diagnosis of cancer.

10

Silva, Sanchez Rosa Elvira. "Contribution au pronostic de durée de vie des systèmes piles à combustible PEMFC." Thesis, Besançon, 2015. http://www.theses.fr/2015BESA2005/document.

Abstract:

This thesis aims to contribute solutions to the limited lifetime of Proton Exchange Membrane Fuel Cell Systems (PEM-FCS), drawing on two complementary disciplines. The first approach seeks to increase the lifetime of the PEM-FCS by designing and implementing a Prognostics & Health Management (PHM) architecture. By their nature, PEM-FCS are multi-physics systems (electrical, fluidic, electrochemical, thermal, mechanical, etc.) and multi-scale systems (in time and space), so their behavior is difficult to understand. The nonlinear nature of the phenomena, the reversible or irreversible character of the degradations, and the interactions between components make failure modeling difficult. Moreover, the current lack of homogeneity in the manufacturing process makes it difficult to characterize their behavior statistically. Deploying a PHM solution would make it possible to anticipate and avoid failures, assess the state of health, estimate the Remaining Useful Lifetime (RUL) of the system, and finally consider corrective actions (control and/or maintenance) to ensure continuity of operation. The second approach proposes a passive hybridization of the PEMFC with ultracapacitors (UCs) in order to operate the fuel cell as close as possible to its optimal operating conditions and thereby minimize the impact of aging. UCs are a complementary source to the PEMFC because of their high power density, fast charge/discharge capability, reversibility, and long life.
Taking fuel cell vehicles as an example, a PEMFC and UCs can be combined using a hybrid system of either active or passive type. The overall behavior of the system depends on both the choice of architecture and the positioning of these elements relative to the electrical load. Today, research in this area focuses mainly on energy management between the on-board sources and storage devices, and on the definition and optimization of a power electronic interface that conditions the flow of energy between them. However, static converters introduce additional sources of faults and failures (failure of the converter's own switches, and the impact of high-frequency current oscillations on fuel cell aging), and they also increase the energy losses of the complete system (even a highly efficient converter degrades the overall energy balance).


11

Salem, Marwan. "Building an Efficient Occupancy Grid Map Based on Lidar Data Fusion for Autonomous driving Applications." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263098.


Abstract:

The Localization and Map building module is a core building block for designing an autonomous vehicle. It describes the vehicle's ability to create an accurate model of its surroundings and maintain its position in the environment at the same time. In this thesis work, we contribute to the autonomous driving research area by providing a proof of concept for integrating SLAM solutions into commercial vehicles, improving the robustness of the Localization and Map building module. The proposed system applies Bayesian inference theory within the occupancy grid mapping framework and uses a Rao-Blackwellized Particle Filter (RBPF) to estimate the vehicle trajectory. The work was done at Scania CV, where a heavy-duty vehicle equipped with a multiple-Lidar sensory architecture was used. Low-level sensor fusion of the different Lidars was performed, and a parallelized implementation of the algorithm was achieved using a GPU. When tested on datasets frequently used in the community, the implemented algorithm outperformed the scan-matching technique and showed acceptable performance in comparison to another state-of-the-art RBPF implementation that adopts some improvements to the algorithm. The performance of the complete system was evaluated under a designed set of real scenarios. The proposed system showed a significant improvement in terms of the estimated trajectory and provided accurate occupancy representations of the vehicle surroundings. The fusion module was found to build more informative occupancy grids than the grids obtained from individual sensors.
The module that handles both localization and map building is one of the core components of a system for autonomous driving. It describes the vehicle's ability to create a model of its surroundings and to maintain a position relative to them. In this thesis we contribute to research on autonomous driving with a proof of concept that integrates SLAM solutions into commercial vehicles, which improves the robustness of the localization and map building module. The proposed system applies Bayesian statistics within a map-building framework, where the map consists of a grid used to describe the degree of occupancy. To estimate the vehicle's trajectory, the framework uses an RBPF (Rao-Blackwellized particle filter). The thesis work was carried out at Scania CV, where a heavy vehicle equipped with several lidar sensors was used. Low-level sensor fusion was applied to the different lidar sensors, and a parallelized implementation of the algorithm was realized on a GPU. When the algorithm was run on datasets commonly used by the community, the implemented algorithm gave considerably better results than the scan-matching technique and showed acceptable results in comparison with another high-performing RBPF implementation that adds some improvements to the algorithm. The performance of the whole system was evaluated with a number of purpose-designed realistic scenarios. The proposed system shows a clear improvement in the estimated trajectory and also provides an accurate representation of the surroundings. The sensor fusion yields a better and more informative representation than when relying only on the individual lidar sensors.
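
The Bayesian occupancy-grid update mentioned in this abstract is commonly written in log-odds form. The following is a minimal sketch of that textbook update for a single grid cell, not the thesis implementation: the function names and the inverse-sensor-model values `p_hit` and `p_miss` are assumptions chosen for illustration, and low-level fusion is approximated here by simply folding the readings of several lidars into the same cell.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def update_cell(log_odds, hit, prior=0.5, p_hit=0.7, p_miss=0.4):
    """Bayesian log-odds update of one grid cell from one lidar reading.

    p_hit and p_miss are hypothetical inverse-sensor-model values:
    the occupancy probability implied by a return / no return in the cell.
    """
    inverse_model = p_hit if hit else p_miss
    return log_odds + logit(inverse_model) - logit(prior)


def fuse_readings(readings, prior=0.5):
    """Fold readings (True = return in the cell) into one occupancy probability."""
    log_odds = logit(prior)
    for hit in readings:
        log_odds = update_cell(log_odds, hit, prior)
    # Convert the accumulated log-odds back to a probability.
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


# Two lidars both report a return in the same cell.
occupancy = fuse_readings([True, True])
```

Working in log-odds turns each measurement into an addition, which is one reason the per-cell update parallelizes well; agreeing readings push the cell toward occupied, while conflicting readings partly cancel.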


12

"Data Driven Inference in Populations of Agents." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.53476.


Abstract:

In the artificial intelligence literature, three forms of reasoning are commonly employed to understand agent behavior: inductive, deductive, and abductive. More recently, data-driven approaches leveraging ideas such as machine learning, data mining, and social network analysis have gained popularity. While data-driven variants of these forms of reasoning have been applied separately, there is little work on how data-driven approaches across all three forms relate and lend themselves to practical applications. Given an agent's behavior and the percept sequence, how can one identify a specific outcome such as the likeliest explanation? To address real-world problems, it is vital to understand the different types of reasoning that can lead to better data-driven inference. This dissertation lays the groundwork for studying these relationships and applies them to three real-world problems. In criminal modeling, inductive and deductive reasoning are applied to the early prediction of violent criminal gang members. To address this problem, features derived from the co-arrestee social network as well as geographical and temporal features are leveraged. Then, a data-driven variant of geospatial abductive inference is studied in the missing person problem to locate the missing person. Finally, inductive and abductive reasoning are studied for identifying pathogenic accounts of a cascade in social networks.
Dissertation/Thesis
Doctoral Dissertation Computer Science 2019


13

(11210091), Prateek Jaiswal. "Variational Inference for Data-driven Stochastic Programming." Thesis, 2021.


Abstract:

Stochastic programs are standard models for decision-making under uncertainty and have been extensively studied in the operations research literature. In general, stochastic programming involves minimizing an expected cost function, where the expectation is with respect to fully specified stochastic models that quantify the aleatoric or 'inherent' uncertainty in the decision-making problem. In practice, however, the stochastic models are unknown but can be estimated from data, introducing an additional epistemic uncertainty into the decision-making problem. The Bayesian framework provides a coherent way to quantify the epistemic uncertainty through the posterior distribution by combining prior beliefs of the decision-makers with the observed data. Bayesian methods have been used for data-driven decision-making in various applications such as inventory management, portfolio design, machine learning, optimal scheduling, and staffing.

Bayesian methods are challenging to implement, mainly because the posterior is computationally intractable, necessitating the computation of approximate posteriors. Broadly speaking, there are two classes of methods in the literature for approximate posterior inference. First are sampling-based methods such as Markov Chain Monte Carlo. Sampling-based methods are theoretically well understood, but they suffer from various issues such as high variance, poor scalability to high-dimensional problems, and complex diagnostics. Consequently, we propose to use optimization-based methods, collectively known as variational inference (VI), that use information projections to compute an approximation to the posterior. Empirical studies have shown that VI methods are computationally faster and easily scalable to higher-dimensional problems and large datasets. However, the theoretical guarantees of these methods are not well understood. Moreover, VI methods are empirically and theoretically less explored in the decision-theoretic setting.

In this thesis, we first propose a novel VI framework for risk-sensitive data-driven decision-making, which we call risk-sensitive variational Bayes (RSVB). In RSVB, we jointly compute a risk-sensitive approximation to the 'true' posterior and the optimal decision by solving a minimax optimization problem. The RSVB framework includes the naive approach of first computing a VI approximation to the true posterior and then using it in place of the true posterior for decision-making. We show that the RSVB approximate posterior and the corresponding optimal value and decision rules are asymptotically consistent, and we also compute their rate of convergence. We illustrate our theoretical findings in both parametric and nonparametric settings with the help of three examples: the single-product and multi-product newsvendor models and Gaussian process classification. Second, we present the Bayesian joint chance-constrained stochastic program (BJCCP) for modeling decision-making problems with epistemically uncertain constraints. We discover that using VI methods for posterior approximation can ensure the convexity of the feasible set in (BJCCP), unlike any sampling-based method, and thus propose a VI approximation for (BJCCP). We also show that the optimal value computed using the VI approximation of (BJCCP) is statistically consistent. Moreover, we derive the rate of convergence of the optimal value and compute the rate at which a VI approximate solution of (BJCCP) is feasible under the true constraints. We demonstrate the utility of our approach on an optimal staffing problem for an M/M/c queue. Finally, this thesis also contributes to the growing literature on understanding the statistical performance of VI methods. In particular, we establish the frequentist consistency of an approximate posterior computed using a well-known VI method that computes an approximation to the posterior distribution by minimizing the Rényi divergence from the 'true' posterior.
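
To make the data-driven newsvendor setting concrete, here is a minimal sketch of the plug-in Bayesian decision for a single product. It is not RSVB and involves no variational approximation: it assumes exponentially distributed demand with a conjugate Gamma prior on the rate, so the posterior is exact and the posterior predictive quantile has a closed form. The function names, the prior parameters, and the cost values are all assumptions for illustration.

```python
def posterior(demands, prior_shape=1.0, prior_rate=1.0):
    """Gamma posterior (shape, rate) on the rate of exponential demand."""
    return prior_shape + len(demands), prior_rate + sum(demands)


def predictive_cdf(d, shape, rate):
    """Posterior predictive P(D <= d); a Lomax distribution for this model."""
    return 1.0 - (rate / (rate + d)) ** shape


def bayes_order_quantity(demands, underage=3.0, overage=1.0):
    """Order quantity minimizing the posterior expected newsvendor cost.

    The minimizer is the critical-fractile quantile of the posterior
    predictive demand distribution, inverted here in closed form.
    """
    shape, rate = posterior(demands)
    fractile = underage / (underage + overage)
    return rate * ((1.0 - fractile) ** (-1.0 / shape) - 1.0)


# Hypothetical demand observations for illustration only.
observed = [2.0, 4.0]
q = bayes_order_quantity(observed)
```

In models without a conjugate prior the posterior predictive is not available in closed form, which is where approximate posteriors such as those discussed in the abstract come in.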


14

Hsiao, Yu-Ting, and 蕭羽廷. "Combining Knowledge-Driven and Data-Driven Modeling Approaches in Gene Regulatory Networks Inference." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/34103430677245489852.


Abstract:

Doctoral degree
National Sun Yat-sen University (國立中山大學)
Department of Information Management (資訊管理學系研究所)
102
With the emergence of post-genomic research, one of the most important themes is to uncover the complex biological mechanisms involved in genetic regulation. The regulatory interactions controlled by cis-regulatory DNA modules provide clues about the development of biological processes. These regulatory links can be represented as network-like architectures, i.e. gene regulatory networks (GRNs), which indicate the causal gene expression relationships between instructional inputs and functional outputs of genes. Modeling GRNs, therefore, is essential for conceptualizing how genes express themselves as well as influence others. Thanks to modern measurement techniques for gene expression, researchers can investigate the phenotypic behavior of a living being by reconstructing GRNs from expression data. Typically a reverse engineering approach is employed; it is an effective strategy for reproducing possible fitting models of GRNs. Under this strategy, however, two daunting tasks must be undertaken. One is to optimize the accuracy of inferred network behaviors; the other is to designate valid biological topologies for target networks. Though existing studies have addressed the two tasks for years, few are able to satisfy both requirements simultaneously. To cope with these difficulties, this thesis proposes an integrative modeling framework which consists of three aspects. First, a novel reverse engineering algorithm is developed to tackle the issue of efficiently optimizing network behaviors for GRNs. Second, a sensitivity analysis approach coupled with the optimization algorithm is designed to identify critical regulatory interactions in situations where biological knowledge is unavailable. Finally, an integrated modeling approach combining knowledge-based and data-driven input sources is constructed to produce biological topologies with corresponding network behaviors. For each aspect, a series of experiments is performed.
The results reveal that the proposed framework can successfully infer solutions that are satisfactory for both requirements of network behaviors and biological structures, and thus the outcomes are exploitable for future in vivo experimental design.


15

Langovoy, Mikhail Anatolievich. "Data-driven goodness-of-fit tests." Doctoral thesis, 2007. http://hdl.handle.net/11858/00-1735-0000-0006-B393-4.



16

Mariasin, Margalit. "Novice, Generalist, and Expert Reasoning During Clinical Case Explanation: A Propositional Assessment of Knowledge Utilization and Application." Thesis, 2010. http://hdl.handle.net/10012/5522.


Abstract:

Objectives: The aim of the two exploratory studies presented here was to investigate expert-novice cognitive performance in the field of dietetic counseling. More specifically, the purpose was to characterize the knowledge used and the cognitive reasoning strategies of expert, intermediate and novice dietitians during their assessment of clinical vignettes of simulated dyslipidemia cases. Background: Since no studies have been conducted on expert-novice differences in knowledge utilization and reasoning in the field of dietetics, literature from various domains looking at expert-novice decision-making was used to guide the studies presented here. Previous expert-novice research in aspects of health such as counseling and diagnostic reasoning among physicians and nurses has found differences in the way experts extract and apply knowledge during reasoning. In addition, various studies illustrate an intermediate effect, where generalist performance is somewhat poorer than that of experts and novices. Methods: The verbal protocols of expert (n=4), generalist (n=4), and novice (n=4) dietitians were analyzed using propositional analysis. Semantic networks were generated and used to compare reasoning processes to a reference model developed from an existing dyslipidemia care map by Brauer et al. (2007, 2009). Detailed analysis was conducted on individual networks in an effort to obtain a better understanding of cue utilization, concept usage, and overall cohesiveness during reasoning. Results: The results of the first study indicate no statistical differences in reasoning between novices, generalists and experts with regard to recalls and inferences. Interesting findings in the study also suggest that discussions of the terms “dietary fat” and “cholesterol” by individuals at each level of expertise had qualitative differences. This may reflect the information provided in the case scenarios to each participating dietitian.
Furthermore, contrary to previous studies in expert-novice reasoning, an intermediate effect was not evident. The results of the second study show a statistical difference in data-driven (forward) reasoning between experts and novices. There was no statistical difference in hypothesis-driven (backward) reasoning between groups. The reasoning networks of experts appear to reveal more concise explanations of important aspects related to dyslipidemia counseling. Reasoning patterns of the expert dietitians appear more coherent, although there was no statistical difference in the length or number of reasoning chains between groups. With previous research focusing on diagnostic reasoning rather than counseling, this finding may be a result of the nature of the underlying task. Conclusion: The studies presented here serve as a basis for future expert-novice research in the field of dietetics. The exploration of individual verbal protocols to identify characteristics of dietitians at various levels of expertise can provide insight into the way knowledge is used and applied during diet counseling. Subsequent research can focus on randomized sample selection, with case scenarios as a constant, in order to obtain results that can be generalized to the greater dietitian population.


17

Lu, Yun-Pei, and 呂芸霈. "A Bayesian Framework to Integrate Data-Driven and Knowledge-Based Inference Systems for Reliable Yield Analyses in Semiconductor Manufacturing." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/61501430745390140085.


Abstract:

Master's degree
Yuan Ze University (元智大學)
Department of Industrial Engineering and Management (工業工程與管理學系)
96
This thesis studies reliable yield diagnosis from hundreds of suspected yield-loss factors in semiconductor manufacturing. Two problems of the data-driven inference approach commonly used for yield diagnosis are identified: False Identification (FI) Due To Confounding Variables and Miss-Identification (MI) Due To One-Factor-At-A-Time Analysis. To cope with the FI and MI problems, a framework of Bayesian model selection is proposed to integrate and reuse both the data-driven and knowledge-based inference systems in industry practice for more reliable yield diagnosis. The Bayesian framework consists of three modules: Pre-Processing, Bayesian Analysis, and Post-Processing. The Pre-Processing module applies successive factor filtering and matching techniques to integrate the inference results from both data-driven and knowledge-based systems for the generation of candidate factors and corresponding beliefs. Two algorithms are then adopted for the Bayesian Analysis in the second module: One-Factor-At-A-Time Bayesian Analysis to solve the FI problem and Multi-Factor-At-A-Time Bayesian Analysis to further solve the MI problem. For reliable factor rankings, a novel Bubble Diagram with Pareto Frontier is proposed in the Post-Processing module, where the size of each bubble (factor) represents the magnitude of its posterior probability, while the bubbles on the Pareto Frontier represent the factors not dominated by other factors with respect to both the p-value and the fault possibility, generated by the data-driven and knowledge-based systems respectively. Two simulation experiments are conducted to evaluate the capability of the proposed Bayesian framework. The first simulation experiment studies the capability of the One-Factor-At-A-Time Bayesian Algorithm to solve the FI problem with respect to different numbers of dummy factors and qualities of prior knowledge.
The second simulation studies the capability of the Multi-Factor-At-A-Time Bayesian Algorithm to solve the MI problem with respect to different numbers of faulty factors, various effects among faulty factors, and different qualities of prior knowledge. Both simulation experiments are evaluated by the metrics derived from the Bubble Diagram with Pareto Frontier. Simulation results show that, without the aid of Bayesian inference, the interpretation of the Pareto Frontier alone successfully captures the faulty factors in most cases. With the additional information of bubble size, i.e. the posterior probability derived from Bayesian inference, the proposed Bayesian framework performs much better than the data-driven approach commonly used for yield diagnosis.
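
The Pareto Frontier used in the Post-Processing module can be illustrated with a short non-dominated filter. This is a sketch under stated assumptions rather than the thesis algorithm: it assumes a smaller p-value and a larger fault possibility both make a factor more suspicious, and the function name, factor names, and scores are hypothetical.

```python
def pareto_frontier(factors):
    """Return the names of factors not dominated by any other factor.

    factors maps a name to (p_value, fault_possibility). Assumption:
    lower p-value and higher fault possibility are both "more suspicious".
    """
    def dominates(a, b):
        # a is at least as suspicious as b on both scores and differs from b.
        return a[0] <= b[0] and a[1] >= b[1] and a != b

    return sorted(
        name
        for name, score in factors.items()
        if not any(dominates(other, score) for other in factors.values())
    )


# Hypothetical candidate factors: (p-value, fault possibility).
candidates = {
    "etch": (0.01, 0.90),
    "litho": (0.05, 0.95),
    "cmp": (0.04, 0.50),
    "implant": (0.20, 0.30),
}
frontier = pareto_frontier(candidates)  # ["etch", "litho"]
```

In the Bubble Diagram described above, the posterior probability of each factor would then be drawn as the bubble size on top of this frontier.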

