Dissertations / Theses on the topic 'Data Driven Inference'
Consult the top 17 dissertations / theses for your research on the topic 'Data Driven Inference.'
Park, June Young. "Data-driven Building Metadata Inference." Research Showcase @ CMU, 2016. http://repository.cmu.edu/theses/127.
Spoon, Steven Alexander. "Demand-Driven Type Inference with Subgoal Pruning." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/7486.
Michelen, Strofer Carlos Alejandro. "Machine Learning and Field Inversion approaches to Data-Driven Turbulence Modeling." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103155.
Doctor of Philosophy
The Reynolds-averaged Navier-Stokes (RANS) equations are widely used to simulate fluid flows in engineering applications despite their known inaccuracy in many flows of practical interest. The uncertainty in the RANS equations is known to stem from the Reynolds stress tensor for which no universally applicable turbulence model exists. The computational cost of more accurate methods for fluid flow simulation, however, means RANS simulations will likely continue to be a major tool in engineering applications and there is still a need for improved RANS turbulence modeling. This dissertation explores two different approaches to use available experimental data to improve RANS predictions by improving the uncertain Reynolds stress tensor field. The first approach is using machine learning to learn a data-driven turbulence model from a set of training data. This model can then be applied to predict new flows in place of traditional turbulence models. To this end, this dissertation presents a novel framework for training deep neural networks using experimental measurements of velocity and pressure. When using velocity and pressure data, gradient-based training of the neural network requires the sensitivity of the RANS equations to the learned Reynolds stress. Two different methods, the continuous adjoint and ensemble approximation, are used to obtain the required sensitivity. The second approach explored in this dissertation is field inversion, whereby available data for a flow of interest is used to infer a Reynolds stress field that leads to improved RANS solutions for that same flow. Here, the field inversion is done via the ensemble Kalman inversion (EKI), a Monte Carlo Bayesian procedure, and the focus is on improving the inference by enforcing known physical constraints on the inferred Reynolds stress field. To this end, a method for enforcing boundary conditions on the inferred field is presented. 
While further development is needed, the two data-driven approaches explored and improved upon here demonstrate the potential for improved practical RANS predictions.
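The ensemble Kalman inversion step mentioned in this abstract can be illustrated with a toy scalar problem. The sketch below is not the dissertation's implementation: it assumes a single unknown parameter, a hypothetical linear forward model `g(theta) = 2 * theta` standing in for a RANS solve, and a deterministic EKI variant without perturbed observations. All names and values are illustrative.

```python
import random


def ensemble_kalman_inversion(forward, y, noise_var, ensemble, n_iter=10):
    """Deterministic scalar EKI: nudge each ensemble member toward the data y."""
    thetas = list(ensemble)
    n = len(thetas)
    for _ in range(n_iter):
        preds = [forward(t) for t in thetas]
        mean_t = sum(thetas) / n
        mean_g = sum(preds) / n
        # Sample cross-covariance between parameter and prediction,
        # and sample variance of the prediction.
        c_tg = sum((t - mean_t) * (g - mean_g)
                   for t, g in zip(thetas, preds)) / (n - 1)
        c_gg = sum((g - mean_g) ** 2 for g in preds) / (n - 1)
        # Kalman-type gain built only from ensemble statistics
        # (no derivative of the forward model is needed).
        gain = c_tg / (c_gg + noise_var)
        thetas = [t + gain * (y - g) for t, g in zip(thetas, preds)]
    return thetas


# Hypothetical setup: prior ensemble drawn from N(0, 1), observation y = 4.
rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(50)]
posterior = ensemble_kalman_inversion(
    lambda t: 2.0 * t, y=4.0, noise_var=0.01, ensemble=prior
)
estimate = sum(posterior) / len(posterior)
print(round(estimate, 2))
```

Because the gain is computed from ensemble covariances, the forward model is treated as a black box, which is one reason ensemble methods are attractive when adjoint sensitivities are unavailable.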
Marcou, Quentin. "Probabilistic approaches to the adaptive immune repertoire : a data-driven approach." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB029/document.
An individual’s adaptive immune system must face repeated challenges from a constantly evolving environment with a virtually infinite number of threats. To achieve this task, the adaptive immune system relies on a large diversity of B-cells and T-cells, each carrying a unique receptor specific to a small number of pathogens. These receptors are initially built at random through the process of V(D)J recombination. This initial diversity is then narrowed down by a step of functional selection based on the receptors' folding properties and their ability to recognize self antigens. Upon recognition of a pathogen, the B-cell divides and its offspring undergo several rounds of somatic hypermutation and selection in an evolutionary process called affinity maturation. This work presents principled probabilistic approaches to infer the probability distribution underlying the recombination and somatic hypermutation processes from high-throughput sequencing data using IGoR, a flexible software tool developed over the course of this PhD. IGoR has been developed as a versatile research tool and can encode a variety of models of different biological complexity, allowing researchers in the field to characterize immune receptor repertoires ever more precisely. To motivate this data-driven approach, we demonstrate that IGoR outperforms existing tools in accuracy and estimate the sample sizes needed for reliable repertoire characterization. Finally, using the model predictions obtained, we show potential applications of these methods by demonstrating that homozygous twins share T-cells through cord blood and that the public core of the T cell repertoire is formed in the prenatal period, and by estimating naive T cell clone lifetimes in humans.
Das, Debasish. "Bayesian Sparse Regression with Application to Data-driven Understanding of Climate." Diss., Temple University Libraries, 2015. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/313587.
Ph.D.
Sparse regressions based on constraining the L1-norm of the coefficients have become popular because they can handle high-dimensional data, unlike regular regressions, which suffer from overfitting and model identifiability issues, especially when the sample size is small. They are often the method of choice in many fields of science and engineering for simultaneously selecting covariates and fitting parsimonious linear models that generalize better and are easily interpretable. However, significant challenges may be posed by the need to accommodate extremes and other domain constraints, such as dynamical relations among variables, spatial and temporal constraints, and the need to provide uncertainty estimates and feature correlations, among others. We adopted a hierarchical Bayesian version of the sparse regression framework and exploited its inherent flexibility to accommodate the constraints. We applied sparse regression to the feature selection problem of statistical downscaling of climate variables, with particular focus on their extremes. This is important for many impact studies where climate change information is required at a spatial scale much finer than that provided by global or regional climate models. Characterizing the dependence of extremes on covariates can help identify plausible causal drivers and inform the downscaling of extremes. We propose a general-purpose sparse Bayesian framework for covariate discovery that accommodates the non-Gaussian distribution of extremes within a hierarchical Bayesian sparse regression model. We obtain posteriors over regression coefficients, which indicate dependence of extremes on the corresponding covariates and provide uncertainty estimates, using a variational Bayes approximation.
The method is applied to select informative atmospheric covariates at multiple spatial scales, as well as indices of large-scale circulation and global warming, related to the frequency of precipitation extremes over the continental United States. Our results confirm the dependence relations that may be expected from known precipitation physics and generate novel insights that can inform physical understanding. We plan to extend our model to discover covariates for extreme intensity in the future. We further extend our framework to handle the dynamic relationships among climate variables using a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP). The extended model can achieve simultaneous clustering and discovery of covariates within each cluster. Moreover, a priori knowledge about associations between pairs of data points is incorporated in the model through must-link constraints on a Markov Random Field (MRF) prior. A scalable and efficient variational Bayes approach is developed to infer posteriors on regression coefficients and cluster variables.
Temple University--Theses
Wu, Jinlong. "Predictive Turbulence Modeling with Bayesian Inference and Physics-Informed Machine Learning." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/85129.
Ph. D.
Reynolds-Averaged Navier–Stokes (RANS) simulations are widely used for engineering design and analysis involving turbulent flows. In RANS simulations, the Reynolds stress needs closure models and the existing models have large model-form uncertainties. Therefore, the RANS simulations are known to be unreliable in many flows of engineering relevance, including flows with three-dimensional structures, swirl, pressure gradients, or curvature. This lack of accuracy in complex flows has diminished the utility of RANS simulations as a predictive tool for engineering design, analysis, optimization, and reliability assessments. Recently, data-driven methods have emerged as a promising alternative to develop the model of Reynolds stress for RANS simulations. In this dissertation I explore two physics-informed, data-driven frameworks to improve RANS modeled Reynolds stresses. First, a Bayesian inference framework is proposed to quantify and reduce the model-form uncertainty of RANS modeled Reynolds stress by leveraging online sparse measurement data with empirical prior knowledge. Second, a machine-learning-assisted framework is proposed to utilize offline high fidelity simulation databases. Numerical results show that the data-driven RANS models have better prediction of Reynolds stress and other quantities of interest for several canonical flows. Two metrics are also presented for an a priori assessment of the prediction confidence for the machine-learning-assisted RANS model. The proposed data-driven methods are also applicable to the computational study of other physical systems whose governing equations have some unresolved physics to be modeled.
Koseler, Kaan Tamer. "Realization of Model-Driven Engineering for Big Data: A Baseball Analytics Use Case." Miami University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=miami1524832924255132.
Sušak, Hana 1985. "The Hunt of cancer genes : statistical inference of cancer risk and driver genes using next generation sequencing data." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/668447.
The various international cancer sequencing projects carried out in recent years have generated complete catalogs of alterations found in tumor genomes, as well as germline variant information for thousands of individuals. In this thesis we describe two statistical methods that exploit these databases in order to better understand tumor initiation and progression, and the contribution of genetic variants to the lifetime risk of developing cancer. The first method, named cDriver, is based on a Bayesian inference model that uses multiple signals of the positive selection occurring in tumor genomes to predict cancer driver genes. In this method, we have included the cancer cell fraction as a new signal of positive selection at the cellular level. This is based on the hypothesis that cells acquiring advantageous mutations will proliferate and expand clonally more rapidly. To evaluate cDriver, it was compared with the most widely used current methods for driver gene prediction. The analysis was carried out on datasets from three different cancers, and the results were equal to or better than those obtained by the most competitive tools in the field. The second method, named REWAS, is a framework for rare-variant association studies (RVAS) that aims to improve the identification of cancer predisposition genes. Nevertheless, REWAS can be applied to any case-control study of complex diseases. In addition to integrating well-established RVAS methods, we developed a new Bayesian inference RVAS method based on Integrated Nested Laplace Approximation (BATI). We also demonstrate that BATI shows better results than other methods on simulated data with real background noise, especially when biological context (e.g. variants with functional impact) is available or when the risk variants together explain little phenotypic variance.
Sušak, Hana 1985. "The Hunt of cancer genes : statistical inference of cancer risk and driver genes using next generation sequencing data." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/664504.
International cancer sequencing projects have generated comprehensive catalogs of alterations found in tumor genomes, as well as germline variant data for thousands of individuals. In this thesis, we describe two statistical methods exploiting these rich datasets in order to better understand tumor initiation, tumor progression and the contribution of genetic variants to the lifetime risk of developing cancer. The first method, a Bayesian inference model named cDriver, utilizes multiple signatures of positive selection acting on tumor genomes to predict cancer driver genes. Cancer cell fraction is introduced as a novel signature of positive selection on a cellular level, based on the hypothesis that cells obtaining additional advantageous driver mutations will undergo rapid proliferation and clonal expansion. We benchmarked cDriver against state-of-the-art driver prediction methods on three cancer datasets, demonstrating equal or better performance than the best competing tool. The second method, termed REWAS, is a comprehensive framework for rare-variant association studies (RVAS) aiming at improving identification of cancer predisposition genes. Nonetheless, REWAS is readily applicable to any case-control study of complex diseases. Besides integrating well-established RVAS methods, we developed a novel Bayesian inference RVAS method (BATI) based on Integrated Nested Laplace Approximation (INLA). We demonstrate that BATI outperforms other methods on realistic simulated datasets, especially when meaningful biological context (e.g. functional impact of variants) is available or when risk variants in sum explain low phenotypic variance. Both methods developed during my thesis have the potential to facilitate personalized medicine and oncology through identification of novel therapeutic targets and identification of genetic predisposition facilitating prevention and early diagnosis of cancer.
Silva, Sanchez Rosa Elvira. "Contribution au pronostic de durée de vie des systèmes piles à combustible PEMFC." Thesis, Besançon, 2015. http://www.theses.fr/2015BESA2005/document.
This thesis work aims to provide solutions for the limited lifetime of Proton Exchange Membrane Fuel Cell Systems (PEM-FCS) based on two complementary disciplines. A first approach consists in increasing the lifetime of the PEM-FCS by designing and implementing a Prognostics & Health Management (PHM) architecture. PEM-FCS are essentially multi-physical (electrical, fluid, electrochemical, thermal, mechanical, etc.) and multi-scale (time and space) systems, so their behavior is hard to understand. The nonlinear nature of the phenomena, the reversibility or irreversibility of degradations and the interactions between components make failure modeling quite difficult. Moreover, the current lack of homogeneity in the manufacturing process makes statistical characterization of their behavior difficult. The deployment of a PHM solution would make it possible to anticipate and avoid failures, assess the state of health, estimate the Remaining Useful Lifetime (RUL) of the system and finally consider mitigating actions (control and/or maintenance) to ensure continuity of operation. A second approach proposes to use a passive hybridization of the PEMFC with Ultra Capacitors (UC) to operate the fuel cell closer to its optimum operating conditions and thereby minimize the impact of aging. The UC appear as an additional source alongside the PEMFC due to their high power density, their capacity to charge and discharge rapidly, their reversibility and their long life. If we take the example of fuel cell hybrid electric vehicles, the association between a PEMFC and UC can be achieved using a hybrid system of active or passive type. The overall behavior of the system depends on both the choice of architecture and the positioning of these elements with respect to the electrical load.
Today, research in this area focuses mainly on energy management between the sources and embedded storage, and on the definition and optimization of a power electronic interface designed to adjust the flow of energy between them. However, the presence of power converters adds sources of faults and failures (failure of the switches of the power converter and the impact of high-frequency current oscillations on the aging of the PEMFC), and also increases the energy losses of the entire system (even if the efficiency of the power converter is high, it nevertheless degrades the overall system).
Salem, Marwan. "Building an Efficient Occupancy Grid Map Based on Lidar Data Fusion for Autonomous driving Applications." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263098.
The module responsible for both localization and map building is one of the core components of an autonomous driving system. It describes the vehicle's ability to create a model of its surroundings and to maintain a position relative to those surroundings. In this thesis we contribute to autonomous driving research with a validation concept by integrating SLAM solutions into commercial vehicles, which improves the robustness of the localization and mapping module. The proposed system uses Bayesian statistics applied in a framework responsible for creating a map, which consists of a grid used to describe the degree of occupancy. To estimate the path the vehicle will travel, the framework uses a Rao-Blackwellized particle filter (RBPF). The thesis was carried out at Scania CV, where a heavy vehicle equipped with several lidar sensors was used. Low-level sensor fusion was applied to the different lidar sensors, and a parallelized implementation of the algorithm was implemented on a GPU. When the algorithm was run on commonly used public datasets, we found that the implemented algorithm gives considerably better results than the scan-matching technique and shows acceptable results in comparison with another high-performing RBPF implementation, which adds some improvements to the algorithm. The performance of the whole system is evaluated with a number of self-designed realistic scenarios. The proposed system shows a clear improvement in the estimate of the driving path and also provides an accurate representation of the environment. The sensor fusion gives a better and more informative representation than relying on the individual lidar sensors alone.
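The Bayesian occupancy grid described in this abstract is commonly maintained with a per-cell log-odds update. The sketch below shows that standard textbook update on a one-dimensional grid; it is not taken from the thesis, and the inverse sensor model values `P_HIT` and `P_FREE`, the grid size, and the beam geometry are all hypothetical.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def update_cell(log_odds, p_occ_given_scan, prior=0.5):
    """Bayesian log-odds update of one cell from one measurement."""
    return log_odds + logit(p_occ_given_scan) - logit(prior)


def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


# Hypothetical inverse sensor model: a beam passes through cells 0..2
# (likely free) and ends in cell 3 (likely occupied). Cell 4 is unobserved.
P_HIT, P_FREE = 0.7, 0.3

grid = [0.0] * 5  # log-odds 0.0 corresponds to probability 0.5 (unknown)
for _ in range(3):  # three identical scans
    for i in range(3):
        grid[i] = update_cell(grid[i], P_FREE)
    grid[3] = update_cell(grid[3], P_HIT)

probs = [probability(value) for value in grid]
print([round(p, 3) for p in probs])
```

Working in log-odds turns the Bayes update into an addition per cell, which is one reason this form parallelizes well across grid cells.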
"Data Driven Inference in Populations of Agents." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.53476.
Dissertation/Thesis
Doctoral Dissertation Computer Science 2019
Jaiswal, Prateek. "Variational Inference for Data-driven Stochastic Programming." Thesis, 2021.
Hsiao, Yu-Ting, and 蕭羽廷. "Combining Knowledge-Driven and Data-Driven Modeling Approaches in Gene Regulatory Networks Inference." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/34103430677245489852.
National Sun Yat-sen University (國立中山大學)
Graduate Institute of Information Management (資訊管理學系研究所)
102
With the emergence of post-genomic research, one of the most important themes is to uncover the complex biological mechanisms involved in genetic regulation. The regulatory interactions controlled by cis-regulatory DNA modules provide clues about the development of biological processes. These regulatory links can be represented as network-like architectures, i.e. gene regulatory networks (GRNs), which indicate the causal gene expression relationships between instructional inputs and functional outputs of genes. Modeling GRNs, therefore, is essential for conceptualizing how genes express themselves as well as influence others. Thanks to modern measurement techniques for gene expression, researchers can investigate the phenotypic behavior of a living being by reconstructing GRNs from expression data. Typically a reverse engineering approach is employed; it is an effective strategy for reproducing possible fitting models of GRNs. Under this strategy, however, two daunting tasks must be undertaken. One is to optimize the accuracy of inferred network behaviors; the other is to designate valid biological topologies for target networks. Though existing studies have addressed the two tasks for years, few are able to satisfy both requirements simultaneously. To cope with these difficulties, this thesis proposes an integrative modeling framework which consists of three aspects. First, a novel reverse engineering algorithm is developed to tackle the issue of efficiently optimizing network behaviors for GRNs. Second, a sensitivity analysis approach coupled with the optimization algorithm is designed to identify critical regulatory interactions when biological knowledge is unavailable. Finally, an integrated modeling approach combining knowledge-based and data-driven input sources is constructed to conduct biological topologies with corresponding network behaviors. For each aspect, a series of experiments is performed.
The results reveal that the proposed framework can successfully infer solutions that are satisfactory for both requirements of network behaviors and biological structures, and thus the outcomes are exploitable for future in vivo experimental design.
Langovoy, Mikhail Anatolievich. "Data-driven goodness-of-fit tests." Doctoral thesis, 2007. http://hdl.handle.net/11858/00-1735-0000-0006-B393-4.
Mariasin, Margalit. "Novice, Generalist, and Expert Reasoning During Clinical Case Explanation: A Propositional Assessment of Knowledge Utilization and Application." Thesis, 2010. http://hdl.handle.net/10012/5522.
Lu, Yun-Pei, and 呂芸霈. "A Bayesian Framework to Integrate Data-Driven and Knowledge-Based Inference Systems for Reliable Yield Analyses in Semiconductor Manufacturing." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/61501430745390140085.
Yuan Ze University (元智大學)
Department of Industrial Engineering and Management (工業工程與管理學系)
96
This thesis studies reliable yield diagnosis from hundreds of suspected yield-loss factors in semiconductor manufacturing. Two problems of the data-driven inference approach commonly used for yield diagnosis are identified: False Identification (FI) Due To Confounding Variables and Miss-Identification (MI) Due To One-Factor-At-A-Time Analysis. To cope with the FI and MI problems, a framework of Bayesian model selection is proposed to integrate and reuse both the data-driven and knowledge-based inference systems in industry practice for more reliable yield diagnosis. The Bayesian framework consists of three modules: Pre-Processing, Bayesian Analysis, and Post-Processing. The Pre-Processing module applies successive factor filtering and matching techniques to integrate the inference results from both data-driven and knowledge-based systems for the generation of candidate factors and corresponding beliefs. Two algorithms are then adopted for the Bayesian Analysis in the second module: One-Factor-At-A-Time Bayesian Analysis to solve the FI problem and Multi-Factor-At-A-Time Bayesian Analysis to further solve the MI problem. For reliable factor rankings, a novel Bubble Diagram with Pareto Frontier is proposed in the Post-Processing module, where the size of each bubble (factor) represents the magnitude of its posterior probability, while the bubbles on the Pareto Frontier represent the factors not dominated by other factors with respect to both the p-value and the fault possibility generated by the data-driven and knowledge-based systems, respectively. Two simulation experiments are conducted to evaluate the capability of the proposed Bayesian Framework. The first simulation experiment studies the capability of the One-Factor-At-A-Time Bayesian Algorithm in solving the FI problem with respect to different numbers of dummy factors and qualities of prior knowledge.
The second simulation is to study the capability of Multi-Factor-At-A-Time Bayesian Algorithm on solving the MI problem with respect to different numbers of faulty factors, various effects among faulty factors, and different qualities of prior knowledge. Both simulation experiments are evaluated by the metrics derived from the Bubble Diagram with Pareto Frontier. Simulation results show that, without the aid of Bayesian inference, the interpretation of Pareto Frontier alone successfully captures the faulty factors in most cases. With the additional information of bubble size, i.e. the posterior probability derived from Bayesian inference, the proposed Bayesian framework performs much better than the data-driven approach commonly used for yield diagnosis.
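The Pareto Frontier used for factor ranking in this abstract can be sketched as a non-dominated filter over two scores per factor. The example below is illustrative only: it assumes that a smaller p-value and a larger fault possibility both indicate a more suspicious factor, and the factor names and numbers are invented for the demonstration rather than taken from the thesis.

```python
def dominates(a, b):
    """True if factor a is at least as suspicious as b on both scores
    and strictly more suspicious on at least one (assumed ordering)."""
    no_worse = (a["p_value"] <= b["p_value"]
                and a["fault_possibility"] >= b["fault_possibility"])
    strictly_better = (a["p_value"] < b["p_value"]
                       or a["fault_possibility"] > b["fault_possibility"])
    return no_worse and strictly_better


def pareto_frontier(factors):
    """Return the names of factors not dominated by any other factor."""
    return [f["name"] for f in factors
            if not any(dominates(g, f) for g in factors if g is not f)]


# Hypothetical candidate factors (names and scores are made up).
factors = [
    {"name": "etch_chamber_A", "p_value": 0.001, "fault_possibility": 0.60},
    {"name": "litho_tool_B", "p_value": 0.020, "fault_possibility": 0.90},
    {"name": "cmp_tool_C", "p_value": 0.030, "fault_possibility": 0.50},
    {"name": "implant_D", "p_value": 0.010, "fault_possibility": 0.70},
]
frontier = pareto_frontier(factors)
print(frontier)
```

Here `cmp_tool_C` is dropped because `etch_chamber_A` is better on both scores; the remaining three factors each win on at least one score, so none can be ranked below another without extra information such as the posterior probability.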