Dissertations on the topic "Statistical association"

Explore the top 50 dissertations for research on the topic "Statistical association".

1

Zhang, Ge. "Statistical Methods in Genetic Association." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1196099744.

2

Perry, Martin Andrew. "Statistical linkage analysis and association studies." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ57208.pdf.

3

Kazeem, Gbenga Rahman. "Statistical analysis of genetic-association studies." Thesis, University of Oxford, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.426396.

4

Mastrodomenico, Robert. "Statistical analysis of genetic association studies." Thesis, University of Reading, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.515692.

5

Alshahrani, Mohammed Nasser D. "Statistical methods for rare variant association." Thesis, University of Leeds, 2018. http://etheses.whiterose.ac.uk/22436/.

Abstract:
Deoxyribonucleic acid (DNA) sequencing allows researchers to conduct more complete assessments of low-frequency and rare genetic variants. In anticipation of the availability of next-generation sequencing data, there is increasing interest in investigating associations between complex traits and rare variants (RVs). In contrast to association studies of common variants (CVs), due to the low frequencies of RVs, common wisdom suggests that existing statistical tests for CVs might not work, motivating the recent development of several new tests that analyze RVs, most of which are based on the idea of pooling/collapsing RVs. Genome-wide association studies (GWAS) based on common SNPs gained more attention in the last few years and have been regularly used to examine complex genetic compositions of diseases and quantitative traits. GWASs have not discovered everything associated with diseases and genetic variations. However, recent empirical evidence has demonstrated that low-frequency and rare variants are, in fact, connected to complex diseases. This thesis will focus on the study of rare variant association. Aggregation tests, where multiple rare variants are analyzed jointly, have incorporated weighting schemes on variants. However, their power is very much dependent on the weighting scheme. I will address three topics in this thesis: the definition of rare variants and their call file (VCF) and a description of the methods that have been used in rare variant analysis. Finally, I will illustrate challenges involved in the analysis of rare variants and propose different weighting schemes for them. Therefore, since the efficiency of rare variant studies might be considerably improved by the application of an appropriate weighting scheme, choosing the proper weighting scheme is the topic of the thesis. In the following chapters, I will propose different weighting schemes, where weights are applied at the level of the variant, the individual or the cell (i.e., the individual genotype call), as well as a weighting scheme that can incorporate quality measures for variants (i.e., a quality score for variant calls) and cells (i.e., genotype quality).
6

Dai, Xiaotian. "Novel Statistical Models for Quantitative Shape-Gene Association Selection." DigitalCommons@USU, 2017. https://digitalcommons.usu.edu/etd/6856.

Abstract:
Other research reported that genetic mechanism plays a major role in the development process of biological shapes. The primary goal of this dissertation is to develop novel statistical models to investigate the quantitative relationships between biological shapes and genetic variants. However, these problems can be extremely challenging to traditional statistical models for a number of reasons: 1) the biological phenotypes cannot be effectively represented by single-valued traits, while traditional regression only handles one dependent variable; 2) in real-life genetic data, the number of candidate genes to be investigated is extremely large, and the signal-to-noise ratio of candidate genes is expected to be very high. In order to address these challenges, we propose three statistical models to handle multivariate, functional, and multilevel functional phenotypes, with applications to biological shape data using different shape descriptors. To the best of our knowledge, there is no statistical model developed for multilevel functional phenotypes. Even though multivariate regressions have been well-explored and these approaches can be applied to genetic studies, we show that the model proposed in this dissertation can outperform other alternatives regarding variable selection and prediction through simulation examples and real data examples. Although motivated ultimately by genetic research, the proposed models can be used as general-purpose machine learning algorithms with far-reaching applications.
7

Huang, Bevan Emma Lin Danyu. "Statistical aspects of haplotype-based association studies." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2007. http://dc.lib.unc.edu/u?/etd,1237.

Abstract:
Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2007.
Title from electronic title page (viewed Mar. 26, 2008). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics, School of Public Health." Discipline: Biostatistics; Department/School: Public Health.
8

Teo, Yik Ying. "Statistical challenges arising in genomewide association studies." Thesis, University of Oxford, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.436942.

9

Koh, Hyunwook. "Adaptive Statistical Methods for Microbiome Association Studies." Thesis, New York University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10750033.

Abstract:

The human microbiome studies have been accelerated by the advances in next-generation sequencing technologies. There has also been increasing interest in discovering microbial taxa that are associated with diverse host phenotypes, environmental factors or clinical interventions. Here, I first describe unique features of microbiome data and the resulting demand for adaptive association analysis which robustly suits different association patterns, while providing valid statistical inferences. Then, I introduce two adaptive microbiome association tests as follows.

My first method, namely, optimal microbiome-based association test (OMiAT), relates microbial composition with continuous (e.g., body mass index) or binary (e.g., disease status) traits. OMiAT is a data-driven adaptive testing method which approximates to the most powerful performance among different candidate tests from the sum of powered score tests (SPU) and microbiome regression-based kernel association test (MiRKAT). I illustrate that OMiAT robustly discovers underlying association signals arising from highly imbalanced microbial abundances and phylogenetic tree structure, while correctly controlling type I error rates. I also propose a way to apply it to fine association mapping of diverse higher-level taxa at different taxonomic levels within a newly introduced microbial taxa discovery framework, microbiome comprehensive association mapping (MiCAM).

My second method, namely, optimal microbiome-based survival analysis (OMiSA), relates microbial composition with survival (i.e., time to event) traits. OMiSA approximates to the most powerful association test within two test domains, 1) microbiome-based survival analysis using linear and non-linear bases of OTUs (MiSALN) and 2) microbiome-based kernel association test for survival traits (MiRKAT-S). I illustrate that OMiSA powerfully discovers underlying associated lineages whether they are rare or abundant and phylogenetically related or not, while correctly controlling type I error rates.

OMiAT and OMiSA are attractive in practice due to the high complexity of microbiome data and the unknown true nature of the state. MiCAM also provides a hierarchical microbiome association map through a breadth of taxonomic levels, which can be used as a guideline for further investigation on the roles of discovered taxa in human health or disease.

10

Liley, Albert James. "Statistical co-analysis of high-dimensional association studies." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/270628.

Abstract:
Modern medical practice and science involve complex phenotypic definitions. Understanding patterns of association across this range of phenotypes requires co-analysis of high-dimensional association studies in order to characterise shared and distinct elements. In this thesis I address several problems in this area, with a general linking aim of making more efficient use of available data. The main application of these methods is in the analysis of genome-wide association studies (GWAS) and similar studies. Firstly, I developed methodology for a Bayesian conditional false discovery rate (cFDR) for levering GWAS results using summary statistics from a related disease. I extended an existing method to enable a shared control design, increasing power and applicability, and developed an approximate bound on false-discovery rate (FDR) for the procedure. Using the new method I identified several new variant-disease associations. I then developed a second application of shared control design in the context of study replication, enabling improvement in power at the cost of changing the spectrum of sensitivity to systematic errors in study cohorts. This has application in studies on rare diseases or in between-case analyses. I then developed a method for partially characterising heterogeneity within a disease by modelling the bivariate distribution of case-control and within-case effect sizes. Using an adaptation of a likelihood-ratio test, this allows an assessment to be made of whether disease heterogeneity corresponds to differences in disease pathology. I applied this method to a range of simulated and real datasets, enabling insight into the cause of heterogeneity in autoantibody positivity in type 1 diabetes (T1D). Finally, I investigated the relation of subtypes of juvenile idiopathic arthritis (JIA) to adult diseases, using modified genetic risk scores and linear discriminants in a penalised regression framework. 
The contribution of this thesis is in a range of methodological developments in the analysis of high-dimensional association study comparison. Methods such as these will have wide application in the analysis of GWAS and similar areas, particularly in the development of stratified medicine.
11

Yung, Godwin Yuen Han. "Statistical methods for analyzing genetic sequencing association studies." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493313.

Abstract:
Case-control genetic sequencing studies are increasingly being conducted to identify rare variants associated with complex diseases. Oftentimes, these studies collect a variety of secondary traits--quantitative and qualitative traits besides the case-control disease status. Reusing the data and studying the association between rare variants and secondary phenotypes provide an attractive and cost effective approach that can lead to discovery of new genetic associations. In Chapter 1, we carry out an extensive investigation of the validity of ad hoc methods, which are simple, computationally efficient methods frequently applied in practice to study the association between secondary phenotypes and single common genetic variants. Though other researchers have investigated the same problem, we make two key contributions to existing literature. First, we show that in taking an ad hoc approach, it may be desirable to adjust for covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype in the population. Second, we show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a non-logistic model such as the probit model. Spurious associations can be avoided by including interaction terms in the fitted regression model. Our results are justified theoretically and via simulations, and illustrated by a genome-wide association study of smoking using a lung cancer case-control study. In Chapter 2, we consider the problem of testing associations between secondary phenotypes and sets of rare genetic variants. We show that popular region-based methods such as the burden test and the sequence kernel association test (SKAT) can only be applied under the same conditions as those applicable to ad hoc methods (Chapter 1). 
For a more robust alternative, we propose an inverse-probability-weighted version of the optimal SKAT (SKAT-O) to account for unequal sampling of cases and controls. As an extension of SKAT-O, our approach is data adaptive and includes the weighted burden test and weighted SKAT as special cases. In addition to weighting individuals to account for the biased sampling, we can also consider weighting the variants in SKAT-O. Decreasing the weight of non-causal variants and increasing the weight of causal variants can improve power. However, since researchers do not know which variants are actually causal, it is common practice to weight genetic variants as a function of their minor allele frequencies. This is motivated by the belief that rarer variants are more likely to have larger effects. In Chapter 3, we propose a new unsupervised statistical framework for predicting the functional status of genetic variants. Compared to existing methods, the proposed algorithm integrates a diverse set of annotations---which are partitioned beforehand into multiple groups by the user---and predicts the functional status for each group, taking into account within- and between-group correlations. We demonstrate the advantages of the algorithm through application to real annotation data and conclude with future directions.
Biostatistics
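The abstract above mentions weighting variants as a function of their minor allele frequency (MAF) inside burden-type tests. A minimal sketch of that idea follows. It is not the thesis's method: the Beta(1, 25) density used for the weights is an assumption borrowed from common practice with SKAT-style tests, and the function names and the toy genotype matrix are hypothetical.

```python
import math


def beta_maf_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at the MAF (assumed weighting scheme).

    With a=1, b=25 the weight decreases as the MAF grows, so rarer
    variants contribute more to the burden score.
    """
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0)


def minor_allele_frequencies(genotypes):
    """Per-variant MAF from a matrix of minor-allele counts (0, 1 or 2)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]


def weighted_burden_scores(genotypes, weights):
    """One score per individual: the weighted sum of minor-allele counts."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Hypothetical toy data: 4 individuals (rows) by 3 variants (columns).
genotypes = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 2],
]
mafs = minor_allele_frequencies(genotypes)
weights = [beta_maf_weight(p) for p in mafs]
scores = weighted_burden_scores(genotypes, weights)
```

The resulting scores would then be used as a single covariate in a regression of the phenotype; the inverse-probability weighting of individuals described in the abstract is a separate step and is not shown here.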
12

Zang, Yong, and 臧勇. "Robust tests under genetic model uncertainty in case-control association studies." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B46419123.

13

Su, Zhan. "Statistical methods for the analysis of genetic association studies." Thesis, University of Oxford, 2008. http://ora.ox.ac.uk/objects/uuid:98614f8b-63fe-4fa1-9a24-422216ad14cf.

Abstract:
One of the main biological goals of recent years is to determine the genes in the human genome that cause disease. Recent technological advances have realised genome-wide association studies, which have uncovered numerous genetic regions implicated with human diseases. The current approach to analysing data from these studies is based on testing association at single SNPs but this is widely accepted as underpowered to detect rare and poorly tagged variants. In this thesis we propose several novel approaches to analysing large-scale association data, which aim to improve upon the power offered by traditional approaches. We combine an established imputation framework with a sophisticated disease model that allows for multiple disease causing mutations at a single locus. To evaluate our methods, we have developed a fast and realistic method to simulate association data conditional on population genetic data. The simulation results show that our methods remain powerful even if the causal variant is not well tagged, there are haplotypic effects or there is allelic heterogeneity. Our methods are further validated by the analysis of the recent WTCCC genome-wide association data, where we have detected confirmed disease loci, known regions of allelic heterogeneity and new signals of association. One of our methods also has the facility to identify the high risk haplotype backgrounds that harbour the disease alleles, and therefore can be used for fine-mapping. We believe that the incorporation of our methods into future association studies will help progress the understanding of genetic diseases.
14

Li, Yinglei. "Genetic Association Testing of Copy Number Variation." UKnowledge, 2014. http://uknowledge.uky.edu/statistics_etds/8.

Abstract:
Copy-number variation (CNV) has been implicated in many complex diseases. It is of great interest to detect and locate such regions through genetic association testing. However, association testing is complicated by the fact that CNVs usually span multiple markers and thus such markers are correlated with each other. To overcome the difficulty, it is desirable to pool information across the markers. In this thesis, we propose a kernel-based method for aggregation of marker-level tests, in which we first obtain a set of p-values through association tests for every marker and then base the association test involving the CNV on a statistic that combines those p-values. In addition, we explore several aspects of its implementation. Since p-values among markers are correlated, it is complicated to obtain the null distribution of test statistics for kernel-based aggregation of marker-level tests. To solve the problem, we develop two methods that are both demonstrated to preserve the family-wise error rate of the test procedure. They are permutation-based and correlation-based approaches. Many implementation aspects of the kernel-based method are compared through empirical power studies in a number of simulations constructed from real data involving a pharmacogenomic study of gemcitabine. In addition, more performance comparisons are shown between the permutation-based and correlation-based approaches. We also apply those two approaches to the real data. The main contribution of the dissertation is the development of marker-level association testing, a comparable and powerful approach to detect phenotype-associated CNVs. Furthermore, the approach is extended to the high-dimensional setting with high efficiency.
15

Parisi, Rosa. "Multi-locus statistical analysis of genome-wide association studies." Thesis, University of Leeds, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.535123.

16

Ferreira, Teresa. "Statistical methods for modelling epistasis in genetic association studies." Thesis, University of Oxford, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.543476.

17

Yi, Wan Kitty Yuen. "Statistical methods for the analysis of genetic association studies." Thesis, University of Kent, 2011. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.544040.

18

Adhikari, Kaustubh. "Statistical Methodology for Sequence Analysis." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10178.

Abstract:
Rare disease variants are receiving increasing importance in the past few years as the potential cause for many complex diseases, after the common disease variants failed to explain a large part of the missing heritability. With the advancement in sequencing techniques as well as computational capabilities, statistical methodology for analyzing rare variants is now a hot topic, especially in case-control association studies. In this thesis, we initially present two related statistical methodologies designed for case-control studies to predict the number of common and rare variants in a particular genomic region underlying the complex disease. Genome-wide association studies are nowadays routinely performed to identify a few putative marker loci or a candidate region for further analysis. These methods are designed to work with SNP data on such a genomic region highlighted by GWAS studies for potential disease variants. The fundamental idea is to use Bayesian methodology to obtain bivariate posterior distributions on counts of common and rare variants. While the first method uses randomly generated (minimal) ancestral recombination graphs, the second method uses ensemble clustering method to explore the space of genealogical trees that represent the inherent structure in the test subjects. In contrast to the aforesaid methods which work with SNP data, the third chapter deals with next-generation sequencing data to detect the presence of rare variants in a genomic region. We present a non-parametric statistical methodology for rare variant association testing, using the well-known Kolmogorov-Smirnov framework adapted for genetic data. It is a fast, model-free robust statistic, designed for situations where both deleterious and protective variants are present. It is also unique in utilizing the variant locations in the test statistic.
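For orientation, the textbook two-sample Kolmogorov-Smirnov statistic that the abstract above builds on can be sketched as follows. The thesis's adapted statistic is not specified in the abstract, so this is only the classical form, applied here (as an assumption) to the genomic positions of rare alleles observed in cases and in controls. The function name and positions are hypothetical.

```python
def ks_two_sample_statistic(xs, ys):
    """Classical two-sample KS statistic: max |F_x(t) - F_y(t)| over t.

    xs and ys are non-empty samples, e.g. positions of rare alleles
    seen in cases and in controls (an illustrative assumption).
    """
    xs = sorted(xs)
    ys = sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both empirical CDFs past every observation equal to v.
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical base-pair positions of rare alleles within one region.
case_positions = [1, 2, 3, 4]
control_positions = [3, 4, 5, 6]
d_stat = ks_two_sample_statistic(case_positions, control_positions)
```

A large value suggests that rare alleles cluster in different parts of the region in cases and controls; a p-value would typically come from permuting case-control labels, which is not shown.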
19

Halle, Kari Krizak. "Statistical Methods for Multiple Testing in Genome-Wide Association Studies." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for matematiske fag, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-18503.

Abstract:
In Genome-Wide Association Studies (GWAS) the aim is to look for association between genetic markers and phenotype (disease). For each genetic marker we perform a hypothesis test. Since the number of markers is high (on the order of hundreds of thousands), we use multiple hypothesis tests. One popular strategy in multiple testing is to estimate an effective number of independent tests, and then use methods based on independent tests to control the total type I error. The focus of this thesis has been to study different methods for estimating the effective number of independent tests. The methods are applied to a large data set on bipolar disorder and schizophrenia in Norwegian individuals from the TOP study at the University of Oslo and Oslo University Hospital (OUS). A key feature of these methods is the correlation between the genetic markers. The methods considered in this thesis are based on either haplotype or genotype correlation and one focus of this thesis has been to study the difference between haplotype and genotype correlation.
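The strategy described in the abstract above, estimating an effective number of independent tests from marker correlations and then applying a correction for independent tests, can be sketched as follows. The abstract does not say which estimators the thesis studies; the eigenvalue-variance estimator used here (commonly attributed to Nyholt, 2004) and the Šidák threshold are assumptions chosen for illustration, and the function names are hypothetical.

```python
def effective_number_of_tests(corr):
    """Eigenvalue-variance estimate: 1 + (M - 1) * (1 - Var(lambda) / M).

    corr is a symmetric M x M marker correlation matrix with unit diagonal.
    The eigenvalues sum to M (the trace) and their squares sum to the sum
    of squared entries of corr, so the sample variance of the eigenvalues
    is available without an eigendecomposition.
    """
    m = len(corr)
    if m == 1:
        return 1.0
    sum_sq = sum(r * r for row in corr for r in row)
    var_lambda = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def sidak_threshold(alpha, m_eff):
    """Per-test significance level for m_eff independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)


# Two limiting cases for three markers.
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
identical = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
m_eff_independent = effective_number_of_tests(independent)
m_eff_identical = effective_number_of_tests(identical)
local_alpha = sidak_threshold(0.05, m_eff_independent)
```

Uncorrelated markers give an effective number equal to the number of markers, and perfectly correlated markers give one; real genotype or haplotype correlation matrices fall between these extremes.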
20

Valcarcel, Salamanca Beatriz. "Statistical association networks as complex phenotypes : new methods and applications." Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/18678.

Abstract:
Recent advances in ’omics technologies and the development of new computational techniques have greatly contributed to the identification of factors influencing the onset and progression of many common diseases. Yet, despite this great success, it is unlikely that the independent analysis of these data will elucidate the complex web of mechanisms involved in disease development. To enhance our knowledge of disease aetiology, new approaches for linking the large amount of available data need to be developed. As a step towards this goal, the aim of this thesis is to investigate and develop novel statistical methods for the integrative analysis of ’omic data. In particular, in this project we analyse genomic and metabolomic data in relation to the health outcomes in three study populations. To investigate how genetic and metabolic variables act as risk factors in the development of complex disorders, we have developed three novel analytical methodologies, namely ’Differential Network’, ’GEMINi: GEnome Metabolome Integrated Network analysis’ and ’Variance and Covariance regression’ and illustrate their use on real data sets. The results demonstrate the applicability of the new methodologies to identify key molecular changes undetectable with standard approaches. The approaches introduced here have the potential to provide insight into the biological basis of phenotypic variation and to aid the generation of new hypotheses about molecular control and regulation in the context of systems biology.
21

Salem, Rany Mansour. "Statistical methods for genetic association analysis involving complex longitudinal data." Diss., [La Jolla] : [San Diego] : University of California, San Diego ; San Diego State University, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p3366492.

Abstract:
Thesis (Ph. D.)--University of California, San Diego and San Diego State University, 2009.
Title from first page of PDF file (viewed Aug. 14, 2009). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references.
22

Michailidou, Kyriaki. "Statistical analyses of genome-wide association studies in breast cancer." Thesis, University of Cambridge, 2015. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.708642.

23

Siracusa, Michael Richard 1980. "Statistical modeling and analysis of audio-visual association in speech." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/30182.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2005.
Includes bibliographical references (p. 183-186).
Currently, most dialog systems are restricted to single user environments. This thesis aims to promote an un-tethered multi-person dialog system by exploring approaches to help solve the speech correspondence problem (i.e. who, if anyone, is currently speaking). We adopt a statistical framework in which this problem is put in the form of a hypothesis test and focus on the subtask of discriminating between associated and non-associated audio-visual observations. Various methods for modeling our audio-visual observations and ways of carrying out this test are studied and their relative performance is compared. We discuss issues that arise from the inherently high dimensional nature of audio-visual data and address these issues by exploring different techniques for finding low-dimensional informative subspaces in which we can perform our hypothesis tests. We study our ability to learn a person-specific as well as a generic model for measuring audio-visual association and evaluate performance on multiple subjects taken from MIT's AVTIMIT database.
by Michael Richard Siracusa.
S.M.
24

Antonyuk, Alexander. "Statistical methodology for QTL mapping and genome-wide association studies." Thesis, University of Oxford, 2009. https://ora.ox.ac.uk/objects/uuid:23393c76-b7ef-44c2-a06f-3b23e3a6d936.

Abstract:
This work deals with statistical tests of association between genetic markers and disease phenotypes. The main criterion used for comparing the tests is statistical power. First we consider animal models and then data from association studies of humans. For the animal section, we analyse a dataset from a prominent mouse experiment which developed a heterogeneous stock of mice via multiple crosses. This stock is characterised by small distances between recombinants which allows fine mapping of genetic loci, but also by uncertainty in haplotypes. We start by highlighting the disadvantages of the currently used approach to deal with this uncertainty and suggest a method that has greater statistical power and is computationally efficient. The method applies the EM algorithm to the broad class of exponential family distributions of phenotypes. We also develop a Bayesian version of the method, for which we extend the widely used IRLS algorithm to maximisation of the weighted posterior. Then we move on to genome-wide association studies (GWAS), where two situations are considered: known and unknown minor allele frequency. First we develop an innovative Bayesian model with the optimal prior for the known population MAF. We demonstrate that not only it is more powerful than any frequentist test considered (the size of the advantage depends on prevalence of the disease and MAF), but also that the frequentist tests change ranking in terms of power. A remarkable property of the frequentist tests, the advantage of discarding part of the data to gain power, is highlighted. The second chapter on GWAS considers the currently more common situation of the unknown MAF, when the Armitage test is known to be the most powerful frequentist method. We show that the suggested model is more powerful in the broad selection of settings considered, including the three different allele effect models: additive, dominant and recessive. 
For both known and unknown MAF cases we point out that the parameters are constrained and demonstrate how to gain power by taking this constraint into account.
25

Stanislas, Virginie. "Statistical approaches to detect epistasis in genome wide association studies." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLE040/document.

Abstract:
A large amount of research has been devoted to the detection and investigation of epistatic interactions in Genome-Wide Association Studies (GWAS). Most of the literature focuses on interactions between single-nucleotide polymorphisms (SNPs), but grouping strategies can also be considered. In this thesis, we develop an original approach for the detection of interactions at the gene level. New variables representing the interactions between two genes are defined using dimensionality reduction methods. Thus, all information brought from genetic markers is summarized at the gene level. These new interaction variables are then introduced into a regression model. The selection of significant effects is done using a penalized regression method based on Group LASSO controlling the False Discovery Rate. We compare the different methods of modeling interaction variables through simulations in order to show the good performance of our proposed approach. Finally, we illustrate its practical use for identifying gene-gene interactions by analyzing two real data sets.
26

Petersen, Ann-Kristin. "Statistical incorporation of metabolites in the genome-wide association study approach." Diss., Ludwig-Maximilians-Universität München, 2013. http://nbn-resolving.de/urn:nbn:de:bvb:19-161680.

27

Lee, Yiu-fai, and 李耀暉. "Analysis for segmental sharing and linkage disequilibrium: a genomewide association study on myopia." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43912217.

28

He, Feng, and 贺峰. "Detection of parent-of-origin effects and association in relation to a quantitative trait." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2010. http://hub.hku.hk/bib/B44921408.

29

Huang, Yungui. "Association statistics under the PPL framework." Diss., University of Iowa, 2011. https://ir.uiowa.edu/etd/985.

Abstract:
In this dissertation, the posterior probability of linkage (PPL) framework is extended to the analysis of case-control (CC) data and three new linkage disequilibrium (LD) statistics are introduced. These statistics measure the evidence for or against LD, rather than testing the null hypothesis of no LD, and they therefore avoid the need for multiple testing corrections. They are suitable not only for CC designs but can also be applied to family data, ranging from trios to complex pedigrees, all under the same statistical framework, allowing for the unified analysis of these disparate data structures. They also provide the other core advantages of the PPL framework, including the use of sequential updating to accumulate LD evidence across potentially heterogeneous sets of subsets of data; parameterization in terms of a very general trait likelihood, which simultaneously considers dominant, recessive, and additive models; and a straightforward mechanism for modeling two-locus epistasis. Finally, being implemented within the PPL framework, the new statistics readily allow linkage information obtained from distinct data to be incorporated into LD analyses in the form of a prior probability distribution. Performance of the proposed LD statistics is examined using simulated data. In addition, the effects of key modeling violations on performance are assessed. These statistics are also applied to a previously published type 1 diabetes (T1D) family dataset with a few candidate genes with previously reported weak associations, and to another T1D CC dataset, previously published as a genome-wide association (GWA) study with some strong associations reported. The new LD statistics under the PPLD framework confirm most of the findings in the published work and also find some new SNPs suspected of being associated with T1D.
Sequential updating between the family dataset and the CC dataset dramatically increased the association signal strength for a CTLA4 SNP genotyped in both studies. Linkage information gleaned from the family dataset is also combined into the LD analysis of the CC dataset to demonstrate the utility of this unique feature of the PPL framework, and specifically for the new LD statistics.
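The idea of sequential updating mentioned in this abstract can be illustrated with a minimal sketch. It is not the PPL or PPLD computation from the dissertation; it only applies Bayes' rule on the odds scale, with a hypothetical prior probability and hypothetical Bayes factors standing in for the evidence contributed by each dataset. The posterior after one dataset becomes the prior for the next.

```python
def update(prior: float, bayes_factor: float) -> float:
    """One Bayes update on the odds scale (generic Bayes' rule, not the PPL formula)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


def sequential_update(prior: float, bayes_factors: list[float]) -> float:
    """Feed the posterior from each dataset in as the prior for the next one."""
    probability = prior
    for bf in bayes_factors:
        probability = update(probability, bf)
    return probability


# Hypothetical values: a prior of 0.02 and Bayes factors of 5 and 8
# from two independent datasets (e.g. a family set and a case-control set).
combined = sequential_update(0.02, [5.0, 8.0])
print(f"posterior after both datasets: {combined:.4f}")
```

Under these assumptions the order of the datasets does not matter, and updating sequentially gives the same result as a single update with the product of the Bayes factors.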
30

Qiao, Dandi. "Statistical Approaches for Next-Generation Sequencing Data." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10689.

Abstract:
During the last two decades, genotyping technology has advanced rapidly, which enabled the tremendous success of genome-wide association studies (GWAS) in the search for disease susceptibility loci (DSLs). However, only a small fraction of the overall predicted heritability can be explained by the DSLs discovered. One possible explanation for this "missing heritability" phenomenon is that many causal variants are rare. The recent development of high-throughput next-generation sequencing (NGS) technology provides the instrument to look closely at these rare variants with precision and efficiency. However, new approaches for both the storage and analysis of sequencing data are urgently needed. In this thesis, we introduce three methods that could be utilized in the management and analysis of sequencing data. In Chapter 1, we propose a novel and simple algorithm for compressing sequencing data that leverages the scarcity of rare variant data, which enables efficient storage and analysis of sequencing data in current hardware environments. We also provide a C++ implementation that supports direct and parallel loading of the compressed format without requiring extra time for decompression. Chapters 2 and 3 focus on the association analysis of sequencing data in population-based designs. In Chapter 2, we present a statistical methodology that allows the identification of genetic outliers to obtain a genetically homogeneous subpopulation, which reduces the false positives due to population substructure. Our approach is computationally efficient, can be applied to all the genetic loci in the data, and does not require pruning of variants in linkage disequilibrium (LD). In Chapter 3, we propose a general analysis framework in which thousands of genetic loci can be tested simultaneously for association with complex phenotypes.
The approach is built on spatial-clustering methodology, assuming that genetic loci that are associated with the target phenotype cluster in certain genomic regions. In contrast to standard methodology for multi-loci analysis, which has focused on the dimension reduction of data, the proposed approach profits from the availability of large numbers of genetic loci. Thus it will be especially relevant for whole-genome sequencing studies which commonly record several thousand loci per gene.
31

Lundell, Jill F. "Tuning Hyperparameters in Supervised Learning Models and Applications of Statistical Learning in Genome-Wide Association Studies with Emphasis on Heritability." DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7594.

Abstract:
Machine learning is a buzzword that has inundated popular culture in the last few years. It is a term for computer methods that can automatically learn and improve from data instead of being explicitly programmed at every step. Investigations regarding the best way to create and use these methods are prevalent in research. Machine learning models can be difficult to create because models need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and finds a way to automatically select a set of tuning parameters. This information was used to create an R software package called EZtune that can be used to automatically tune three widely used machine learning algorithms: support vector machines, gradient boosting machines, and adaboost. The second portion of this dissertation investigates the implementation of machine learning methods in finding locations along a genome that are associated with a trait. The performance of methods that have been commonly used for these types of studies, and of some that have not, is assessed using simulated data. The effect of the strength of the relationship between the genetic code and the trait is of particular interest. It was found that the strength of this relationship was the most important characteristic in the efficacy of each method.
32

Zhao, Jinghua. "Statistical power analysis and related issues in human genetic linkage and association." Thesis, King's College London (University of London), 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.405641.

33

Guan, Ting. "Novel Statistical Methods for Multiple-variant Genetic Association Studies with Related Individuals." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/96243.

Abstract:
Genetic association studies usually include related individuals. Meanwhile, high-throughput sequencing technologies produce data on multiple genetic variants. Due to linkage disequilibrium (LD) and familial relatedness, the genotype data from such studies often carry complex correlations. Moreover, missing genotype values usually lead to loss of power in genetic association tests. Also, repeated measurements of phenotype and dynamic covariates from longitudinal studies bring more opportunities but also challenges in the discovery of disease-related genetic factors. This dissertation focuses on developing novel statistical methods to address some challenging questions remaining in genetic association studies due to the aforementioned reasons. So far, many methods have been proposed to detect disease-related genetic regions (e.g., genes, pathways). However, with multiple-variant data from a sample with relatedness, it is critical to account for the complex genotypic correlations when assessing genetic contribution. Recognizing the limitations of existing methods, in the first work of this dissertation, the Adaptive-weight Burden Test (ABT), a score test between a quantitative trait and genotype data with complex correlations, is proposed. ABT achieves higher power by adopting data-driven weights, which make good use of the LD and relatedness. Because the null distribution has been successfully derived, the computational simplicity of ABT makes it a good fit for genome-wide association studies. Genotype missingness commonly arises due to limitations in genotyping technologies. Imputation of the missing genotype values usually improves the quality of the data used in the subsequent association test and thus increases power. Complex correlations, though troublesome, provide the opportunity for proper handling of genotypic missingness.
In the second part of this dissertation, a genotype imputation method is developed, which can impute the missingness in multiple genetic variants via the LD and the relatedness. The popularity of longitudinal studies in genetics and genomics calls for methods deliberately designed for repeated measurements. Therefore, a multiple-variant genetic association test for a longitudinal trait on samples with relatedness is developed, which treats the longitudinal measurements as observations of functions and thus takes into account the time factor properly.
PHD
34

Karns, Rebekah A. B. S. "Integrative and Multivariate Statistical Approaches to Assessing Phenotypic and Genotypic Determinants of Complex Disease." University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1335554184.

35

Shringarpure, Suyash. "Statistical Methods for studying Genetic Variation in Populations." Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/117.

Abstract:
The study of genetic variation in populations is of great interest for the study of the evolutionary history of humans and other species. Improvement in sequencing technology has resulted in the availability of many large datasets of genetic data. Computational methods have therefore become quite important in analyzing these data. Two important problems that have been studied using genetic data are population stratification (modeling individual ancestry with respect to ancestral populations) and genetic association (finding genetic polymorphisms that affect a trait). In this thesis, we develop methods to improve our understanding of these two problems. For the population stratification problem, we develop hierarchical Bayesian models that incorporate the evolutionary processes that are known to affect genetic variation. By developing mStruct, we show that modeling more evolutionary processes improves the accuracy of the recovered population structure. We demonstrate how nonparametric Bayesian processes can be used to address the question of choosing the optimal number of ancestral populations that describe the genetic diversity of a given sample of individuals. We also examine how sampling bias in genotyping study design can affect results of population structure analysis and propose a probabilistic framework for modeling and correcting sample selection bias. Genome-wide association studies (GWAS) have vastly improved our understanding of many diseases. However, such studies have failed to uncover much of the variation responsible for a number of common multi-factorial diseases and complex traits. We show how artificial selection experiments on model organisms can be used to better understand the nature of genetic associations. We demonstrate using simulations that using data from artificial selection experiments improves the performance of conventional methods of performing association. 
We also validate our approach using semi-simulated data from an artificial selection experiment on Drosophila melanogaster.
36

Gale, Joanne. "Statistical Methods for the Analysis of Quantitative Trait Data in Genetic Association Studies." Thesis, University of Oxford, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.504345.

37

De, Tisham. "Statistical approaches for copy number variation detection and association with complex human phenotypes." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/45494.

Abstract:
Copy number variants (CNVs) play an important role in the pathogenesis of diseases including epilepsy, diabetes and many others. CNVs are also known to affect cellular phenotypes through several phenomena such as gene dosage. Next-generation technologies for sequencing (DNA and RNA) and metabolite profiling (metabolomics) have led to the systematic discovery and evaluation of various genomic variants and their relationship to multiple phenotypes. Such approaches often involve the application of several statistical and machine learning methods for unravelling new relationships between genomic variants and phenotypes, i.e. disease outcomes or quantitative traits characterized at the molecular level. This thesis explores and develops several statistical methods for CNV detection and association with complex human phenotypes, in particular for epilepsy drug response, epilepsy susceptibility, metabolomics and gene expression. In more detail, chapter 3 describes a genome-wide CNV association analysis for two phenotypes, epilepsy susceptibility and epilepsy drug response. I have identified several important candidate genes for these two phenotypes, including the most strongly associated genes, SLC9A1 (p-value=6.69E-15) for epilepsy susceptibility and WWOX (p-value=1.93E-3) for epilepsy drug response. These associations were replicated in a separate Australian cohort and were further validated in the lab and in silico, leading to some positive and negative confirmation. In chapter 4, I present CNV association with metabolomic data in the exonic regions of the TSPAN8 gene. A strong association signal was detected in the 6th and 7th exons of the TSPAN8 gene, where a large proportion of metabonomic lipid phenotypes were found to be associated using univariate (P-value=7.64E-4) and multivariate (P-value=1.33E-6) approaches. These CNVs were also found to be nominally associated with type 2 diabetes (P-value=3.32e-7).
In addition, I carried out advanced multivariate association analysis to corroborate these results and report sequencing-based validation results for TSPAN8 exonic CNVs in different human populations from the 1000 Genomes Project. In chapter 5, I report a genome-wide CNV association analysis with gene expression in ten different regions of the human brain. I identified a novel CNV near the DRD5 gene which was found to be strongly associated with gene expression. Further, I report ongoing efforts to replicate and validate this finding. Each of the phenotype categories analysed posed its own unique challenges and required specific approaches for analysis and interpretation.
38

Guinot, Florent. "Statistical learning for omics association and interaction studies based on blockwise feature compression." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLE029/document.

Abstract:
Depuis la dernière décennie le développement rapide des technologies de génotypage a profondément modifié la façon dont les gènes impliqués dans les troubles mendéliens et les maladies complexes sont cartographiés, passant d'approches gènes candidats aux études d'associations pan-génomique, ou Genome-Wide Association Studies (GWASs). Ces études visent à identifier, au sein d'échantillons d'individus non apparentés, des marqueurs génétiques impliqués dans l'expression de maladies complexes. Ces études exploitent le fait qu'il est plus facile d'établir, à partir de la population générale, de grandes cohortes de personnes affectées par une maladie et partageant un facteur de risque génétique qu'au sein d'échantillons apparentés issus d'une même famille, comme c'est le cas dans les études familiales traditionnelles. D'un point de vue statistique, l'approche standard est basée sur le test d'hypothèse: dans un échantillon d'individus non apparentés, des individus malades sont testés contre des individus sains à un ou plusieurs marqueurs. Cependant, à cause de la grande dimension des données, ces procédures de tests classiques sont souvent sujettes à des faux positifs, à savoir des marqueurs faussement identifiés comme étant significatifs. Une solution consiste à appliquer une correction sur les p-valeurs obtenues afin de diminuer le seuil de significativité, augmentant en contrepartie le risque de manquer des associations n’ayant qu'un faible effet sur le phénotype. De plus, bien que cette approche ait réussi à identifier des marqueurs génétiques associés à des maladies multi-factorielles complexes (maladie de Crohn, diabète I et II, maladie coronarienne, …), seule une faible proportion des variations phénotypiques attendues des études familiales classiques a été expliquée.
Cette héritabilité manquante peut avoir de multiples causes parmi les suivantes: fortes corrélations entre les variables génétiques, structure de la population, épistasie (interactions entre gènes), maladie associée aux variants rares, … Les principaux objectifs de cette thèse sont de développer de nouvelles méthodes statistiques pouvant répondre à certaines des limitations mentionnées ci-dessus. Plus précisément, nous avons développé deux nouvelles approches: la première exploite la structure de corrélation entre les marqueurs génétiques afin d'améliorer la puissance de détection dans le cadre des tests d'hypothèses tandis que la seconde est adaptée à la détection d'interactions statistiques entre groupes de marqueurs méta-génomiques et génétiques permettant une meilleure compréhension de la relation complexe entre environnement et génome sur l'expression d'un caractère.
Over the last decade, rapid advances in genotyping technologies have changed the way genes involved in Mendelian disorders and complex diseases are mapped, moving from candidate-gene approaches to linkage disequilibrium mapping. In this context, Genome-Wide Association Studies (GWAS) aim at identifying genetic markers that are involved in the expression of complex diseases and that occur at different frequencies between unrelated samples of affected individuals and unaffected controls. These studies exploit the fact that it is easier to establish, from the general population, large cohorts of affected individuals sharing a genetic risk factor for a complex disease than within individual families, as is the case with traditional linkage analysis. From a statistical point of view, the standard approach in GWAS is based on hypothesis testing, with affected individuals being tested against healthy individuals at one or more markers. However, classical testing schemes are subject to false positives, that is, markers that are falsely identified as significant. One way around this problem is to apply a correction to the p-values obtained from the tests, increasing in return the risk of missing true associations that have only a small effect on the phenotype, which is usually the case in GWAS. Although GWAS have been successful in identifying genetic variants associated with complex multifactorial diseases (Crohn's disease, diabetes I and II, coronary artery disease, …), only a small proportion of the phenotypic variation expected from classical family studies has been explained. This missing heritability may have multiple causes, among them: strong correlations between genetic variants, population structure, epistasis (gene-by-gene interactions), disease associated with rare variants, … The main objectives of this thesis are thus to develop new methodologies that can address some of the limitations mentioned above.
More specifically, we developed two new approaches: the first is a block-wise approach for GWAS analysis which leverages the correlation structure among the genomic variants to reduce the number of statistical hypotheses to be tested, while in the second we focus on the detection of interactions between groups of metagenomic and genetic markers to better understand the complex relationship between environment and genome in the expression of a given phenotype.
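The trade-off described in this abstract, where correcting p-values for many tests makes small effects harder to detect, can be made concrete with a short sketch. It uses the standard Bonferroni correction as an illustration; the abstract does not say which correction the thesis uses, and the counts of SNPs and blocks below are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical counts: one million single-SNP tests versus 50,000 block-level tests.
snp_level = bonferroni_threshold(0.05, 1_000_000)
block_level = bonferroni_threshold(0.05, 50_000)

print(f"per-SNP threshold:   {snp_level:.1e}")
print(f"per-block threshold: {block_level:.1e}")
```

Testing fewer hypotheses yields a less stringent per-test threshold, which is the motivation the abstract gives for reducing the number of hypotheses through a block-wise analysis.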
39

Tachmazidou, Ioanna. "Bayesian statistical methods for genetic association studies with case-control and cohort design." Thesis, Imperial College London, 2008. http://hdl.handle.net/10044/1/4398.

Abstract:
Large-scale genetic association studies are carried out with the hope of discovering single nucleotide polymorphisms involved in the etiology of complex diseases. We propose a coalescent-based model for association mapping which potentially increases the power to detect disease-susceptibility variants in genetic association studies with case-control and cohort design. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions and we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium (LD) therein assuming a perfect phylogeny. The haplotype space is then partitioned into disjoint clusters within which the phenotype-haplotype association is assumed to be the same. The novelty of our approach consists in the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common mutation. Our approach is fully Bayesian and we develop Markov Chain Monte Carlo algorithms to sample efficiently over the space of possible partitions. We have also developed a Bayesian survival regression model for high-dimension and small sample size settings. We provide a Bayesian variable selection procedure and shrinkage tool by imposing shrinkage priors on the regression coefficients. We have developed a computationally efficient optimization algorithm to explore the posterior surface and find the maximum a posteriori estimates of the regression coefficients. We compare the performance of the proposed methods in simulation studies and using real datasets to both single-marker analyses and recently proposed multi-marker methods and show that our methods perform similarly in localizing the causal allele while yielding lower false positive rates. Moreover, our methods offer computational advantages over other multi-marker approaches.
40

Mathieson, Iain. "Genes in space : selection, association and variation in spatially structured populations." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:85f051b6-2121-49cf-9468-3ca7ba77cc4a.

Abstract:
Spatial structure in a population creates distinctive patterns in genetic data. There are two reasons to model this process. First, since the genetic structure of a population is induced by its historical spatial structure, it can be used to make inference about history and demography. Second, these models provide corrections to other analyses that are confounded by spatial structure. Since it is now common to collect genome-wide data on many thousands of samples, a major challenge is to develop fast, scalable, approximate algorithms that can analyse these datasets. A practical approach is to focus on subsets of the data that are most informative, for example rare variants. First we look at the problem of estimating selection coefficients in spatially structured populations. We demonstrate this approach using classical datasets of moth colour morph frequencies, and then use it in a model incorporating both ancient and modern DNA to estimate the selective advantage of one of the best known examples of local adaptation in humans, lactase persistence in Europeans. Next, we turn to the problem of association studies in spatially structured populations. We demonstrate that rare variants are more confounded by non-genetic risk than common variants. Excess confounding is a consequence of the fact that rare variants are highly informative about recent ancestry and therefore, in a spatially explicit model, about location. Finally, we use this insight into rare variants to develop methods for inference about population history using rare variant and haplotype sharing as simple summary statistics. These approaches are extremely fast and can be applied to genome-wide data on thousands of samples, yet they provide an accurate description of the history of a population, both identifying recent ancestry and estimating migration rates between subpopulations.
41

Speed, Douglas Christopher. "Exploring nonlinear regression methods, with application to association studies." Thesis, University of Cambridge, 2011. https://www.repository.cam.ac.uk/handle/1810/241092.

Abstract:
The field of nonlinear regression is a long way from reaching a consensus. Once a method decides to explore nonlinear combinations of predictors, a number of questions are raised, such as what nonlinear combinations to permit and how best to search the resulting model space. Genetic Association Studies comprise an area that stands to gain greatly from the development of more sophisticated regression methods. While these studies' ability to interrogate the genome has advanced rapidly over recent years, it is thought that a lack of suitable regression tools prevents them from achieving their full potential. I have tried to investigate the area of regression in a methodical manner. In Chapter 1, I explain the regression problem and outline existing methods. I observe that both linear and nonlinear methods can be categorised according to the restrictions enforced by their underlying model assumptions and speculate that a method with as few restrictions as possible might prove more powerful. In order to design such a method, I begin by assuming each predictor is tertiary (takes no more than three distinct values). In Chapters 2 and 3, I propose the method Sparse Partitioning. Its name derives from the way it searches for high scoring partitions of the predictor set, where each partition defines groups of predictors that jointly contribute towards the response. A sparsity assumption supposes most predictors belong in the 'null group' indicating they have no effect on the outcome. In Chapter 4, I compare the performance of Sparse Partitioning to existing methods using simulated and real data. The results highlight how greatly a method's power depends on the validity of its model assumptions. For this reason, Sparse Partitioning appears to offer a robust alternative to current methods, as its lack of restrictions allows it to maintain power in scenarios where other methods will fail. 
Sparse Partitioning relies on Markov chain Monte Carlo estimation, which limits the size of problem on which it can be used. Therefore, in Chapter 5, I propose a deterministic version of the method which, although less powerful, is not affected by convergence issues. In Chapter 6, I describe Bayesian Projection Pursuit, which adds spline fitting into the method to cope with non-tertiary predictors.
42

Gaye, Amadou. "Study of the key determinants of statistical power in large scale genetic association studies." Thesis, University of Leicester, 2013. http://hdl.handle.net/2381/27882.

Abstract:
A large number of participants is often required by association studies investigating the causal mechanisms of complex diseases because of the generally weak causal effects involved in these conditions. The large sample sizes necessary for adequately powered analyses are mainly achieved by large studies. This can be an expensive undertaking and it is important that the correct sample size is identified. However, the analysis of the statistical power of large consortia and major biobanks demands that a number of complicating issues are taken into proper account. These include the impact of unmeasured aetiological determinants and the quality of measurement of both outcome and explanatory variables. Conventional methods to analyse power use closed-form solutions that are not flexible enough to allow for these elements to be taken easily into account, and this results in a potentially substantial overestimation of the actual power. In this thesis, I describe the radical rebuilding of an existing power calculator known as ESPRESSO to develop and implement the ESPRESSO-forte algorithm. ESPRESSO-forte is intended as a comprehensive study simulation platform aimed at supporting the design of large scale association studies and biobanks. I then applied the newly developed software to two real world scientific problems: (1) to assess the power of a large multi-provincial Canadian cohort for the study of quantitative traits; and (2) to estimate the impact of the particular standard operating procedures that were applied to the collecting and processing of biosamples in UK Biobank, on the likely power of future nested case-control studies. Some analyses now explore the role of copy-number variants (CNVs) in disease. I evaluated the accuracy of CNV genotypes measured on four SNP genotyping platforms to inform future studies that plan to use existing SNP intensity data to measure CNVs or carry out de novo CNV measurements from SNP genotyping platforms.
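The contrast this abstract draws between closed-form power formulas and simulation can be sketched in a few lines. The code below is not ESPRESSO or ESPRESSO-forte; it is a toy simulation under stated assumptions: an additive SNP effect on a quantitative trait, Gaussian residual noise, an optional extra Gaussian term standing in for outcome measurement error, and a Fisher z-test on the genotype-trait correlation. All function names, parameter names and values are hypothetical.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def simulate_power(n: int, maf: float, beta: float, error_sd: float = 0.0,
                   n_sims: int = 200, seed: int = 1) -> float:
    """Fraction of simulated studies in which the association test rejects at 5%."""
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided 5% critical value of the standard normal
    rejections = 0
    for _ in range(n_sims):
        # Genotype coded 0/1/2: two independent allele draws with frequency maf.
        g = [float((rng.random() < maf) + (rng.random() < maf)) for _ in range(n)]
        # Trait = genetic effect + residual noise + measurement error on the outcome.
        y = [beta * gi + rng.gauss(0.0, 1.0) + rng.gauss(0.0, error_sd) for gi in g]
        r = max(-0.999999, min(0.999999, pearson(g, y)))
        z = math.atanh(r) * math.sqrt(n - 3)  # Fisher z-transform test statistic
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical design: 500 participants, allele frequency 0.3, weak effect.
print("power, perfectly measured trait:", simulate_power(500, 0.3, 0.15))
print("power, noisy trait measurement: ", simulate_power(500, 0.3, 0.15, error_sd=1.0))
```

Because each complicating factor is just another term in the simulated data, this style of calculation can accommodate features such as measurement error that a closed-form formula would have to ignore or approximate.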
43

Bouaziz, Matthieu. "Statistical methods to account for different sources of bias in Genome-Wide association studies." Thesis, Evry-Val d'Essonne, 2012. http://www.theses.fr/2012EVRY0023/document.

Abstract:
Les études d'association à grande échelle sont devenues un outil très performant pour détecter les variants génétiques associés aux maladies. Ce manuscrit de doctorat s'intéresse à plusieurs des aspects clés des nouvelles problématiques informatiques et statistiques qui ont émergé grâce à de telles recherches. Les résultats des études d'association à grande échelle sont critiqués, en partie, à cause du biais induit par la stratification des populations. Nous proposons une étude de comparaison des stratégies qui existent pour prendre en compte ce problème. Leurs avantages et limites sont discutés en s'appuyant sur divers scénarios de structure des populations dans le but de proposer des conseils et indications pratiques. Nous nous intéressons ensuite à l'inférence de la structure des populations dans la recherche génétique. Nous avons développé au cours de cette thèse un nouvel algorithme appelé SHIPS (Spectral Hierarchical clustering for the Inference of Population Structure). Cet algorithme a été appliqué à un ensemble de jeux de données simulés et réels, ainsi que de nombreux autres algorithmes utilisés en pratique à titre de comparaison. Enfin, la question du test multiple dans ces études d'association est abordée à plusieurs niveaux. Nous proposons une présentation générale des méthodes de tests multiples et discutons leur validité pour différents designs d'études. Nous nous concentrons ensuite sur l'obtention de résultats interprétables au niveau des gènes, ce qui correspond à une problématique de tests multiples avec des tests dépendants. Nous discutons et analysons les différentes approches dédiées à cette fin.
Genome-Wide association studies have become powerful tools to detect genetic variants associated with diseases. This PhD thesis focuses on several key aspects of the new computational and methodological problematics that have arisen with such research. The results of Genome-Wide association studies have been questioned, in part because of the bias induced by population stratification. Many stratégies are available to account for population stratification scenarios are highlighted in order to propose pratical guidelines to account for population stratification. We then focus on the inference of population structure that has many applications for genetic research. We have developed and present in this manuscript a new clustering algoritm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS). This algorithm in the field to propose a comparison of their performances. Finally, the issue of multiple-testing in Genome-Wide association studies is discussed on several levels. We propose a review of the multiple-testing corrections and discuss their validity for different study settings. We then focus on deriving gene-wise interpretation of the findings that corresponds to multiple-stategy to obtain valid gene-disease association measures
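As an illustration of the kind of multiple-testing correction reviewed in the abstract above, the following is a minimal Python sketch of two standard procedures, Bonferroni and Benjamini-Hochberg. It is not taken from the thesis: the function names and the example p-values are assumptions made for illustration, and neither procedure is claimed to be the one the author recommends for dependent tests.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject each test whose p-value is at most alpha / m (controls FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Step-up FDR procedure: reject the k smallest p-values, where k is the
    largest rank such that p_(k) <= k * alpha / m."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for five SNPs (illustrative only).
    pvals = [0.001, 0.012, 0.025, 0.041, 0.6]
    print(bonferroni(pvals))          # [True, False, False, False, False]
    print(benjamini_hochberg(pvals))  # [True, True, True, False, False]
```

Bonferroni controls the family-wise error rate under any dependence but is conservative. Benjamini-Hochberg controls the false discovery rate under independence or positive dependence, which is one reason why dependent tests, as in gene-level analyses, need separate care.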
44

Pollock, Jeffrey. "Statistical modelling and Bayesian inference for match outcomes and team behaviour in association football." Thesis, Heriot-Watt University, 2016. http://hdl.handle.net/10399/3097.

Abstract:
This thesis presents advances in modelling and inference for match outcomes in the association football English Premier League. We firstly extend earlier models by introducing a behavioural aspect which can be used to investigate how teams react to the state of play in a match. We show that the model, in its simplest form, outperforms existing models and is able to select a portfolio of profitable bets against a bookmaker. Secondly, we introduce a dynamic component to the model by allowing team ability parameters to vary stochastically in time. We employ particle filtering methods to cope with a mixture of static and dynamic parameters and find that the updating of posterior distributions is particularly fast, a necessary attribute should we wish to update parameter estimates while matches are in-play. Furthermore, it is shown that the methods are able to recover model parameters based on simulated league data. Finally, we propose an extension to the model so that we are able to investigate how a team modifies its behaviour based on their league situation. We consider league positions that are closely attainable and suggest that since teams modify their behaviour based on their current league position, outcomes of different matches are not necessarily independent.
45

Zhu, Shaojuan. "Associative memory as a Bayesian building block /." Full text open access at:, 2008. http://content.ohsu.edu/u?/etd,655.

46

Edsberg, Erik. "A statistical simulation-based framework for sample size considerations in case-control SNP association studies." Thesis, Norwegian University of Science and Technology, Department of Mathematical Sciences, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9763.

Abstract:

In the thesis, a statistical simulation-based framework is presented that is intended for making sample size and power considerations prior to case-control association studies. It reviews biological background and biallelic single- and multiple-SNP disease models, with a focus on single-SNP models. Odds ratios, multiple testing, sample size, statistical power and the genomeSIM package are also reviewed. The framework is tested with the MAX stat method on a dominant disease model, demonstrating that it can be used for assessing whether different sample sizes are sufficient for detecting a causal SNP.
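To make the idea of simulation-based power assessment concrete, here is a minimal Python sketch for a single biallelic SNP under a dominant disease model. It is an assumption-laden illustration and not the thesis's framework: it does not use genomeSIM or the MAX stat method, but substitutes a plain Pearson chi-square test on carrier status, and all function names and parameter values are hypothetical.

```python
import math
import random


def carrier_prob(risk_allele_freq: float) -> float:
    """P(at least one risk allele) under Hardy-Weinberg equilibrium."""
    return 1.0 - (1.0 - risk_allele_freq) ** 2


def chi2_pvalue_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(stat / 2.0))


def simulate_power(n_cases: int, n_controls: int, risk_allele_freq: float,
                   odds_ratio: float, alpha: float = 0.05,
                   n_sim: int = 1000, seed: int = 1) -> float:
    """Estimate power as the fraction of simulated studies with p <= alpha.

    Dominant model: carriers of one or two risk alleles share the same
    odds ratio relative to non-carriers. Controls are assumed to carry the
    risk allele at the Hardy-Weinberg carrier frequency.
    """
    rng = random.Random(seed)
    p0 = carrier_prob(risk_allele_freq)      # carrier frequency in controls
    odds1 = odds_ratio * p0 / (1.0 - p0)     # carrier odds in cases
    p1 = odds1 / (1.0 + odds1)               # carrier frequency in cases
    hits = 0
    for _ in range(n_sim):
        case_carriers = sum(rng.random() < p1 for _ in range(n_cases))
        ctrl_carriers = sum(rng.random() < p0 for _ in range(n_controls))
        p = chi2_pvalue_2x2(case_carriers, n_cases - case_carriers,
                            ctrl_carriers, n_controls - ctrl_carriers)
        if p <= alpha:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # Compare candidate sample sizes for a hypothetical causal SNP.
    for n in (100, 300, 600):
        power = simulate_power(n, n, risk_allele_freq=0.2, odds_ratio=1.5)
        print(f"n = {n} per group: estimated power = {power:.2f}")
```

Running the loop over several sample sizes shows how such a framework can be used to judge whether a planned sample is sufficient. For a genome-wide setting, `alpha` would be lowered to reflect the multiple-testing burden.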

47

He, Ran. "Some Statistical Aspects of Association Studies in Genetics and Tests of the Hardy-Weinberg Equilibrium." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1187009967.

48

Fogle, Orelle Ryan. "Human Micro-Range/Micro-Doppler Signature Extraction, Association, and Statistical Characterization for High-Resolution Radar." Wright State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=wright1307733951.

49

Amin, Al Olama Seyed Ali. "Genetic epidemiology of prostate cancer statistical analyses of genome-wide association studies of prostate cancer." Thesis, University of Cambridge, 2013. https://www.repository.cam.ac.uk/handle/1810/252290.

50

Avallone, Kimberly M. "Anxiety Sensitivity as a Mediator of the Association between Asthma and Smoking." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1406811550.

