Dissertations / Theses: 'Bayesian LASSO'

1

Han, Yuchen. "Bayesian Variable Selection Using Lasso." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1491775118610981.

2

Xing, Guan. "Lassoing Mixtures and Bayesian Robust Estimation." Case Western Reserve University School of Graduate Studies / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=case1164135815.

3

Gao, Di. "Bayesian Lasso Models – With Application to Sports Data." Diss., North Dakota State University, 2018. https://hdl.handle.net/10365/27949.

Abstract:

Several statistical models have been proposed to predict the winners of sports games, for example the generalized linear model (Magel & Unruh, 2013) and the probability self-consistent model (Shen et al., 2015). This work studied Bayesian Lasso generalized linear models. A hybrid estimation approach combining full and empirical Bayes was proposed. A simple and efficient method for the EM step, which does not require the sample mean of the random draws, was also introduced: the expectation step reduces to deriving the theoretical expectation directly from the conditional marginal. The findings suggest that this approach can substantially reduce the computational load in future applications. Owing to the desirable geometric property of the Lasso (Tibshirani, 1996), the method is powerful at selecting significant explanatory variables and has become very popular for big-data problems over the last 20 years. Because this work is built on the Lasso structure, it is also well suited to dimension reduction, which is necessary when the number of observations is smaller than the number of parameters or when the design matrix is not of full rank. A simulation study was conducted to assess the dimension-reduction power and the accuracy and variability of the estimates. As an application of Bayesian Lasso probit linear regression to live data, the NCAA March Madness tournament (Men's Basketball Division I) was considered. Finally, the predicted bracket was compared with the actual tournament results, and model performance was evaluated with a bracket scoring system (Shen et al., 2015).
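
The Bayesian Lasso models discussed in these theses rest on the Park and Casella (2008) hierarchy. In that hierarchy, the Laplace prior on each coefficient is written as a normal prior whose variance τⱼ² is exponentially distributed. The following is a minimal Python sketch of a Gibbs sampler for the Gaussian linear case only. It does not implement the probit model or the hybrid empirical Bayes EM step described in this thesis. The sketch makes the following assumptions:

- the response is centered and the predictors are roughly standardized;
- σ² has the improper prior π(σ²) ∝ 1/σ²;
- the penalty `lam` is held fixed.

All function names are hypothetical. Coefficients are updated one at a time, which is a valid but slower alternative to drawing the whole vector β jointly.

```python
import math
import random


def sample_inverse_gaussian(mu, lam, rng):
    """Draw from InverseGaussian(mean=mu, shape=lam) via Michael, Schucany & Haas (1976)."""
    nu = rng.gauss(0.0, 1.0)
    w = mu * nu * nu / (2.0 * lam)
    # Smaller root of the MSH quadratic, written to avoid floating-point cancellation.
    x = mu / (1.0 + w + math.sqrt(w * (2.0 + w)))
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x  # the larger root


def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, burn_in=500, seed=0):
    """Componentwise Gibbs sampler for the Park-Casella Bayesian Lasso (sketch).

    X: list of n rows with p predictors (assumed standardized).
    y: list of n responses (assumed centered, so no intercept is sampled).
    lam: fixed Lasso penalty (hypothetical default). Returns posterior means of beta.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    xtx = [sum(v * v for v in col) for col in cols]
    beta, tau2, sigma2 = [0.0] * p, [1.0] * p, 1.0
    resid = list(y)  # invariant: resid = y - X @ beta
    sums = [0.0] * p
    for it in range(n_iter):
        # beta_j | rest ~ N(x_j'r_j / a_j, sigma2 / a_j), a_j = x_j'x_j + 1/tau2_j
        for j in range(p):
            col = cols[j]
            for i in range(n):  # partial residual r_j: add back beta_j's contribution
                resid[i] += col[i] * beta[j]
            a_j = xtx[j] + 1.0 / tau2[j]
            mean = sum(c * r for c, r in zip(col, resid)) / a_j
            beta[j] = rng.gauss(mean, math.sqrt(sigma2 / a_j))
            for i in range(n):
                resid[i] -= col[i] * beta[j]
        # sigma2 | rest ~ InvGamma((n-1)/2 + p/2, RSS/2 + sum(beta_j^2 / tau2_j)/2)
        rss = sum(r * r for r in resid)
        shrink = sum(b * b / t for b, t in zip(beta, tau2))
        shape = (n - 1) / 2.0 + p / 2.0
        rate = (rss + shrink) / 2.0
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        # 1/tau2_j | rest ~ InvGaussian(sqrt(lam^2 * sigma2 / beta_j^2), lam^2)
        for j in range(p):
            mu = math.sqrt(lam * lam * sigma2 / max(beta[j] ** 2, 1e-12))
            tau2[j] = 1.0 / sample_inverse_gaussian(mu, lam * lam, rng)
        if it >= burn_in:
            for j in range(p):
                sums[j] += beta[j]
    kept = n_iter - burn_in
    return [s / kept for s in sums]


# Illustrative data: a strong effect on x1 and none on x2.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
y_raw = [3.0 * row[0] + data_rng.gauss(0, 1) for row in X]
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]
post_mean = bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=1500, burn_in=500, seed=2)
print(post_mean)  # the first mean should be near 3, the second near 0
```

In an empirical or hybrid Bayes approach like the one the abstract describes, `lam` would be estimated rather than fixed. For example, Park and Casella's Monte Carlo EM update sets λ² = 2p / Σⱼ E[τⱼ²], using the retained τⱼ² draws.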

4

Joo, LiJin. "Bayesian lasso: An extension for genome-wide association study." Thesis, New York University, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10243856.

Abstract:

In genome-wide association studies (GWAS), variable selection has been used to prioritize candidate single-nucleotide polymorphisms (SNPs). Relating densely located SNPs to a complex trait requires a method that is robust under various genetic architectures, yet sensitive enough to detect the marginal difference between null and non-null factors. For this problem, the ordinary Lasso produced too many false positives. The Bayesian Lasso fitted by Gibbs sampling, in turn, became too conservative when the selection criterion was based on posterior credible sets. My proposals to improve the Bayesian Lasso have two aspects: using stochastic approximation and variational Bayes to increase computational efficiency, and using a Dirichlet-Laplace prior to better separate small effects from nulls. Both the double-exponential prior of the Bayesian Lasso and the Dirichlet-Laplace prior have a global-local mixture representation. Because of this mixture representation, variational Bayes can handle the model hierarchy effectively. In analyses of simulated and real sequencing data, the proposed methods showed meaningful improvements in both efficiency and accuracy.
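
The mixture representation of the double-exponential prior can be checked numerically. The sketch below is a simplified illustration, not the variational method of the thesis. It draws a local variance τ² from an exponential distribution with rate λ²/2, where λ plays the global role. It then draws β from N(0, τ²). Marginally, β should follow a Laplace (double-exponential) distribution with rate λ, so E|β| = 1/λ and P(|β| > t) = exp(-λt). The function name, λ = 2 and the sample size are illustrative assumptions. In the Park and Casella hierarchy the normal variance is also scaled by σ²; that factor is omitted here for simplicity.

```python
import math
import random


def laplace_via_scale_mixture(lam, n_draws, seed=0):
    """Draw beta ~ N(0, tau2) with tau2 ~ Exponential(rate = lam^2 / 2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        tau2 = rng.expovariate(lam * lam / 2.0)  # local variance from the mixing distribution
        draws.append(rng.gauss(0.0, math.sqrt(tau2)))
    return draws


lam = 2.0
draws = laplace_via_scale_mixture(lam, 200_000, seed=42)
mean_abs = sum(abs(b) for b in draws) / len(draws)
tail = sum(abs(b) > 1.0 for b in draws) / len(draws)
print(mean_abs, 1.0 / lam)         # empirical vs. Laplace E|beta| = 1/lam
print(tail, math.exp(-lam * 1.0))  # empirical vs. Laplace P(|beta| > 1)
```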

5

Zhou, Xiaofei. "Bayesian Lasso for Detecting Rare Genetic Variants Associated with Common Diseases." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563455460578675.

6

Wang, Meng. "Family-Based Bayesian LASSO for Detecting Association of Rare Haplotypes with Common Diseases." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398896091.

7

Zhang, Yiran. "Bayesian Variable Selection for High-Dimensional Data with an Ordinal Response." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1565283865507018.

8

Xia, Shuang. "Detecting Rare Haplotype-Environment Interaction and Dynamic Effects of Rare Haplotypes using Logistic Bayesian LASSO." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1406246686.

9

Fragoso, Tiago de Miranda. "Seleção bayesiana de variáveis em modelos multiníveis da teoria de resposta ao item com aplicações em genômica." [Bayesian variable selection in multilevel item response theory models with applications in genomics] Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-14112014-110028/.

Abstract:

Recent investigations of the genetic architecture of complex diseases use different sources of information. Different symptoms are measured to obtain a diagnosis, individuals may not be independent due to kinship or a common environment, and their genetic makeup may be measured through a large number of genetic markers. In the present work, a multilevel item response theory (IRT) model is proposed that unifies all these sources of information through a latent variable. Furthermore, the large number of molecular markers induces a variable selection problem, for which procedures based on stochastic search variable selection and the Bayesian LASSO are considered. Parameter estimation and variable selection are conducted under a Bayesian framework, in which a Markov chain Monte Carlo algorithm is derived and implemented to obtain samples from the posterior distribution. The estimation procedure is validated through a series of simulation studies in which parameter recovery, variable selection and estimation error are evaluated in scenarios similar to the real dataset. The procedure showed adequate recovery of the structural parameters and correctly found a large number of the covariates even in high-dimensional settings. However, it also produced biased estimates of the incidental latent variables (the latent trait and the random effect). The proposed methods were then applied to the real dataset collected in the 'Corações de Baependi' family association study. They were able to appropriately model the metabolic syndrome, a series of symptoms associated with elevated risk of heart failure and diabetes. The multilevel model produced a latent trait that could be identified with the syndrome, and an associated molecular marker was found.

10

Zhang, Han. "Detecting Rare Haplotype-Environmental Interaction and Nonlinear Effects of Rare Haplotypes using Bayesian LASSO on Quantitative Traits." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu149969433115895.

11

Azevedo, Camila Ferreira. "Ridge, lasso and bayesian additive-dominance genomic models and new estimators for the experimental accuracy of genome selection." Universidade Federal de Viçosa, 2015. http://www.locus.ufv.br/handle/123456789/7176.

Abstract:

The main contribution of molecular genetics to breeding is the direct use of DNA information to identify genetically superior individuals. Genome-wide selection (GWS) serves this purpose: it consists of analyzing a large number of SNP markers widely distributed across the genome. This simulation work presents a complete approach to genomic selection using adequate genetic models that include additive and dominance effects. Such models are essential for selecting crosses and clones, as well as for improving the estimation of additive effects for parent selection. To date, Ridge, Lasso and Bayesian additive-dominance models have not been evaluated and compared in the literature. The performance of 10 additive-dominance prediction models (including current ones and proposed modifications) was evaluated. A new modified Bayesian/Lasso method (called BayesA*B* or t-BLASSO) performed best in predicting the genomic breeding values of individuals in all four scenarios (two heritabilities × two genetic architectures). The BayesA*B*-type methods showed a better ability to recover the ratio of dominance variance to additive variance. The work also elucidated the roles of the three sources of quantitative genetic information in genomic selection: linkage disequilibrium, co-segregation and pedigree relationships. It did so by decomposing heritability and accuracy into these three components and showing their relations with population structure and with genetic improvement in the short and long run. Moreover, this simulation work developed two new estimators for the prediction accuracy of genomic selection, the regularized estimator (RE) and the hybrid estimator (HE), and evaluated their performance and efficiency. The regularized estimator takes into consideration both the genomic and trait heritabilities, in addition to the predictive ability.
The hybrid estimator (HE) combines both experimental and expected accuracies. RE and HE were compared with the traditional estimator (TE) under four validation procedures. In general, RE presented accuracies closer to the parametric values, mainly when markers were selected. It was also less biased and more precise, with smaller standard deviations than the traditional estimator. TE can be used only with independent validation, where it tends to perform better than RE, although it overestimates accuracy. HE proved very effective in the absence of validation. Independent validation was superior to jackknife procedures, tracking the parametric accuracy more closely with or without marker selection. The following inferences can be made according to the accuracy estimator and type of validation:
(i) most probable accuracy: HE without validation;
(ii) highest possible (overestimated) accuracy: TE with independent validation;
(iii) lowest possible (underestimated) accuracy: RE with independent validation.
Funding: no funding agency.

12

Ocloo, Isaac Xoese. "Energy Distance Correlation with Extended Bayesian Information Criteria for feature selection in high dimensional models." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1625238661031258.

13

Marques, Matheus Augustus Pumputis. "Análise e comparação de alguns métodos alternativos de seleção de variáveis preditoras no modelo de regressão linear." [Analysis and comparison of some alternative methods for selecting predictor variables in the linear regression model] Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-23082018-210710/.

Abstract:

This work studies several variable selection methods for linear regression that have appeared in the last 15 years: LARS (Least Angle Regression), NAMS (Noise Addition Model Selection), the False Selection Rate (FSR), the Bayesian LASSO and the Spike-and-Slab LASSO. The methodology consisted of analyzing and comparing these methods. The methods were then applied to real datasets and evaluated in a simulation study. All methods proved promising, with the Bayesian methods giving the best results.
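
Among the methods listed, the Spike-and-Slab LASSO (Ročková and George) replaces the single Laplace prior of the Bayesian LASSO with a two-component mixture of Laplace densities. One component is a sharply peaked "spike" with a large rate λ₀; the other is a diffuse "slab" with a small rate λ₁; they are mixed with weight θ. The sketch below computes this prior, the conditional probability that a coefficient value came from the slab, and the resulting coefficient-dependent L1 rate. The function names and the values λ₀ = 20, λ₁ = 1 and θ = 0.5 are illustrative assumptions, not settings taken from the thesis.

```python
import math


def laplace_density(beta, rate):
    """Laplace (double-exponential) density with the given rate."""
    return 0.5 * rate * math.exp(-rate * abs(beta))


def ssl_prior(beta, lam0, lam1, theta):
    """Spike-and-Slab LASSO prior: (1 - theta) * spike(lam0) + theta * slab(lam1)."""
    return (1.0 - theta) * laplace_density(beta, lam0) + theta * laplace_density(beta, lam1)


def slab_probability(beta, lam0, lam1, theta):
    """Conditional probability that a coefficient equal to beta came from the slab."""
    slab = theta * laplace_density(beta, lam1)
    return slab / (slab + (1.0 - theta) * laplace_density(beta, lam0))


def adaptive_penalty(beta, lam0, lam1, theta):
    """Effective L1 rate: close to lam0 for small |beta|, close to lam1 for large |beta|."""
    p = slab_probability(beta, lam0, lam1, theta)
    return p * lam1 + (1.0 - p) * lam0


for b in (0.0, 0.1, 0.5, 1.0):
    print(b, round(slab_probability(b, 20.0, 1.0, 0.5), 4),
          round(adaptive_penalty(b, 20.0, 1.0, 0.5), 3))
```

The adaptive rate explains the appeal of the mixture: small coefficients are penalized heavily, as by the spike, while large coefficients are penalized only lightly, as by the slab.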

14

Hidalgo, André Marubayashi. "Fine mapping and single nucleotide polymorphism effects estimation on pig chromosomes 1, 4, 7, 8, 17 and X." Universidade Federal de Viçosa, 2011. http://locus.ufv.br/handle/123456789/4753.

Abstract:

Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
Quantitative trait loci (QTL) mapping often results in the detection of genomic regions that explain part of the quantitative trait variation. However, these regions are very large and do not allow accurate gene identification, so the interval in which the QTL is located must be narrowed. With genome-wide selection (GWS), many statistical tools have been developed to estimate the effect of each marker, and these estimates make it possible to identify the markers with large effects. The objective of this investigation was therefore to fine-map pig chromosomes 1, 4, 7, 8, 17 and X using microsatellite and SNP markers. The study used an F2 population produced by crossing naturalized Brazilian Piau boars with commercial females, for performance, carcass, internal organ, cut yield and meat quality traits. A further aim was to estimate the effects of single nucleotide polymorphism (SNP) markers on traits with detected QTL, identify those with the largest effects, and verify whether these markers were indeed within the QTL confidence interval. QTL were identified by regression interval mapping using the GridQTL software. Individual marker effects were estimated by Bayesian LASSO regression using the R software. In total, 32 QTL for the studied traits were significant at the 5% chromosome-wide level. Of these, 12 were significant at the 1% chromosome-wide level and 7 at the 5% genome-wide level. Six of the seven QTL with genome-wide significance had markers of large effect within their confidence interval. These results confirmed some previous QTL and identified numerous novel QTL for the investigated traits. Combining microsatellite and SNP markers increased genome saturation and led to QTL with smaller confidence intervals.
The methods used were also valuable for estimating the marker effects and for locating the markers with the largest effects within the QTL confidence interval, thereby validating the QTL found by the regression method.

15

Hashem, Hussein Abdulahman. "Regularized and robust regression methods for high dimensional data." Thesis, Brunel University, 2014. http://bura.brunel.ac.uk/handle/2438/9197.

Abstract:

Recently, variable selection for high-dimensional data has attracted much research interest. Classical stepwise subset selection methods are widely used in practice, but they are difficult to implement when the number of predictors is large. In such cases, modern regularization methods have become a popular choice because they perform variable selection and parameter estimation simultaneously. However, estimation becomes more challenging when the data contain outliers or when the normality assumption is violated, as with heavy-tailed errors. In these cases, quantile regression is the most appropriate method. In this thesis we combine these two approaches to produce regularized quantile regression methods. Chapter 2 presents a comparative simulation study of regularized and robust regression methods for a continuous response variable. In Chapter 3, we develop a quantile regression model with a group lasso penalty for binary response data, for the case where the predictors have a grouped structure and the data contain outliers. In Chapter 4, we extend this method to censored response variables. Numerical examples on simulated and real data are used to compare the proposed methods with other existing methods.
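
Quantile regression replaces squared error with the check (pinball) loss ρ_τ(u) = u(τ - 1{u < 0}). This loss is what makes the method resistant to outliers. The sketch below is a toy illustration, not the regularized estimators developed in the thesis, and it fits only a constant. Minimizing the summed check loss over the data points returns the empirical τ-quantile. With τ = 0.5, the fit ignores the magnitude of a gross outlier, while the mean does not. The function names and data values are made up for illustration.

```python
def check_loss(u, tau):
    """Quantile-regression check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_constant_quantile(ys, tau):
    """Constant c minimizing sum_i rho_tau(y_i - c).

    The objective is convex and piecewise linear, so a minimizer lies at a data point.
    """
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
print(fit_constant_quantile(data, 0.5))  # 3.0, the median
print(sum(data) / len(data))             # 22.0, the mean is pulled by the outlier
```

Regularized quantile regression keeps this loss but fits a linear predictor and adds a penalty on the coefficients, such as the group lasso penalty mentioned in the abstract.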

16

Bitto, Angela, and Sylvia Frühwirth-Schnatter. "Achieving shrinkage in a time-varying parameter model framework." Elsevier, 2019. http://dx.doi.org/10.1016/j.jeconom.2018.11.006.

Abstract:

Shrinkage for time-varying parameter (TVP) models is investigated within a Bayesian framework, with the aim of automatically reducing time-varying parameters to static ones if the model is overfitting. This is achieved by placing the double gamma shrinkage prior on the process variances. An efficient Markov chain Monte Carlo scheme is developed, exploiting boosting based on the ancillarity-sufficiency interweaving strategy. The method is applicable to TVP models for univariate as well as multivariate time series. Applications include a TVP generalized Phillips curve for EU area inflation modeling and a multivariate TVP Cholesky stochastic volatility model for joint modeling of the returns of the DAX-30 index.

17

Nicolazzi, Ezequiel Luis. "New trends in dairy cattle genetic evaluation." Doctoral thesis, Università Cattolica del Sacro Cuore, 2011. http://hdl.handle.net/10280/966.

Abstract:

Genetic evaluation systems are developing rapidly worldwide. In most countries, “traditional” breeding programs based on phenotypes and relationships between animals are currently being integrated with molecular information, which may replace them in the future. Because this thesis falls within this transition period, it covers research on both types of genetic evaluation. The topics range from assessing the accuracy of (traditional) international genetic evaluations to studying statistical methods for integrating genomic information into breeding (genomic selection). Three chapters investigate approaches for estimating genetic values from genomic data while reducing the number of independent variables:
- the Bonferroni correction and a permutation test combined with single-marker regression (Chapter III);
- principal component analysis combined with BLUP (Chapter IV);
- Fst across breeds combined with BayesA (Chapter VI).

In addition, Chapter V analyzes the accuracy of direct genomic values with BLUP, BayesA and Bayesian LASSO using all available variables. The results indicate that the genetic gains expected from the analysis of simulated data can be obtained on real data. Still, further research is needed to optimize the use of genome-wide information and obtain the best possible estimates for all traits under selection.
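
One of the variable-reduction approaches in this thesis pairs single-marker regression with a permutation test. The abstract does not give details, so the following procedure is an assumption, though a common one:

1. Permute the phenotypes many times.
2. In each permutation, record the largest single-marker statistic across all markers.
3. Use an upper quantile of those maxima as a genome-wide threshold.

The sketch uses the absolute correlation between genotype (coded 0/1/2) and phenotype as the statistic. For a fixed sample size, this is monotone in the single-marker regression t statistic. The function names, the simulated data, the 200 permutations and the 5% level are illustrative assumptions.

```python
import math
import random


def abs_corr(x, y):
    """|Pearson r|; for fixed n it is monotone in the single-marker regression |t|."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no information
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(markers, phenotype, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum |r| across markers under permuted phenotypes."""
    rng = random.Random(seed)
    y = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # breaks any genotype-phenotype association
        maxima.append(max(abs_corr(g, y) for g in markers))
    maxima.sort()
    return maxima[min(int((1.0 - alpha) * n_perm), n_perm - 1)]


# Simulated example: 20 markers coded 0/1/2; only marker 0 affects the phenotype.
sim = random.Random(7)
n_ind, n_markers = 60, 20
markers = [[sim.choice((0, 1, 2)) for _ in range(n_ind)] for _ in range(n_markers)]
phenotype = [2.0 * markers[0][i] + sim.gauss(0, 1) for i in range(n_ind)]
threshold = permutation_threshold(markers, phenotype, n_perm=200, seed=1)
selected = [j for j, g in enumerate(markers) if abs_corr(g, phenotype) > threshold]
print(round(threshold, 3), selected)
```

A Bonferroni correction would instead compare each marker's p-value with α/m, where m is the number of markers. This does not require permutations, but it ignores correlation between markers.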

19

Karlsson, Jonas, and Roger Karlsson. "Inkrementell responsanalys: Vilka kunder bör väljas vid riktad marknadsföring?" [Incremental response analysis: Which customers should be selected for targeted marketing?] Thesis, Linköpings universitet, Statistik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-96593.

Abstract:

If customers respond differently to a campaign, it is worthwhile to find the customers who respond most positively and direct the campaign towards them. This can be done with so-called incremental response analysis, in which respondents from a campaign are compared with respondents from a control group. Selecting the customers whose response increases most because of the campaign may increase the company's return. Incremental response analysis is applied to historical data from the mobile operator Tre. The thesis investigates which method best explains the incremental response, that is, which method best identifies the Tre customers with the highest incremental response, and which characteristics are important. The analysis is based on various classification methods, such as logistic regression, Lasso regression and decision trees. The incremental response prediction error is measured by RMSE, the root mean square error of the deviation between observed and predicted incremental response. The classification methods are evaluated with the Hosmer-Lemeshow test and the AUC (area under the curve). Bayesian logistic regression is also used to examine the uncertainty in the parameter estimates. In terms of predicted incremental response, Lasso regression performs best compared with the decision tree, ordinary logistic regression and Bayesian logistic regression. According to Lasso regression, the variables that significantly affect the incremental response are age and how long the customer has had their subscription.
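
At its core, incremental response (often called uplift) is the difference between two response rates: that of customers who received the campaign and that of a comparable control group. The sketch below computes this difference per customer segment. It also computes the RMSE between observed and predicted incremental response, the error measure used in the thesis. The function names, segment labels, counts and predicted values are invented for illustration. The sketch does not reproduce the thesis's logistic, Lasso or decision-tree models.

```python
import math
from collections import defaultdict


def incremental_response(records):
    """Per-segment uplift: P(respond | campaign) - P(respond | control).

    records: iterable of (segment, treated, responded) tuples with boolean flags.
    Segments lacking either a campaign or a control group are skipped.
    """
    # [campaign n, campaign responders, control n, control responders]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for segment, treated, responded in records:
        c = counts[segment]
        if treated:
            c[0] += 1
            c[1] += int(responded)
        else:
            c[2] += 1
            c[3] += int(responded)
    return {s: c[1] / c[0] - c[3] / c[2] for s, c in counts.items() if c[0] and c[2]}


def rmse(observed, predicted):
    """Root mean square error over the segments present in both dicts."""
    keys = [k for k in observed if k in predicted]
    return math.sqrt(sum((observed[k] - predicted[k]) ** 2 for k in keys) / len(keys))


def make_records(segment, treated, n, n_resp):
    """Hypothetical helper: n customers in one arm, the first n_resp of whom respond."""
    return [(segment, treated, i < n_resp) for i in range(n)]


records = (make_records("A", True, 10, 6) + make_records("A", False, 10, 2)
           + make_records("B", True, 10, 3) + make_records("B", False, 10, 3))
uplift = incremental_response(records)
ranking = sorted(uplift, key=uplift.get, reverse=True)  # target the highest-uplift segments first
print(uplift, ranking)
print(rmse(uplift, {"A": 0.3, "B": 0.1}))
```

In practice, the per-segment rates would be replaced by model-based predictions for individual customers. A classification model, such as the Lasso logistic regression favored in the thesis, would supply those predictions.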

20

Kim, Byung-Jun. "Semiparametric and Nonparametric Methods for Complex Data." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99155.

Abstract:

With advances in science, technology, and study design over the past few decades, increasingly varied complex data have emerged in many research fields, such as epidemiology, genomics, and analytical chemistry. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period, by stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data must be analyzed to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series are generated to recognize complex patterns among multiple classes. Given this diversity, this dissertation addresses three problems in analyzing such complex data and contributes semiparametric and nonparametric methods for each: the first is a method for testing the significance of a functional association under a matched study design; the second is a method that simultaneously identifies important variables and builds a network in HCHD data; the third is a multi-class dynamic model for recognizing patterns in time-trend analysis. For the first topic, we propose a semiparametric omnibus test of the significance of a functional association between clustered binary outcomes and covariates measured with error, taking into account effect modification by the matching covariates. The test is flexible and does not require a specific form for the alternative hypothesis. Its advantages are demonstrated through simulation studies and analyses of 1-4 bidirectional matched data from an epidemiology study.
For the second topic, we propose a joint semiparametric kernel machine network approach that connects variable selection and network estimation. Our approach is a unified, integrated method that simultaneously identifies important variables and builds a network among them. We develop it under a semiparametric kernel machine regression framework, which allows each variable to have a nonlinear effect and to interact with the others in complicated ways. We demonstrate the approach through simulation studies and a real application to genetic pathway analysis. Lastly, for the third project, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. Two-step Bayesian sequential procedures are developed to estimate patterns and detect focal intervals, which can be used for gas chromatography. We demonstrate the performance of the proposed method through a simulation study and a real application to gas chromatography data from the Fast Odor Chromatographic Sniffer (FOX) system.
Doctor of Philosophy
Because such data are so diverse, we address three analysis problems corresponding to three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time-series data. We contribute to the development of statistical methods for each. First, under the matched study design, we develop a hypothesis test that effectively determines the association between observed factors and the risk of the disease of interest. Because the specific form of the association is unknown in practice, it can be challenging to specify an alternative hypothesis. To reflect reality, we also allow for the possibility that some observations are measured with error, and we develop a testing procedure under the matched case-crossover framework that accounts for these measurement errors. The procedure is flexible enough to make inferences under various hypothesis settings. Second, we consider data in which the number of variables is very large compared with the sample size and the variables are correlated with each other. Here, our goal is to identify the variables that are important for the outcome among a large number of candidates and to build their network.
For example, identifying a few genes across the whole genome that are associated with diabetes can help develop biomarkers. With the approach proposed in the second project, we can identify differentially expressed, important genes and their network structure while taking the outcome into account. Lastly, we consider patterns of interest that change over time, with an application to gas chromatography. We propose an efficient detection method that distinguishes the patterns of multi-level subjects in time-trend analysis. The proposed method can provide valuable information for efficiently searching for distinguishable patterns, reducing the burden of examining all observations in the data.

21

Fu, Haoda. "Sparsity and smoothness for disease rate mapping via spatial Bayesian lasso." 2007. http://www.library.wisc.edu/databases/connect/dissertations.html.

22

"Bayesian model selection for semiparametric structural equation models with modified group Lasso." 2014. http://repository.lib.cuhk.edu.hk/en/item/cuhk-1291541.

Abstract:

Selecting an appropriate model is a crucial issue when applying structural equation models (SEMs) in real applications. Because of the model complexity, however, performing model selection on semiparametric SEMs with functional structural equations is quite challenging. In this thesis, we propose a modified Bayesian adaptive group Lasso procedure to perform model selection and estimation for semiparametric SEMs. By approximating the unknown functions with a novel formulation of basis expansions on which certain penalties are imposed, we introduce a partial linear structure that combines the advantages of linear and nonparametric formulations for structural equations. The proposed method automatically detects whether each term in the structural equations is nonlinear, linear, or absent. In addition, the group Lasso with adaptive penalties not only largely alleviates the model selection difficulties caused by the group effects and correlations introduced by basis expansions of latent variables, but also reduces the bias of traditional Lasso procedures. Simulation studies show that the proposed methodology performs satisfactorily. The method is applied to a real data set on diabetic kidney disease, yielding some meaningful insights.
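Selecting an appropriate model is a core problem in the practical application of structural equation models. However, because of model complexity, model selection for semiparametric structural equation models with functional structures is very difficult. In this thesis, we propose a new Bayesian adaptive group Lasso and apply it to perform parameter estimation and model selection simultaneously for semiparametric structural equation models. We introduce a partially linear structure into nonparametric structural equation models and approximate the unknown functions in the structural equations through a new basis function expansion. This structure combines the advantages of linear and nonparametric models. The method can automatically identify the nonlinear and linear structures in semiparametric structural equation models and screen out unimportant variables. This group Lasso with adaptive penalties not only reduces the bias that traditional Lasso methods produce when estimating parameters, but also resolves the model selection difficulties caused by the group effects and correlations arising from the basis representation of latent variables. The simulation results show that the proposed method is highly effective. We also applied the proposed method to a data set on diabetic kidney disease and obtained some meaningful results.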
Feng, Xiangnan.
Thesis M.Phil. Chinese University of Hong Kong 2014.
Includes bibliographical references (leaves 51-56).
Abstracts also in Chinese.
Title from PDF title page (viewed on 18 October 2016).
Detailed summary in vernacular field only.
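
The group-level selection described above relies on a group Lasso penalty shrinking or zeroing a whole block of coefficients at once, such as all basis-expansion coefficients of one nonlinear term. As a sketch of that mechanism only, the group soft-thresholding (proximal) operator below acts on one group in a penalized-optimization setting; it is not the thesis's Bayesian procedure, and the adaptive weight `weight`, the function name, and the example groups are hypothetical.

```python
import math


def group_soft_threshold(z, lam, weight=1.0):
    """Proximal operator of lam * weight * ||b||_2 for one coefficient group.

    Returns max(0, 1 - lam * weight / ||z||_2) * z: the whole group is either
    shrunk towards zero together or set exactly to zero.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return [0.0] * len(z)
    scale = max(0.0, 1.0 - lam * weight / norm)
    return [scale * v for v in z]


if __name__ == "__main__":
    # Hypothetical groups of basis coefficients for two terms of a structural equation.
    groups = {"strong_nonlinear_term": [3.0, 4.0], "weak_nonlinear_term": [0.3, 0.4]}
    for name, coefs in groups.items():
        print(name, group_soft_threshold(coefs, lam=2.5))
```

An adaptive version gives each group its own weight (for example, smaller for groups with large initial estimates), which penalizes important groups less and is one way to reduce the bias of a uniform Lasso penalty.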

23

Adjogou, Adjobo Folly Dzigbodi. "Analyse statistique de données fonctionnelles à structures complexes." [Statistical analysis of functional data with complex structures.] Thesis, 2017. http://hdl.handle.net/1866/20581.

24

Paccapelo, María Valeria. "Modelos de selección genómica para caracteres cuantitativos basados en marcadores moleculares aplicados al mejoramiento de maíz." [Genomic selection models for quantitative traits based on molecular markers applied to maize breeding.] Master's thesis, 2015. http://hdl.handle.net/11086/2355.

Abstract:

1. Introduction - 2. Objectives - 2.1 General Objectives - 2.2 Specific Objectives - 3. Materials - 3.1. Data Generation Process for Genomic Selection - 3.2. Phenotypic Data - 3.3. Molecular Marker Data - 4. Methodology - 4.1. Methodology for Phenotypic Data Analysis - 4.2. Methodology for Molecular Data Analysis - 4.2.1. Introduction to Molecular Markers - 4.2.2. Analysis of Molecular Markers - 4.3. Genomic Selection Models - 4.3.1. Variable Selection and Least Squares Fitting - 4.3.2. Penalized Estimation: Ridge Regression - 4.3.3. Variable Selection and Penalized Estimation: LASSO Regression - 4.3.4. Evaluation of the Predictive Ability of the Models - 5. Results - 5.1. Results of the Phenotypic Data Analysis - 5.2. Results of the Molecular Marker Data Analysis - 5.3. Application of Genomic Selection Methods to One Population and Trait - 5.3.1. Application of Variable Selection and Least Squares Fitting (SMC) - 5.3.2. Application of Classical Ridge Regression (RR) - 5.3.3. Application of Ridge Regression BLUP (RR-BLUP) - 5.3.4. Application of Bayesian Ridge Regression (BRR) - 5.3.5. Application of Bayesian LASSO Regression (BLR) - 5.3.6. Application of LASSO Regression (LR) - 5.4. Evaluation of the Predictive Ability of the Genomic Selection Models - 6. Conclusions - 7. Discussion - 8. References - 9. Appendix
Genomic selection (GS) models have become highly important because they make it possible to predict the genetic values of individuals from molecular markers (MM). Incorporating numerous MM into regression models leads to problems of dimensionality and multicollinearity. This thesis aimed to evaluate six GS methods that address these difficulties (variable selection, penalized estimation, and a combination of both) from classical or Bayesian approaches, and to evaluate their predictive ability for three phenotypic traits observed in 20 maize (Zea mays L.) populations. The results indicate that predictive ability was associated with the heritability of the trait and was higher for the penalized methods, among which Ridge Regression via mixed models (RR-BLUP) is recommended. This work made it possible to analyze different statistical techniques applied to GS in a context typical of a maize breeding program.
Affiliation: Paccapelo, María Valeria. Universidad Nacional de Córdoba; Argentina.
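
A small coordinate-descent sketch shows the contrast between the penalized methods compared above (ridge versus LASSO). It minimizes 1/2 ||y - X b||^2 plus either lam * ||b||_1 (LASSO) or (lam/2) * ||b||^2 (ridge): the LASSO update soft-thresholds a coefficient and can set it exactly to zero, while the ridge update only shrinks it. For a suitable lam, the LASSO estimate is also the posterior mode under independent Laplace (double-exponential) priors on the coefficients, which links it to the Bayesian LASSO; a full Bayesian LASSO instead samples from the posterior. This is an illustrative sketch with hypothetical function names, not the thesis's implementation, and it assumes centered data with no intercept.

```python
def soft_threshold(a, lam):
    """S(a, lam) = sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def coordinate_descent(X, y, lam, penalty="lasso", n_sweeps=100):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 (lasso) or + 0.5*lam*||b||^2 (ridge).

    X is a list of n rows with p columns; data are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta for the starting point beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col)
            if z == 0.0:
                continue  # an all-zero column cannot be estimated
            # Inner product of column j with the partial residual (coordinate j added back).
            rho = sum(c * r for c, r in zip(col, resid)) + z * beta[j]
            if penalty == "lasso":
                new = soft_threshold(rho, lam) / z  # can be exactly zero: selection
            elif penalty == "ridge":
                new = rho / (z + lam)  # shrinks, but is zero only if rho == 0
            else:
                raise ValueError("penalty must be 'lasso' or 'ridge'")
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= col[i] * delta  # keep the residual in sync with beta
            beta[j] = new
    return beta


if __name__ == "__main__":
    # Tiny hypothetical example with orthogonal columns, so each penalty's effect is easy to read.
    X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    y = [3.0, -3.0, 0.5, -0.5]
    print("lasso:", coordinate_descent(X, y, lam=2.0, penalty="lasso"))
    print("ridge:", coordinate_descent(X, y, lam=2.0, penalty="ridge"))
```

In the toy example the weak second predictor is dropped entirely by the LASSO but kept (shrunk) by ridge, which mirrors the "variable selection" versus "penalized estimation" distinction in the outline.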
