Dissertations / Theses on the topic 'Genotype data'
Brinza, Dumitru. "Discrete Algorithms for Analysis of Genotype Data." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_diss/19.
Groth, Philip. "Knowledge management and discovery for genotype/phenotype data." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2009. http://dx.doi.org/10.18452/16033.
In diseases with a genetic component, examination of the phenotype can aid in understanding the underlying genetics. Technologies to generate high-throughput phenotypes, such as RNA interference (RNAi), have been developed to decipher functions for genes. This large-scale characterization of genes strongly increases phenotypic information. It is a challenge to interpret results of such functional screens, especially with heterogeneous data sets. Thus, there have been only a few efforts to make use of phenotype data beyond the single genotype-phenotype relationship. Here, methods are presented for knowledge discovery in phenotypes across species and screening methods. The available databases and various approaches to analyzing their content are reviewed, including a discussion of hurdles to be overcome, e.g. lack of data integration, inadequate ontologies and shortage of analytical tools. PhenomicDB 2 is an approach to integrate genotype and phenotype data on a large scale, using orthologies for cross-species phenotypes. The focus lies on the uptake of quantitative and descriptive RNAi data and ontologies of phenotypes, assays and cell-lines. Then, the results of a study are presented in which the large set of phenotype data from PhenomicDB is taken to predict gene annotations. Text clustering is utilized to group genes based on their phenotype descriptions. It is shown that these clusters correlate well with indicators for biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. The clusters are then used to predict gene function by carrying over annotations from well-annotated genes to less well-characterized genes. Finally, the prototype PhenoMIX is presented, integrating genotype and phenotype data with clustered phenotypes, orthologies, interaction data and other similarity measures. Data grouped by these measures are evaluated for their predictiveness in gene functions and phenotype terms.
Yang, Li. "A Goodness-of-fit Association Test for Whole Genome Sequencing Data." Digital WPI, 2013. https://digitalcommons.wpi.edu/etd-theses/296.
O'Connell, Jared Michael. "Statistical methods for genotype microarray data on large cohorts of individuals." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:4e3328cf-0d8e-4587-b24d-9b59fa220f32.
Pestana, Valeria. "Modeling drug response in cancer cell lines using genotype and high-throughput “omics” data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166744.
ROSA, Rogério dos Santos. "Associating genotype sequence properties to haplotype inference errors." Universidade Federal de Pernambuco, 2015. https://repositorio.ufpe.br/handle/123456789/16011.
Made available in DSpace on 2016-03-16. Previous issue date: 2015-03-12.
Haplotype information has a central role in the understanding and diagnosis of certain illnesses, and also in evolution studies. Since that type of information is hard to obtain directly, computational methods to infer haplotypes from genotype data have received great attention from the computational biology community. Unfortunately, haplotype inference is a very hard computational biology problem and the existing methods can only partially identify correct solutions. I present neural network models that use different properties of the data to predict when a method is more prone to make errors. I construct models for three different Haplotype Inference approaches and I show that our models are accurate and statistically relevant. The results of our experiments offer valuable insights on the performance of those methods, opening opportunities for a combination of strategies or improvement of individual approaches. I formally demonstrate that Linkage Disequilibrium (LD) and heterozygosity are very strong indicators of Switch Error tendency for four methods studied, and I delineate scenarios, based on LD measures, that reveal a higher or lower propensity of the HI methods to present inference errors, so the correlation between LD and the occurrence of errors varies among regions along the genotypes. I present evidence that considering windows of length 10, immediately to the left of a SNP (upstream region), and eliminating the non-informative SNPs through Fisher’s Test leads to a more suitable correlation between LD and Inference Errors. I apply Multiple Linear Regression to explore the relevance of several biologically meaningful properties of the genotype sequences for the accuracy of the haplotype inference results, developing models for two databases (considering only Humans) and using two error metrics. The accuracy of our results and the stability of our proposed models are supported by statistical evidence.
Liu, Lian. "Topics in measurement error and missing data problems." Thesis, College Station, Tex. : Texas A&M University, 2007. http://hdl.handle.net/1969.1/ETD-TAMU-1627.
Rimal, Suraj. "POPULATION STRUCTURE INFERENCE USING PCA AND CLUSTERING ALGORITHMS." OpenSIUC, 2021. https://opensiuc.lib.siu.edu/theses/2860.
Strömstedt, Hallberg Simon, and Jonas Giek. "Simulerad effektivisering av genotypdataanalys genom poolade data." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-296223.
Bosch, Puig Lluís. "Age-and genotype-related changes in intramuscular fat content and composition in pigs using longitudinal data." Doctoral thesis, Universitat de Lleida, 2011. http://hdl.handle.net/10803/77959.
This PhD is part of a line of research conducted in the Department of Animal Production of the Universitat de Lleida dedicated to the genetic improvement of pig meat quality, with particular reference to intramuscular fat content and composition. The PhD comprises four studies, with the first one focusing on the development of a method to jointly determine the content and composition of intramuscular fat from biopsies and small post-mortem samples and, in this way, to carry out studies with longitudinal data. It has been found that this particular methodology is useful and that, especially for intramuscular fat content, small specimens of the target muscle are as informative as large samples of other muscles. In the second study the effect of age on the content and composition of the intramuscular and subcutaneous fat in the fattening period in Duroc pigs was investigated by an experiment using longitudinal data obtained following the methodology described above. It was concluded that a delay in the age of slaughter of the pig leads to an increase in intramuscular fat and oleic acid, although this comes at the cost of reducing the rate of lean growth. Moreover, it was shown that intramuscular and subcutaneous fat behaved differently in terms of fat accretion and composition and that the amount of fat itself affected composition. Whereas, for the intramuscular fat, values above the expected at a given age were because of increased monounsaturated fatty acid content, especially oleic acid, for the subcutaneous fat, they were due to the increased saturated fatty acid content. The final two studies considered whether allelic variation at the IGF-1 (insulin-like growth factor-1) and LEP (leptin) genes, as well as the concentration of IGF-1 and leptin in plasma, are associated with intramuscular fat content and composition and, if so, whether this is a function of age. It can be seen that the molecular polymorphisms studied are not neutral with regard to the content of intramuscular fat, but that their effects are not constant throughout the growing period. In this sense, both age and fatness can modify them.
Wang, Yuker, Victoria Carlton, George Karlin-Neumann, Ronald Sapolsky, Li Zhang, Martin Moorhead, Zhigang Wang, et al. "High quality copy number and genotype data from FFPE samples using Molecular Inversion Probe (MIP) microarrays." BioMed Central, 2009. http://hdl.handle.net/10150/610039.
Elom, Hilary, and Shimin Zheng. "The distribution of hepatitis C virus genotypes in US population. Data from NHANES 2006-2016." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/asrf/2018/schedule/116.
Andersson, Alfred. "Neural networks for imputation of missing genotype data : An alternative to the classical statistical methods in bioinformatics." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413635.
Roshyara, Nab Raj, Holger Kirsten, Katrin Horn, Peter Ahnert, and Markus Scholz. "Impact of pre-imputation SNP-filtering on genotype imputation results." Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-151874.
Tecle, Tesfaldet. "Biomolecular characterization of mumps virus genotypes with varying neurovirulence /." Stockholm : [Karolinska institutets bibl.], 2002. http://diss.kib.ki.se/2002/91-7349-234-5.
Erdogan, Onur. "Predicting The Disease Of Alzheimer (AD) With SNP Biomarkers And Clinical Data Based Decision Support System Using Data Mining Classification Approaches." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614832/index.pdf.
clinical data which is informative for the prediction or the diagnosis of the particular diseases. So far, there is no established approach for selecting the representative SNP subset and patients’ clinical data, and data mining methodology that is based on finding hidden and key patterns over huge databases. This approach has the highest potential for extracting knowledge from genomic datasets and for selecting the SNPs and the most effective clinical features that are informative and relevant for clinical diagnosis. In this study we have applied one of the widely used data mining classification methodologies, the “decision tree”, for associating the SNP biomarkers and clinical data with Alzheimer’s disease (AD), which is the most common form of “dementia”. Different tree construction parameters have been compared for the optimization, and the most efficient and accurate tree for predicting AD is presented.
Shabalina, Taisiia [Verfasser]. "Optimisation of genetic evaluations for longevity in Holstein dairy cattle through special consideration of health traits, SNP marker data and genotype by environment interactions / Taisiia Shabalina." Gießen : Universitätsbibliothek, 2021. http://d-nb.info/1233036637/34.
Churchhouse, Claire. "Bayesian methods for estimating human ancestry using whole genome SNP data." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:0cae8a4a-6989-485b-a7cb-0a03fb86096d.
Piovesan, Pamela. "Validação cruzada com correção de autovalores e regressão isotônica nos modelos AMMI." Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-16102007-113618/.
This paper presents the application of AMMI models for a thorough study of the effect of the interaction between genotypes and environments in multi-environment experiments. Through the decomposition of the sum of squares of these interactions, one seeks to select the number of terms that explains the interaction, discarding its noise. There are two ways of choosing these terms: cross-validation and hypothesis tests. The focus will be on cross-validation, for its advantage of being a predictive evaluation criterion. Two methods of cross-validation are presented, outlined by Eastment and Krzanowski (1982) and by Gabriel (2002). These methods use the singular value decomposition in order to obtain the eigenvalues of the matrix of interactions, whose sum of squares gives exactly the sum of squares of the interaction. As these eigenvalues are either over- or underestimated (ARAÚJO; DIAS, 2002), these validation techniques will be improved through the correction of the eigenvalues, and isotonic regression will be used to reorder them. A comparative study of these methods on real data will be carried out.
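The decomposition this abstract refers to can be written compactly. The notation below is the conventional one for AMMI and is an assumption here, since the abstract itself gives no symbols: y_ij is the mean of genotype i in environment j, the lambda_k are the singular values of the matrix of interaction residuals, m is the number of retained multiplicative terms, and p is the rank of that matrix.

```latex
y_{ij} = \mu + g_i + e_j + \sum_{k=1}^{m} \lambda_k \alpha_{ik} \gamma_{jk} + \rho_{ij} + \varepsilon_{ij},
\qquad
\mathrm{SS}_{G \times E}
  = \sum_{i,j} \bigl( y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot} \bigr)^2
  = \sum_{k=1}^{p} \lambda_k^2 .
```

The identity on the right holds for a table of cell means (with r replicates per cell the sum of squares is scaled by r). Choosing m < p, for example by cross-validation, keeps the first m terms as signal and leaves the remainder in the residual term as noise.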
Brown, Steven Richard. "A design of experiments approach for engineering carbon metabolism in the yeast Saccharomyces cerevisiae." Thesis, University of Exeter, 2016. http://hdl.handle.net/10871/26158.
Thibord, Florian. "Variation génétique et plasmatique des microARNs : impact sur les paramètres biologiques de l’hémostase OPTIMIR, a novel algorithm for integrating available genome-wide genotype data into miRNA sequence alignment analysis A Genome Wide Association Study on plasma FV levels identified PLXDC2 as a new modifier of the coagulation process." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS379.
MicroRNAs (miRNAs) are small non-coding RNAs with an average size of 22 nucleotides, mainly known to regulate gene expression in the cytoplasm. These small RNAs are estimated to regulate the majority of human genes, and are potentially involved in several diseases. MiRNA sequences might contain genetic variants and can undergo post-transcriptional variations, which generate miRNA isoforms called isomiRs. In order to accurately detect and quantify miRNA expression, isomiRs as well as paralogous miRNAs must be accounted for. The optimiR pipeline developed during this project overcomes these challenges by integrating genetic information and by implementing an original strategy based on local alignment. Sequencing data were obtained from the MARTHA cohort, which is composed of unrelated French patients who experienced venous thrombosis (VTE). Normalized expression of 162 miRNAs from 334 patients was used to analyze: 1) the genetic determinants of miRNA expression; 2) the association of miRNA expression levels with VTE recurrence; 3) the correlations between miRNA expression levels and hemostatic traits. As a whole, these analyses allowed me to identify miRNAs of interest for the study of VTE and hemostasis.
Bresso, Emmanuel. "Organisation et exploitation des connaissances sur les réseaux d'intéractions biomoléculaires pour l'étude de l'étiologie des maladies génétiques et la caractérisation des effets secondaires de principes actifs." Thesis, Université de Lorraine, 2013. http://www.theses.fr/2013LORR0122/document.
Understanding human diseases and drug mechanisms today requires taking molecular interaction networks into account. Recent studies on biological systems are producing increasing amounts of data. However, the complexity and heterogeneity of these datasets make it difficult to exploit them for understanding atypical phenotypes or drug side-effects. This thesis presents two knowledge-based integrative approaches that combine data management, graph visualization and data mining techniques in order to improve our understanding of phenotypes associated with genetic diseases or drug side-effects. Data management relies on a generic data warehouse, NetworkDB, that integrates data on proteins and their properties. Customization of the NetworkDB model and regular updates are semi-automatic. Graph visualization techniques have been coupled with NetworkDB. This approach has facilitated access to biological network data in order to study genetic disease etiology, including X-linked intellectual disability (XLID). Meaningful sub-networks of genes have thus been identified and characterized. Drug side-effect profiles have been extracted from NetworkDB and subsequently characterized by a relational learning procedure coupled with NetworkDB. The resulting rules indicate which properties of drugs and their targets (including networks) preferentially associate with a particular side-effect profile.
Hartung, Karin. "Biometrical approaches for analysing gene bank evaluation data on barley (Hordeum spec.)." [S.l. : s.n.], 2007. http://nbn-resolving.de/urn:nbn:de:bsz:100-opus-2251.
Yilmaz, Kutay. "Seeding Date and Genotype Maturity Interactions on Grain Sorghum [Sorghum bicolor –(L.) Moench] Performance In North Dakota." Thesis, North Dakota State University, 2020. https://hdl.handle.net/10365/32043.
Galván, Femenía Iván. "Compositional methodology and statistical inference of family relationships using genetic markers." Doctoral thesis, Universitat de Girona, 2020. http://hdl.handle.net/10803/672178.
This doctoral thesis is a compendium of three research articles produced between 2015 and 2019. The three articles are distinct contributions based on compositional data methodology and on the statistical inference of family relationships. In the first work of this thesis, we review the classical graphical methods used to detect family relationships and introduce compositional data analysis for the investigation of family relationships. In the second, the analysis of genotypes shared identical by state is proposed in place of the classical data on shared alleles. The third article concludes the thesis by developing the likelihood ratio for inferring three-quarter siblings in genetic databases. To illustrate the results, genetic markers are used from human population projects such as the Human Genome Diversity Project, the 1000 Genomes Project, and a local prospective human cohort, the Genomes of Catalonia (GCAT).
Doctoral Programme in Technology
Osman, Mohammed A. "Effect of water stress on the physiology, growth, and morphology of three pearl millet genotypes." Diss., The University of Arizona, 1988. http://etd.library.arizona.edu/etd/GetFileServlet?file=file:///data1/pdf/etd/azu_e9791_1988_11_sip1_w.pdf&type=application/pdf.
Saïdou, Abdoul-Aziz. "Etude moléculaire, évolution et caractérisation de gènes impliqués dans l'adaptation du mil (Pennisetum glaucum L.) aux changements climatiques." Thesis, Montpellier, SupAgro, 2011. http://www.theses.fr/2011NSAM0002/document.
In recent decades, climate change has led to temperature increases and rainfall variation across the globe. One of the key consequences of these changes is their impact on agriculture and food security. In Sahelian countries, food security relies on a few cereal crops, among which pearl millet plays a crucial role in the food supply of the population. The Sahel region has been facing the impact of rainfall variability and drought since the 1970s. Flowering time variation is one of the main adaptations that allow pearl millet cultivation in drier and shorter rainy seasons. The genetic bases of this complex trait are still understudied. We developed an association mapping framework for the analysis of the genotype-phenotype relationship in pearl millet. We successfully identified two genes associated with flowering time variation in pearl millet (PHYC and MADS11). We confirmed these associations using QTL studies. For PHYC, we also examined the pattern of linkage disequilibrium in a chromosomal region extending 80 kb around the gene, and we developed a Markov Chain Monte Carlo (MCMC) approach to compare six genes identified in this region. Our results suggest that, among the polymorphisms observed in this region, polymorphisms in PHYC are the best candidates for a direct causative role. The second part of this project addressed a methodological examination of the association mapping framework to deal with genotype by environment interactions. The results of this work were discussed with regard to the challenge of adapting the pearl millet crop to climate change.
Pironti, Alejandro [Verfasser], and Thomas [Akademischer Betreuer] Lengauer. "Improving and validating data-driven genotypic interpretation systems for the selection of antiretroviral therapies / Alejandro Pironti ; Betreuer: Thomas Lengauer." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2016. http://d-nb.info/1122110626/34.
Baker, George L. "Flavor formation and sensory perception of selected peanut genotypes (Arachis hypogea L.) as affected by storage water activity, roasting, and planting date." [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE1000105.
Title from title page of source document. Document formatted into pages; contains xii, 130 p.; also contains graphics. Includes vita. Includes bibliographical references.
Santos-Ciminera, Patricia Dantas. "Molecular epidemiology of epidemic severe malaria caused by Plasmodium vivax in the state of Amazonas, Brazil /." Download the dissertation in PDF, 2005. http://www.lrc.usuhs.mil/dissertations/pdf/Santos2005.pdf.
Luo, Yuqun. "Incorporation of Genetic Marker Information in Estimating Model Parameters for Complex Traits with Data From Large Complex Pedigrees." The Ohio State University, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=osu1039109696.
Peña, Marisol Garcia. "Alternativas de análise para experimentos G × E multiatributo." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-04052016-111857/.
In genotype by environment (G×E) experiments it is common to observe the behaviour of genotypes in relation to different attributes in the environments considered. The analysis of such experiments has been widely discussed for the case of a single attribute. This thesis presents some alternatives of analysis that consider genotypes, environments and attributes simultaneously. The first is based on the mixture maximum likelihood method (Mixclus) and three-mode principal component analysis; these two methods have been widely used in psychology and chemistry, but little in agriculture. The second is a methodology that combines the additive main effects and multiplicative interaction (AMMI) model, an efficient model for the analysis of G×E experiments with one attribute, and generalised Procrustes analysis, which allows configurations of points to be compared and provides a numerical measure of how much they differ. Finally, an alternative for performing data imputation in G×E experiments is presented, because the presence of missing values is a very frequent situation in these experiments. It is concluded that the proposed methodologies are useful tools for the analysis of multi-attribute G×E experiments.
Liang, Bor-Cherng, and 梁博程. "Haplotype Decomposition and Reconstruction from Large Scale Genotype Data." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/19723834795990807865.
National Tsing Hua University
Department of Computer Science
92
In this thesis, we address the problem of haplotype decomposition and reconstruction. While focusing on large scale genotype data, we propose a new framework to determine the haplotype block partitions and to resolve the haplotype pair of each genotype. In implementing the decomposition scheme, we formulate a dynamic programming algorithm to minimize the total number of tag SNPs. For structuring the reconstruction method, we introduce an at-least-one perfect-phylogeny-tree model within each block, and use tiling blocks consisting of tag SNPs among blocks. It turns out that the two elements are well coupled and lead to an accurate and efficient haplotype reconstruction system. Our approach is closely related to the work of Eskin et al. However, the perfect phylogeny model used in their scheme is restricted to only one perfect phylogeny tree within a block. We instead adopt a more flexible criterion that requires at least one perfect phylogeny tree. Furthermore, in dealing with the difficult problem of resolving whole haplotypes among blocks, we go further and take into account all blocks, whereas their work only considers two adjacent blocks. Specifically, the contributions of our work can be characterized by: (i) an at-least-one perfect-phylogeny-tree model, to fit the real genotype data and improve the accuracy of haplotype resolving within a block; (ii) an informative score function, to resolve a genotype into the most likely pair of haplotypes; (iii) tiling blocks consisting of tag SNPs, to make all of the choices resolvable; and (iv) mutual relation among blocks, to resolve whole haplotypes among blocks by considering all blocks, and to reduce the effects caused by a few erratic choices. We have also included various experimental results to illustrate the advantages of the proposed method. Keywords: Haplotype, tag SNPs, perfect phylogeny tree, tiling block
Xing, L., X. Zhou, Yonghong Peng, R. Zhang, J. Hu, J. Yu, and B. Liu. "Integrating phenotype-genotype data for prioritization of candidate symptom genes." 2013. http://hdl.handle.net/10454/9755.
Symptoms and signs (symptoms in brief) are the essential clinical manifestations for traditional Chinese medicine (TCM) diagnosis and treatments. To gain insights into the molecular mechanism of symptoms, this paper presents a network-based data mining method to integrate multiple phenotype-genotype data sources and predict the prioritizing gene rank list of symptoms. The result of this pilot study suggested some insights on the molecular mechanism of symptoms.
Groth, Philip [Verfasser]. "Knowledge management and discovery for genotype-phenotype data / von Philip Groth." 2009. http://d-nb.info/1000377008/34.
Schwender, Holger [Verfasser]. "Statistical analysis of genotype and gene expression data / by Holger Schwender." 2007. http://d-nb.info/997828072/34.
Wang, Wen-Chang, and 王紋璋. "A Monte Carlo Method for Linkage Analysis with Sibship Genotype Data." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/04078523516426903881.
National Central University
Graduate Institute of Mathematics
92
We propose a new Monte Carlo approach to the problem of calculating the conditional probability of inheritance patterns given sibship genotype data in multipoint linkage analysis. By limiting the study to sibships, we hope to have a linkage analysis method that can incorporate a general crossover process model and can be used to examine the issue of genetic interference in the context of linkage studies. This thesis is separated into three parts. In Part I, we introduce the new Monte Carlo approach to multipoint linkage analysis. Our approach is mainly an application of the importance sampling method. The crossover distribution used in this approach is estimated from the CEPH and Icelandic family genotype data. Estimation of this crossover distribution is described in Part III. To make the computation efficient, we show that the calculation of the probability of legal ordered parental genotypes given sibship genotype and inheritance patterns can be carried out quickly by a straightforward classification of inheritance patterns. To evaluate our method, we compare its performance with that of GENEHUNTER in terms of the accuracy in calculating the conditional probability of IBD sharing for sib-pairs in CEPH families given the sibship genotype on chromosome 19. In Part II, we deal with the set of legal inheritance vectors for a sibship at one marker. We classify the sibships according to the genotype of the sibs into 9 classes, and list explicitly the set of legal inheritance vectors for each class. Because an inheritance pattern is legal at several markers if it is legal at every one of these markers, results in Part II can be extended directly to the set of legal inheritance patterns for many markers. We use the result to reduce the time and memory needed in our Monte Carlo approach in Part I.
In Part III, we provide a nonparametric estimate of the crossover distribution on the basis of the CEPH family genotype data, the Icelandic family genotype data, and the order of the markers where genotype data are taken. The only assumption employed in this approach is that, in one meiosis, there is at most one crossover point between markers close enough to each other. This estimated crossover distribution can be used in multipoint linkage analysis without the assumption of no interference.
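As a rough illustration of what a "legal inheritance vector" at one marker means in the abstract above, the following sketch enumerates them by brute force. It is not the thesis's classification into 9 classes. The function name `legal_inheritance_vectors` and the encoding (one paternal bit and one maternal bit per sib, each selecting which of the parent's two ordered alleles was transmitted) are assumptions made for illustration only.

```python
from itertools import product


def legal_inheritance_vectors(sib_genotypes):
    """Brute-force sketch: list inheritance vectors consistent with the
    unordered sib genotypes at a single marker.

    A vector holds two bits per sib: (paternal bit, maternal bit). It is
    legal if some pair of ordered parental genotypes reproduces every
    observed sib genotype under that vector.
    """
    # Untransmitted parental alleles are unconstrained, so searching over
    # the alleles observed in the sibs loses no legal vector.
    alleles = sorted({a for genotype in sib_genotypes for a in genotype})
    observed = [tuple(sorted(genotype)) for genotype in sib_genotypes]
    n_sibs = len(observed)

    legal = []
    for vector in product((0, 1), repeat=2 * n_sibs):
        parent_pairs = product(product(alleles, repeat=2), repeat=2)
        for father, mother in parent_pairs:
            consistent = all(
                tuple(sorted((father[vector[2 * i]], mother[vector[2 * i + 1]])))
                == observed[i]
                for i in range(n_sibs)
            )
            if consistent:
                legal.append(vector)
                break  # one witness pair of parents is enough
    return legal
```

For two sibs with genotypes `(1, 1)` and `(2, 2)`, the call `legal_inheritance_vectors([(1, 1), (2, 2)])` returns four vectors, and in each of them the sibs receive different alleles from both parents. The enumeration grows as 4^n in the number of sibs, which is the kind of cost an explicit classification avoids.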
Schriek, Cornelis Arnold. "Analysis and standardization of marker genotype data for DNA fingerprinting applications." Diss., 2011. http://hdl.handle.net/2263/28908.
Dissertation (MSc)--University of Pretoria, 2011.
Biochemistry
unrestricted
Li, X., X. Zhou, Yonghong Peng, B. Liu, R. Zhang, J. Hu, J. Yu, C. Jia, and C. Sun. "Network based integrated analysis of phenotype-genotype data for prioritization of candidate symptom genes." 2014. http://hdl.handle.net/10454/10724.
Symptoms and signs (symptoms in brief) are the essential clinical manifestations for individualized diagnosis and treatment in traditional Chinese medicine (TCM). To gain insights into the molecular mechanisms of symptoms, we develop a computational approach to identify candidate genes of symptoms. This paper presents a network-based approach for the integrated analysis of multiple phenotype-genotype data sources and the prioritization of candidate genes for the associated symptoms. The method first calculates the similarities between symptoms and diseases based on the symptom-disease relationships retrieved from the PubMed bibliographic database. Then the disease-gene associations and protein-protein interactions are used to construct a phenotype-genotype network. Finally, the PRINCE algorithm is used to rank the potential genes for the associated symptoms. The proposed method yields a reliable gene ranking, with an AUC (area under the curve) of 0.616 in classification. Some novel genes, such as CALCA, ESR1, and MTHFR, were predicted to be associated with headache symptoms; these are not recorded in the benchmark data set but have been reported in recently published literature. Our study demonstrates that integrating phenotype-genotype relationships into a complex network framework provides an effective approach to identifying candidate genes of symptoms.
NSFC Project (61105055, 81230086), China 973 Program (2014CB542903), The National Key Technology R&D Program (2013BAI02B01, 2013BAI13B04), the National S&T Major Special Project on Major New Drug Innovation (2012ZX09503-001-003), and the Fundamental Research Funds for the Central Universities.
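The ranking step named in the abstract above can be sketched as network propagation in the style of PRINCE. This is a minimal sketch, assuming the commonly cited iterative form F ← αW′F + (1 − α)Y with symmetric degree normalization of the edge weights. The function name `propagate`, the value of `alpha`, and the node labels used below are hypothetical and not taken from the paper.

```python
import math


def propagate(adjacency, prior, alpha=0.8, iterations=100):
    """PRINCE-style score propagation on an undirected weighted graph.

    adjacency: dict mapping node -> dict of neighbour -> edge weight
               (each edge must be listed in both directions).
    prior:     dict mapping node -> prior score (missing nodes count as 0).
    """
    nodes = sorted(adjacency)
    degree = {u: sum(adjacency[u].values()) for u in nodes}
    scores = {u: prior.get(u, 0.0) for u in nodes}

    for _ in range(iterations):
        # Each node mixes its neighbours' normalized scores with its own prior.
        scores = {
            u: alpha * sum(
                w / math.sqrt(degree[u] * degree[v]) * scores[v]
                for v, w in adjacency[u].items()
            ) + (1 - alpha) * prior.get(u, 0.0)
            for u in nodes
        }
    return scores
```

On a toy path graph G1-G2-G3-G4 with all prior weight on G1, the resulting scores fall with distance from G1, so sorting genes by score gives the kind of ranked candidate list the abstract describes.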
Chen, Sui-Pi, and 陳穗碧. "A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/38870357626630831310.
National Chiao Tung University
Institute of Statistics
100
The detection of susceptibility genes for complex disease is a major challenge for human geneticists. The phenomenon of epistasis, or gene-gene interactions, is particularly difficult to handle for traditional statistical techniques. Over the past few decades, three kinds of approaches have been proposed to address this issue. First, approaches based on modified logistic regression yield directly interpretable results, but they are limited by the parametric model, which does not describe the nonlinear relationship between epistasis and the phenotype. Second, data-mining or machine-learning methods, such as MDR and CART, do not fit a single prespecified model; rather, they attempt to step through the space of possible models in a computationally efficient way to address this limitation of the regression-based approach. However, as genomic technologies rapidly advance, the explosion in the number of epistasis markers makes exhaustive searches of multilocus combinations computationally infeasible. Third, Bayesian model selection techniques offer an alternative approach for selecting the loci, and the interactions between them, that are the best predictors of phenotype. A representative algorithm is Bayesian epistasis association mapping (BEAM). This paper applies a Bayesian formulation of a clustering procedure for identifying gene-gene interactions in case-control studies, called the Bayesian clustering for detecting epistasis (BCDE) model. The BCDE model uses Dirichlet process mixtures to model SNP marker partitions and Gibbs weighted Chinese restaurant sampling to simulate posterior distributions of these partitions. Unlike BEAM, the representative Bayesian epistasis detection algorithm, in which markers are partitioned into three groups, the BCDE model can be evaluated at any given partition, regardless of the number of groups.
We further develop a permutation test to validate the disease association for SNP subsets identified by the BCDE model, which can yield results that are more robust to model specification and prior assumptions. The performance of the BCDE model and its comparison with BEAM are examined on various simulated data sets and a schizophrenia SNP dataset.
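The general idea of a permutation test for case-control association can be sketched as follows. This is a generic label-permutation test for a single SNP, not the BCDE-specific procedure; the function name `permutation_p_value`, the test statistic (absolute difference in mean minor-allele count), and the defaults are assumptions for illustration. A test for a SNP subset would swap in a statistic computed over the whole subset.

```python
import random


def permutation_p_value(genotypes, labels, n_permutations=1000, seed=0):
    """Label-permutation p-value for a case/control difference.

    genotypes: minor-allele counts (0, 1 or 2), one per individual.
    labels:    1 for cases, 0 for controls, in the same order.
    """
    def statistic(current_labels):
        cases = [g for g, y in zip(genotypes, current_labels) if y == 1]
        controls = [g for g, y in zip(genotypes, current_labels) if y == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Because the null distribution is built from the data themselves, the resulting p-value does not depend on the priors or the parametric form of the model that proposed the SNP, which is the robustness property the abstract refers to.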
Ng, K. C. S., J. C. S. Ngabonziza, P. Lempens, B. C. de Jong, F. van Leth, and Conor J. Meehan. "Bridging the TB data gap: in silico extraction of rifampicin-resistant tuberculosis diagnostic test results from whole genome sequence data." 2019. http://hdl.handle.net/10454/17491.
Background: Mycobacterium tuberculosis rapid diagnostic tests (RDTs) are widely employed in routine laboratories and national surveys for detection of rifampicin-resistant (RR)-TB. However, as next-generation sequencing technologies have become more commonplace in research and surveillance programs, RDTs are being increasingly complemented by whole genome sequencing (WGS). While comparison between RDTs is difficult, all RDT results can be derived from WGS data. This can facilitate continuous analysis of RR-TB burden regardless of the data generation technology employed. By converting WGS to RDT results, we enable comparison of data with different formats and sources, particularly for low- and middle-income high TB-burden countries that employ different diagnostic algorithms for drug resistance surveys. This allows national TB control programs (NTPs) and epidemiologists to utilize all available data in the setting for improved RR-TB surveillance. Methods: We developed the Python-based MycTB Genome to Test (MTBGT) tool that transforms WGS-derived data into laboratory-validated results of the primary RDTs: Xpert MTB/RIF, Xpert MTB/RIF Ultra, GenoType MDRTBplus v2.0, and Genoscholar NTM+MDRTB II. The tool was validated through RDT results of RR-TB strains with diverse resistance patterns and geographic origins and applied to routine-derived WGS data. Results: The MTBGT tool correctly transformed the single nucleotide polymorphism (SNP) data into the RDT results and generated tabulated frequencies of the RDT probes as well as rifampicin-susceptible cases. The tool supplemented the RDT probe reactions output with the RR-conferring mutation based on identified SNPs. The MTBGT tool facilitated continuous analysis of RR-TB and Xpert probe reactions from different platforms and collection periods in Rwanda. Conclusion: Overall, the MTBGT tool allows low- and middle-income countries to make sense of the increasingly generated WGS data in light of the readily available RDT results.
Erasmus Mundus Joint Doctorate Fellowship grant 2016-1346.
Shields, Phil. "Estimating grain yield using spectral reflectance data in winter wheat genotypes." 1987. http://hdl.handle.net/2097/22227.
Cai, Yimei. "Estimation of the seed dispersal distribution with genotypic data." 2007. http://purl.galileo.usg.edu/uga%5Fetd/cai%5Fyimei%5F200712%5Fphd.
Farrell, John J. "The prediction of HLA genotypes from next generation sequencing and genome scan data." Thesis, 2014. https://hdl.handle.net/2144/14694.
Thornton-Wells, Tricia A. "Comparison of three clustering methods for dissecting trait heterogeneity in simulated genotypic data." Diss., 2005. http://etd.library.vanderbilt.edu/ETD-db/available/etd-07182005-122343/.
Collier, Robert. "Empirically Evaluated Improvements to Genotypic Spatial Distance Measurement Approaches for the Genetic Algorithm." Thesis, 2012. http://hdl.handle.net/10214/3565.
Lu, Kun-Chuan, and 呂坤泉. "Effects of Harvest Date on Pod Maturity, Yield and Quality of Four Peanut Genotypes." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/85856815036275153826.
National Chung Hsing University
Graduate Institute of Agronomy
84
To study the effects of harvest date on pod maturity, yield and quality of peanut, two Spanish-type (TN 11 and TNG 6) and two Virginia-type (Li-chu-tzae and VB313) peanut genotypes were grown at two in-row spacings in the field of the Taiwan Agricultural Research Institute in the spring and fall crop seasons of 1995. The results are summarized as follows: 1. The first and second branches set more pods than any other branches. Virginia-type peanuts set more pods on the upper branches than Spanish-type peanuts. Owing to growth habits, some upper pods may mature earlier than lower pods. 2. Peanuts grown at an in-row spacing of 30 cm had higher pod weight and pod number per plant than those grown at 10 cm. However, peanuts grown at 10 cm had significantly higher pod and seed yield per hectare than those grown at 30 cm. 3. The proportion of mature pods was usually below 70% for all four genotypes in both the spring and fall crop seasons. For the Spanish-type peanuts, the most appropriate harvest date is 108-115 days after the first bloom stage in spring or 88-95 days after the first bloom stage in fall. For the Virginia-type peanuts, the most appropriate harvest date is about 115 days and 95 days after the first bloom stage in spring and fall, respectively. 4. Higher oil content was found in well-matured peanuts and in the spring season. However, protein content was only slightly affected by the change of harvest date.
Breazel, Ellen Hepfer. "Effect of common errors in microsatellite data on estimates of population differentiation and inferring genotypic structure of complex disease loci using genome-wide expression data." 2008. http://purl.galileo.usg.edu/uga%5Fetd/breazel%5Fellen%5Fh%5F200808%5Fphd.