Dissertations / Theses on the topic 'Genomics – methods'

Consult the top 50 dissertations / theses for your research on the topic 'Genomics – methods.'

1

Eriksen, Niklas. "Combinatorial methods in comparative genomics." Doctoral thesis, KTH, Mathematics, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3508.

2

Lo, Chi Ho. "Statistical methods for high throughput genomics." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/13762.

Abstract:
The advancement of biotechnologies has led to indispensable high-throughput techniques for biological and medical research. Microarray is applied to monitor the expression levels of thousands of genes simultaneously, while flow cytometry (FCM) offers rapid quantification of multi-parametric properties for millions of cells. In this thesis, we develop approaches based on mixture modeling to deal with the statistical issues arising from both high-throughput biological data sources. Inference about differential expression is a typical objective in analysis of gene expression data. The use of Bayesian hierarchical gamma-gamma and lognormal-normal models is popular for this type of problem. Some unrealistic assumptions, however, have been made in these frameworks. In view of this, we propose flexible forms of mixture models based on an empirical Bayes approach to extend both frameworks so as to release the unrealistic assumptions, and develop EM-type algorithms for parameter estimation. The extended frameworks have been shown to significantly reduce the false positive rate whilst maintaining a high sensitivity, and are more robust to model misspecification. FCM analysis currently relies on the sequential application of a series of manually defined 1D or 2D data filters to identify cell populations of interest. This process is time-consuming and ignores the high-dimensionality of FCM data. We reframe this as a clustering problem, and propose a robust model-based clustering approach based on t mixture models with the Box-Cox transformation for identifying cell populations. We describe an EM algorithm to simultaneously handle parameter estimation along with transformation selection and outlier identification, issues of mutual influence. Empirical studies have shown that this approach is well adapted to FCM data, in which a high abundance of outliers and asymmetric cell populations are frequently observed. 
Finally, in recognition of concern for an efficient automated FCM analysis platform, we have developed an R package called flowClust to automate the gating analysis with the proposed methodology. Focus during package development has been put on the computational efficiency and convenience of use at users' end. The package offers a wealth of tools to summarize and visualize features of the clustering results, and is well integrated with other FCM packages.
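As a rough illustration of the Box-Cox transformation mentioned in the abstract above, the sketch below applies the standard one-parameter form to strictly positive values. The function name `box_cox` and the sample values are hypothetical, and the thesis may well use a modified variant, so this should not be read as the flowClust implementation.

```python
import math


def box_cox(values, lam):
    """Apply the standard one-parameter Box-Cox transform to positive values.

    For lam != 0 the transform is (x**lam - 1) / lam; for lam == 0 it is log(x).
    """
    if any(v <= 0 for v in values):
        raise ValueError("the standard Box-Cox transform needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical right-skewed intensities; a square-root-like transform (lam = 0.5)
# compresses the large values more than the small ones.
skewed = [1.0, 4.0, 100.0]
transformed = box_cox(skewed, 0.5)
```

In a model-based clustering setting, the parameter `lam` would be estimated along with the mixture parameters rather than fixed by hand as it is here.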
3

Fuxelius, Hans-Henrik. "Methods and Applications in Comparative Bacterial Genomics." Doctoral thesis, Uppsala universitet, Molekylär evolution, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8398.

Abstract:
Comparative studies of bacterial genomes, now counting in the hundreds, generate massive amounts of information. To support a systematic and efficient approach to genomic analyses, a database-driven system with graphic visualization of genomic properties, GenComp, was developed. The software was applied to studies of obligate intracellular bacteria. In all studies, ORFs were extracted and grouped into ORF families. Based on gene order synteny, orthologous clusters of core genes and variable spacer ORFs were identified and extracted for alignments and computation of substitution frequencies. The software was applied to the genomes of six Chlamydia trachomatis strains to identify the most rapidly evolving genes. Five genes were chosen for genotyping, and close to a 3-fold higher discrimination capacity was achieved than that of serotypes. With GenComp as the backbone, a massive comparative analysis was performed on the variable gene set in the Rickettsiaceae, which includes Rickettsia prowazekii and Orientia tsutsugamushi, the agents of epidemic and scrub typhus, respectively. O. tsutsugamushi has the most exceptional bacterial genome identified to date; the 2.2 Mb genome is 200-fold more repeated than the 1.1 Mb R. prowazekii genome due to an extensive proliferation of conjugative type IV secretion systems and associated genes. GenComp identified 688 core genes that are conserved across 7 closely related Rickettsia genomes along with a set of 469 variably present genes with homologs in other species. The analysis indicates that up to 70% of the extensively degraded and variably present genes represent mobile genetic elements and genes putatively acquired by horizontal gene transfer. This explains the paradox of the high pseudogene load in the small Rickettsia genomes.
This study demonstrates that GenComp provides an efficient system for pseudogene identification and may help distinguish genes from spurious ORFs in the many pan-genome sequencing projects going on worldwide.
4

Li, Yang. "Statistical Methods for Large-Scale Integrative Genomics." Thesis, Harvard University, 2016. http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493551.

Abstract:
In the past 20 years, we have witnessed significant advances in high-throughput genetic and genomic technologies. With the massive amounts of genomic data now being generated, there is a pressing need for statistical methods that can use them to make quantitative inferences on substantive scientific questions. My research has focused on statistical methods for large-scale integrative genomics. The human genome encodes more than 20,000 genes, while the functions of about 50% (>10,000) of these genes remain unknown to date. Determining the functions of the poorly characterized genes is crucial for understanding biological processes and human diseases. In the era of Big Data, the availability of massive genomic data provides an unprecedented opportunity to identify associations between genes and predict their biological functions. Genome sequencing data and mRNA expression data are the two most important classes of genomic data. This thesis presents three research projects in self-contained chapters: (1) a statistical framework for inferring the evolutionary history of human genes and identifying gene modules with shared evolutionary history from genome sequencing data, (2) a statistical method to predict frequent and specific gene co-expression by integrating a large number of mRNA expression datasets, and (3) robust variable and interaction selection for high-dimensional classification problems under the discriminant analysis and logistic regression models. Chapter 1. Humans have more than 20,000 genes, but most of their functions remain uncharacterized. Determining the function of poorly characterized genes is crucial for understanding biological processes and studying human diseases. Functionally associated genes tend to be gained and lost together during evolution; therefore, identifying co-evolution of genes predicts gene-gene associations.
In this chapter, we propose a mixture of tree-structured hidden Markov models for the gene evolution process, and a Bayesian model-based clustering algorithm to detect gene modules with shared evolutionary history (named evolutionary conserved modules, ECM). A Dirichlet process prior is adopted to estimate the number of gene clusters, and an efficient Gibbs sampler is developed for posterior computation. Through a simulation study and benchmarks on real data sets, we show that our algorithm outperforms traditional methods that use simple metrics (e.g. Hamming distance, Pearson correlation) to measure the similarity between gene presence/absence patterns. We applied our methods to 1,025 canonical human pathway gene sets and found that a large portion of the detected gene associations are substantiated by other sources of evidence. The remaining genes have predicted functions that are of high priority for verification by further biological experiments. Chapter 2. The availability of gene expression measurements across thousands of experimental conditions provides the opportunity to predict gene function based on shared mRNA expression. While many biological complexes and pathways are coordinately expressed, their genes may be organized into co-expression modules with distinct patterns in certain tissues or conditions, which can provide insight into pathway organization and function. We developed the algorithm CLIC (clustering by inferred co-expression, www.gene-clic.org) that clusters a set of functionally-related genes into co-expressed modules, highlights the most relevant datasets, and predicts additional co-expressed genes. Using a statistical Bayesian partition model, CLIC simultaneously partitions the input gene set into disjoint co-expression modules and weights the most relevant datasets for each module. CLIC then expands each module with additional members that co-express with the module’s genes more than the background model in the weighted datasets.
We applied CLIC to (i) model the background correlation in each of 3,662 mouse and human microarray datasets from the Gene Expression Omnibus (GEO), (ii) partition each of 900 annotated complexes/pathways into co-expression modules, and (iii) expand each co-expression module with additional genes showing frequent and specific co-expression over multiple GEO datasets. CLIC provided very strong functional predictions for many completely uncharacterized genes, including a link between protein C7orf55 and the mitochondrial ATP synthase complex that we experimentally validated via CRISPR knock-out. CLIC software is freely available and should become increasingly powerful with the growing wealth of transcriptomic datasets. Chapter 3. Discriminant analysis and logistic regression are fundamental tools for classification problems. Quadratic discriminant analysis has the ability to exploit interaction effects of predictors, but the selection of interaction terms is non-trivial and the Gaussian assumption is often too restrictive for many real problems. Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, a stepwise procedure screens for important predictors with both main and interaction effects; in the backward stage, SODA removes insignificant terms so as to optimize the extended BIC (EBIC) criterion. Compared with existing methods for variable selection in quadratic discriminant analysis (e.g., Murphy et al., 2010; Zhang and Wang, 2011; Maugis et al., 2011), SODA can deal with high-dimensional data in which the number of predictors is much larger than the sample size and does not require the joint normality assumption on predictors, leading to much enhanced robustness. Theoretical analysis establishes the consistency of SODA in the high-dimensional setting.
Empirical performance of SODA is assessed on both simulated and real data and is found to be superior to all existing methods we have tested. For all the three real datasets we have studied, SODA selected more parsimonious models achieving higher classification accuracies compared to other tested methods.
Statistics
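The abstract above names Hamming distance and Pearson correlation as the simple baseline metrics for comparing gene presence/absence patterns. A minimal sketch of those two baselines follows; the function names and the toy binary profiles are hypothetical, and this is not the thesis's tree-structured HMM method, only the kind of comparison it is benchmarked against.

```python
from math import sqrt


def hamming_distance(a, b):
    """Count the species in which two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    return sum(x != y for x, y in zip(a, b))


def pearson_correlation(a, b):
    """Pearson correlation between two equal-length numeric profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_a * var_b)


# Toy profiles: 1 = gene present in a species, 0 = absent (made-up data).
gene_a = [1, 1, 0, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0]  # identical gain/loss pattern to gene_a
gene_c = [0, 0, 1, 1, 0, 1]  # exactly complementary pattern

same_pattern = (hamming_distance(gene_a, gene_b), pearson_correlation(gene_a, gene_b))
opposite_pattern = (hamming_distance(gene_a, gene_c), pearson_correlation(gene_a, gene_c))
```

Both metrics treat species as independent observations and ignore the phylogenetic tree, which is the limitation a tree-aware model is meant to address.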
5

Ausmees, Kristiina. "Efficient computational methods for applications in genomics." Licentiate thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-396409.

Abstract:
During the last two decades, advances in molecular technology have facilitated the sequencing and analysis of ancient DNA recovered from archaeological finds, contributing to novel insights into human evolutionary history. As more ancient genetic information has become available, the need for specialized methods of analysis has also increased. In this thesis, we investigate statistical and computational models for analysis of genetic data, with a particular focus on the context of ancient DNA. The main focus is on imputation, or the inference of missing genotypes based on observed sequence data. We present results from a systematic evaluation of a common imputation pipeline on empirical ancient samples, and show that imputed data can constitute a realistic option for population-genetic analyses. We also discuss preliminary results from a simulation study comparing two methods of phasing and imputation, which suggest that the parametric Li and Stephens framework may be more robust to extremely low levels of sparsity than the parsimonious Browning and Browning model. An evaluation of methods to handle missing data in the application of PCA for dimensionality reduction of genotype data is also presented. We illustrate that non-overlapping sequence data can lead to artifacts in projected scores, and evaluate different methods for handling unobserved genotypes. In genomics, as in other fields of research, increasing sizes of data sets are placing larger demands on efficient data management and compute infrastructures. The last part of this thesis addresses the use of cloud resources for facilitating such analysis. We present two different cloud-based solutions, and exemplify them on applications from genomics.
eSSENCE
6

Pappalardo, Elisa. "Combinatorial optimization methods for problems in genomics." Doctoral thesis, Università di Catania, 2012. http://hdl.handle.net/10761/1029.

Abstract:
Recent advances in genomics have raised a myriad of problems that are extremely challenging from a computational standpoint; in particular, many of them have been proven to belong to the class of NP-hard problems. Based on these results, great attention has been devoted to developing algorithms that provide satisfactory solutions with limited computational effort; in this context, optimization methods are a valid approach, since many problems require finding minimum-cost solutions. This thesis introduces new combinatorial optimization methods for the analysis and design of nucleotide sequences. In particular, the thesis focuses on efficient methods for solving the Non-Unique Probe Selection Problem and the Closest String Problem. Experimental results show that the newly introduced approaches are efficient and competitive with the state of the art and, in many cases, find better solutions than those known in the literature.
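To make the Closest String Problem named in the abstract above concrete: given equal-length strings, the goal is a string that minimizes the largest Hamming distance to any of them. The sketch below is a brute-force search, exponential in the string length and usable only on tiny inputs; it is not the thesis's optimization method, and the function names and the DNA alphabet default are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, alphabet="ACGT"):
    """Exhaustively search for a string minimizing the maximum Hamming distance.

    Returns (best_string, radius). Tries len(alphabet) ** length candidates,
    so this only illustrates the problem definition on toy instances.
    """
    if not strings:
        raise ValueError("need at least one input string")
    length = len(strings[0])
    best, best_radius = None, length + 1
    for candidate in product(alphabet, repeat=length):
        radius = max(hamming(candidate, s) for s in strings)
        if radius < best_radius:
            best, best_radius = "".join(candidate), radius
    return best, best_radius


# Toy instance with three hypothetical 4-mers.
center, radius = closest_string(["ACGT", "ACGA", "ACCT"])
```

Because the problem is NP-hard, practical solvers replace the exhaustive loop with heuristics or exact optimization techniques that avoid enumerating every candidate.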
7

Ericsson, Ulrika. "A structural genomics pilot project: methods and applications." Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-1060.

8

Tchitchek, Nicolas. "Novel statistical and geometrical methods for integrative genomics." Paris 7, 2011. http://www.theses.fr/2011PA077207.

Abstract:
During the three years of my Ph.D. project, I developed several complementary methods and frameworks for the analysis of -omics data, including: (i) a framework for integrative genomics in which every kind of information that can be obtained about genomic processes and features is modeled in a common probabilistic manner, making it possible to analyze correlations among heterogeneous genome-wide information; (ii) a fold-change-based statistical test for identifying differentially and similarly expressed genes between two biological conditions, which also allows confidence intervals at specific confidence levels to be determined for the fold-change; and (iii) novel dimensionality reduction methods that outperform other related existing methods and provide more interpretable geometrical representations in the context of large -omics datasets. These methods have been applied to several biological analyses and studies as part of different scientific collaborations: (i) to identify functional Glucocorticoid Response Elements in the promoter regions of specific candidate genes involved in Type 1 Pseudohypoaldosteronism; (ii) to uncover the host transcriptional responses underlying differences between low- and high-pathogenic pulmonary viruses, based on a compendium of host transcription responses of infected cells from mouse lungs; (iii) to study the progression of the hepatitis C virus in infected patients who underwent orthotopic liver transplantation, based on a cohort of transcriptome profiles for liver biopsy specimens; and (iv) to analyze an Expressed Sequence Tag library obtained from PBMC of African green monkeys infected or not by SIV.
9

Ming, Jingsi. "Statistical methods for integrative analysis of genomic data." HKBU Institutional Repository, 2018. https://repository.hkbu.edu.hk/etd_oa/545.

Abstract:
Thousands of risk variants underlying complex phenotypes (quantitative traits and diseases) have been identified in genome-wide association studies (GWAS). However, several challenges remain in deepening our understanding of the genetic architectures of complex phenotypes. First, the majority of GWAS hits are in non-coding regions and their biological interpretation is still unclear. Second, most complex traits are suggested to be highly polygenic, i.e., they are affected by a vast number of risk variants with individually small or moderate effects, whereas a large proportion of risk variants with small effects remain unknown. Third, accumulating evidence from GWAS suggests the pervasiveness of pleiotropy, the phenomenon in which some genetic variants are associated with multiple traits, but there is a lack of a unified, scalable framework that can reveal relationships among a large number of traits and simultaneously prioritize genetic variants with functional annotations integrated. In this thesis, we propose two statistical methods to address these challenges using integrative analysis of summary statistics from GWASs and functional annotations. In the first part, we propose a latent sparse mixed model (LSMM) to integrate functional annotations with GWAS data. LSMM not only increases the statistical power to identify risk variants but also offers more biological insight by detecting relevant functional annotations. To make LSMM scalable to millions of variants and hundreds of functional annotations, we developed an efficient variational expectation-maximization (EM) algorithm for model parameter estimation and statistical inference. We first conducted comprehensive simulation studies to evaluate the performance of LSMM. Then we applied it to analyze 30 GWASs of complex phenotypes integrated with nine genic category annotations and 127 cell-type specific functional annotations from the Roadmap project.
The results demonstrate that our method possesses more statistical power than conventional methods, and can help researchers achieve a deeper understanding of the genetic architecture of these complex phenotypes. In the second part, we propose a latent probit model (LPM), which combines summary statistics from multiple GWASs and functional annotations to characterize relationships among traits and increase the statistical power to identify risk variants. LPM can also perform hypothesis testing for pleiotropy and annotation enrichment. To keep LPM scalable as the number of GWASs increases, we developed an efficient parameter-expanded EM (PX-EM) algorithm that can run in parallel. We first validated the performance of LPM through comprehensive simulations, then applied it to analyze 44 GWASs with nine genic category annotations. The results demonstrate the benefits of LPM and can offer new insights into disease etiology.
10

Sofer, Tamar. "Statistical Methods for High Dimensional Data in Environmental Genomics." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10403.

Abstract:
In this dissertation, we propose methodology to analyze high dimensional genomics data, in which the observations have a large number of outcome variables in addition to exposure variables. In Chapter 1, we investigate methods for genetic pathway analysis, where we have a small number of exposure variables. We propose two Canonical Correlation Analysis based methods that select outcomes either sequentially or by screening, and show that the performance of the proposed methods depends on the correlation between the genes in the pathway. We also propose and investigate a criterion for fixing the number of outcomes, and a powerful test for the exposure effect on the pathway. The methodology is applied to show that air pollution exposure affects gene methylation of a few genes from the asthma pathway. In Chapter 2, we study penalized multivariate regression as an efficient and flexible method to study the relationship between a large number of covariates and multiple outcomes. We use penalized likelihood to shrink model parameters to zero and to select only the important effects. We use the Bayesian Information Criterion (BIC) to select tuning parameters for the employed penalty and show that it chooses the right tuning parameter with high probability. These are combined in the “two-stage procedure”, and asymptotic results show that it yields a consistent, sparse, and asymptotically normal estimator of the regression parameters. The method is illustrated on gene expression data in normal and diabetic patients. In Chapter 3, we propose a method for estimation of covariate-dependent principal components analysis (PCA) and covariance matrices. Covariates, such as smoking habits, can affect the variation in a set of gene methylation values. We develop a penalized regression method that incorporates covariates in the estimation of principal components.
We show that the parameter estimates are consistent and sparse, and show that using the BIC to select the tuning parameter for the penalty functions yields good models. We also propose the scree plot residual variance criterion for selecting the number of principal components. The proposed procedure is applied to show that the first three principal components of gene methylation in the asthma pathway differ between people who smoked and people who did not.
11

Lu, Rong. "Statistical Methods for Functional Genomics Studies Using Observational Data." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1467830759.

12

Lu, Mengyin. "Generalized Adaptive Shrinkage Methods and Applications in Genomics Studies." Thesis, The University of Chicago, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10974422.

Abstract:

Shrinkage procedures have played an important role in helping improve estimation accuracy for a variety of applications. In genomics studies, the gene-specific sample statistics are usually noisy, especially when sample size is limited. Hence some shrinkage methods (e.g. limma) have been proposed to increase statistical power in identifying differentially expressed genes. Motivated by the success of shrinkage methods, Stephens (2016) proposed a novel approach, Adaptive Shrinkage (ash) for large-scale hypothesis testing including false discovery rate and effect size estimation, based on the fundamental “unimodal assumption” (UA) that the distribution of the actual unobserved effects has a single mode.

Even though ash primarily dealt with normal or student-t distributed observations, the idea of UA can be widely applied to other types of data. In this dissertation, we propose a general flexible Bayesian shrinkage framework based on UA, which is easily applicable to a wide range of settings. This framework allows us to deal with data involving other noise distributions (gamma, F, Poisson, binomial, etc.). We illustrate its flexibility in a variety of genomics applications including: differential gene expression analysis on RNA-seq data; comparison between bulk RNA-seq and single cell RNA-seq data; gene expression distribution deconvolution for single cell RNA-seq data, etc.

13

Sharpnack, Michael F. "Integrative Genomics Methods for Personalized Treatment of Non-Small-Cell Lung Cancer." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1523890139956055.

14

Manser, Paul. "Methods for Integrative Analysis of Genomic Data." VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3638.

Abstract:
In recent years, the development of new genomic technologies has allowed for the investigation of many regulatory epigenetic marks besides expression levels, on a genome-wide scale. As the price for these technologies continues to decrease, study sizes will not only increase, but several different assays are beginning to be used for the same samples. It is therefore desirable to develop statistical methods to integrate multiple data types that can handle the increased computational burden of incorporating large data sets. Furthermore, it is important to develop sound quality control and normalization methods as technical errors can compound when integrating multiple genomic assays. DNA methylation is a commonly studied epigenetic mark, and the Infinium HumanMethylation450 BeadChip has become a popular microarray that provides genome-wide coverage and is affordable enough to scale to larger study sizes. It employs a complex array design that has complicated efforts to develop normalization methods. We propose a novel normalization method that uses a set of stable methylation sites from housekeeping genes as empirical controls to fit a local regression hypersurface to signal intensities. We demonstrate that our method performs favorably compared to other popular methods for the array. We also discuss an approach to estimating cell-type admixtures, which is a frequent biological confound in these studies. For data integration we propose a gene-centric procedure that uses canonical correlation and subsequent permutation testing to examine correlation or other measures of association and co-localization of epigenetic marks on the genome. Specifically, a likelihood ratio test for general association between data modalities is performed after an initial dimension reduction step. Canonical scores are then regressed against covariates of interest using linear mixed effects models. 
Lastly, permutation testing is performed on weighted correlation matrices to test for co-localization of relationships to physical locations in the genome. We demonstrate these methods on a set of developmental brain samples from the BrainSpan consortium and find substantial relationships between DNA methylation, gene expression, and alternative promoter usage primarily in genes related to axon guidance. We perform a second integrative analysis on another set of brain samples from the Stanley Medical Research Institute.
15

Najafzadeh, Mehdi. "Integration of genomics into clinical care: methods for economic evaluation." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/41113.

Abstract:
Background: As genomic technologies become more affordable, the demand for these data will increase. Decision-makers must anticipate the increasing influence of genomics on health care systems and take into account the expectations of patients, the public, health care providers, and industry in this regard. This thesis demonstrates applications of several methods for evaluation of genomic technologies in medicine. Using four case studies, I have highlighted the advantages that each method can offer given the nature and scope of the research question in each case study. Objectives: My specific objectives in the case studies were: 1) To elicit the preferences of cancer patients as well as the public for a hypothetical, genetically-guided treatment for cancer (a discrete choice experiment); 2) To estimate the relative importance of attributes which influence physicians’ decisions for using personalized medicine in their practice (a Best Worst Scaling choice experiment); 3) To evaluate the impact of three potential genomic/proteomic tests on the long term burden of COPD in Canada (a system dynamics model); 4) To measure the cost-effectiveness of adding a new molecular diagnostic test (DX) to the current diagnostic strategy for thyroid cancer (using a discrete event simulation). Methods: Through these case studies, I have demonstrated the particular advantages of using discrete choice experiment (DCE), best-worst scaling (BWS) experiment, system dynamics simulation, and discrete event simulation (DES) for evaluations of genomic technologies. Results: Using four case studies, I exemplified the questions that emerge in the process of integrating genomics into clinical care.
In addition to bridging the methodological gaps by incorporating several novel methods (BWS, system dynamics, and DES), the selected case studies illustrated the practical issues regarding the integration of genomics into clinical care from the perspective of patients, the public, health care providers, and decision-makers. Conclusion: Although the methods previously developed for health technology assessment can be applied to the evaluation of genomic technologies as well, methodological challenges in the evaluation of genomic applications entail utilizing more diverse and more sophisticated analytical tools.
16

Walter, Klaudia. "Statistical methods for comparative genomics in the field of bioinformatics." Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611909.

17

Rancoita, P. M. V. "Stochastic methods in cancer research. Applications to genomics and angiogenesis." Doctoral thesis, Università degli Studi di Milano, 2010. http://hdl.handle.net/2434/152007.

Abstract:
In recent years, interactions between mathematicians and biomedical researchers have increased due to both the complexity of the biological/medical issues and the development of new technologies, producing “large” data rich in information. Biomathematics is applied in many areas, such as epidemiology, clinical trial design, neuroscience, disease modeling, genomics, proteomics, etc. Cancer is a multistep process where the accumulation of genomic lesions alters cell biology. The latter is under control of several pathways and, thus, cancer can originate via different mechanisms affecting different pathways. However, usually, more than one of these mechanisms needs to be damaged before a cell becomes cancerous. Due to the general complexity of this disease and the different types of tumors, the efforts of cancer research cover several research areas such as immunology, genetics, cell biology, and angiogenesis. As a consequence, many biostatistical topics can be applied. The thesis is divided into two parts. In the former, two Bayesian regression methods for the analysis of two types of cancer genomic data are proposed. In the latter, the properties of two estimators of the intensity of a stationary fibre process are studied, which can be applied for the characterization of angiogenic and vascular processes. (Published - see http://hdl.handle.net/2434/159517)
18

Campbell, Kieran. "Probabilistic modelling of genomic trajectories." Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:24e6704c-8a7f-4967-9fcd-95d6034eab39.

Abstract:
The recent advancement of whole-transcriptome gene expression quantification technology - particularly at the single-cell level - has created a wealth of biological data. An increasingly popular unsupervised analysis is to find one-dimensional manifolds or trajectories through such data that track the development of some biological process. Such methods may be necessary due to the lack of explicit time series measurements or due to asynchronicity of the biological process at a given time. This thesis aims to recast trajectory inference from high-dimensional "omics" data as a statistical latent variable problem. We begin by examining sources of uncertainty in current approaches and the consequences of propagating such uncertainty to downstream analyses. We also introduce a model of switch-like differentiation along trajectories. Next, we consider inferring such trajectories through parametric nonlinear factor analysis models and demonstrate that incorporating information about gene behaviour as informative Bayesian priors improves inference. We then consider the case of bifurcations in data and demonstrate the extent to which they may be modelled using a hierarchical mixture of factor analysers. Finally, we propose a novel type of latent variable model that performs inference of such trajectories in the presence of heterogeneous genetic and environmental backgrounds. We apply this to both single-cell and population-level cancer datasets and propose a nonparametric extension similar to Gaussian Process Latent Variable Models.
19

Brumm, Jochen. "Finding functional groups of genes using pairwise relational data : methods and applications." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/688.

Abstract:
Genes, the fundamental building blocks of life, act together (often through their derived proteins) in modules such as protein complexes and molecular pathways to achieve a cellular function such as DNA repair and cellular transport. A current emphasis in genomics research is to identify gene modules from gene profiles, which are measurements (such as a mutant phenotype or an expression level), associated with the individual genes under conditions of interest; genes in modules often have similar gene profiles. Clustering groups of genes with similar profiles can hence deliver candidate gene modules. Pairwise similarity measures derived from these profiles are used as input to the popular hierarchical agglomerative clustering algorithms; however, these algorithms offer little guidance on how to choose candidate modules and how to improve a clustering as new data becomes available. As an alternative, there are methods based on thresholding the similarity values to obtain a graph; such a graph can be analyzed through (probabilistic) methods developed in the social sciences. However, thresholding the data discards valuable information and choosing the threshold is difficult. Extending binary relational analysis, we exploit ranked relational data as the basis for two distinct approaches for identifying modules from genomic data, both based on the theory of random graph processes. We propose probabilistic models for ranked relational data that allow candidate modules to be accompanied by objective confidence scores and that permit an elegant integration of external information on gene-gene relationships. We first followed theoretical work by Ling to objectively select exceptionally isolated groups as candidate gene modules. Secondly, inspired by stochastic block models used in the social sciences, we construct a novel model for ranked relational data, where all genes have hidden module parameters which govern the strength of all gene-gene relationships. 
Adapting a classical likelihood often used for the analysis of horse races, clustering is performed by estimating the module parameters using standard Bayesian methods. The method allows the incorporation of prior information on gene-gene relationships; the utility of using prior information in the form of protein-protein interaction data in clustering of yeast mutant phenotype profiles is demonstrated.
20

Jentzsch, Iris Miriam Vargas. "Comparative genomics of microsatellite abundance: a critical analysis of methods and definitions." Thesis, University of Canterbury. Biological Sciences, 2009. http://hdl.handle.net/10092/4282.

Abstract:
This PhD dissertation focuses on microsatellites: short, tandemly repeated nucleotide patterns that occur extremely often across DNA sequences. The main characteristic of microsatellites, and probably the reason why they are so abundant across genomes, is the extremely high frequency of specific replication errors occurring within their sequences, which usually cause addition or deletion of one or more complete tandem repeat units. Due to these errors, frequent fluctuations in the number of repetitive units can be observed among cellular and organismal generations. The molecular mechanisms as well as the consequences of these microsatellite mutations, on both a generational and an evolutionary scale, have sparked debate and controversy among the scientific community. Furthermore, the bioinformatic approaches used to study microsatellites and the ways microsatellites are referred to in the general literature are often not rigorous, leading to misinterpretations and inconsistencies among studies. As an introduction to this complex topic, in Chapter I, I present a review of the knowledge accumulated on microsatellites during the past two decades. A major part of this chapter has been published in the Encyclopedia of Life Sciences in a chapter about microsatellite evolution (see Publication 1 in Appendix II). The ongoing controversy about the rates and patterns of microsatellite mutation was evident to me even before I started this PhD thesis. However, the subtler problems inherent to the computational analyses of microsatellites within genomes only became apparent when retrieving information on microsatellite distribution and abundance for the design of comparative genomic analyses. 
There are numerous publications analyzing the microsatellite content of genomes but, in most cases, the results presented can neither be reliably compared nor reproduced, mainly due to the lack of details on the microsatellite search process (particularly the program’s algorithm and the search parameters used) and because the results are expressed in terms that are relative to the search process (i.e. measures based on the absolute number of microsatellites). Therefore, in Chapter II, I present a critical review of all available software tools designed to scan DNA sequences for microsatellites. My aim in undertaking this review was to assess the comparability of search results among microsatellite programs, and to identify the programs most suitable for the generation of microsatellite datasets for a thorough and reproducible comparative analysis of microsatellite content among genomic sequences. Using sequence data where the number and types of microsatellites were empirically known, I compared the ability of 19 programs to accurately identify and report microsatellites. I then chose the two programs which, based on the algorithm and its parameters as well as the informativeness of the output, offered the information most suitable for biological interpretation, while also reflecting as closely as possible the microsatellite content of the test files. From the analysis of microsatellite search results generated by the various programs available, it became apparent that the program’s search parameters, which are specified by the user in order to define the microsatellite characteristics to the program, dramatically influence the resulting datasets. This is especially true for programs suited to allow imperfections within tandem repeats, because imperfect repetitions cannot be defined as accurately as perfect ones, and because several different algorithms have been proposed to address this problem. 
The detection of approximate microsatellites is, however, essential for the study of microsatellite evolution and for comparative analyses based on microsatellites. It is now well accepted that small deviations from perfect tandem repeat structure are common within microsatellites and larger repeats, and a number of different algorithms have been developed to confront the challenge of finding and registering microsatellites with all expectable kinds of imperfection. However, biologists have yet to apply these tools to their full potential. In biological analyses single tandem repeat hits are consistently interpreted as isolated and independent repeats. This interpretation also depends on the search strategy used to report the microsatellites in DNA sequences and, therefore, I was particularly interested in the capacity of repeat finding programs to report imperfect microsatellites allowing interpretations that are useful in a biological sense. After analyzing a series of tandem repeat finding programs, I optimized my microsatellite searches to yield the best possible datasets for assessing and comparing the degree of imperfection of microsatellites among different genomes (Chapter III). During the program comparisons performed in Chapter II, I show that the most critical search parameter influencing microsatellite search results is the minimum length threshold. Biologically speaking, there is no consensus with respect to the minimum length beyond which a short tandem repeat is expected to become prone to microsatellite-like mutations. Usually, a single absolute value of ~12 nucleotides is assigned irrespective of motif length. In other cases thresholds are assigned in terms of number of repeat units (i.e. 3 to 5 repeats or more), which are better applied individually for each motif. The variation in these thresholds is considerable and not always justifiable. 
In addition, any current minimum length measures are likely naïve because it is clear that different microsatellite motifs undergo replication slippage at different length thresholds. Therefore, in Chapter III, I apply two probabilistic models to predict the minimum length at which microsatellites of varying motif types become overrepresented in different genomes based on the individual oligonucleotide frequency data of these genomes. Finally, after a range of optimizations and critical analyses, I performed a preliminary analysis of microsatellite abundance among 24 high quality complete eukaryotic genomes, including also 8 prokaryotic and 5 archaeal genomes for contrast. The availability of the methodologies and the microsatellite datasets generated in this project will allow informed formulation of questions for more specific genome research, either about microsatellites, or about other genomic features microsatellites could influence. These datasets are what I would have needed at the beginning of my PhD to support my experimental design, and are essential for the adequate interpretation of microsatellite data in the context of the major evolutionary units: chromosomes and genomes.
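The abstract's point that the minimum length threshold drives search results can be illustrated with a small sketch. This is not one of the 19 programs reviewed in the thesis; the function name `find_perfect_microsatellites`, its parameters, and the toy sequence are assumptions made here for illustration, and the scan only handles perfect repeats (imperfect repeats, which the abstract stresses, would need a different algorithm).

```python
import re

def find_perfect_microsatellites(seq, max_motif=6, min_length=12):
    """Return sorted (start, motif, n_units) tuples for perfect tandem repeats.

    Hypothetical helper: min_length is an absolute threshold in nucleotides,
    applied regardless of motif length, as in the ~12 nt convention.
    """
    hits = []
    for k in range(1, max_motif + 1):
        # A motif of length k followed by at least one exact copy of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            length = m.end() - m.start()
            if length < min_length:
                continue
            # Skip motifs that are repeats of a shorter motif (e.g. "CACA"),
            # so each repeat tract is reported once under its shortest motif.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, length // k))
    return sorted(hits)

# Toy sequence: a (CA)8 tract of 16 nt and an (AAT)3 tract of 9 nt.
toy = "GG" + "CA" * 8 + "GG" + "AAT" * 3 + "GC"
strict = find_perfect_microsatellites(toy, min_length=12)
relaxed = find_perfect_microsatellites(toy, min_length=9)
print(len(strict), len(relaxed))
```

On this toy input, lowering the threshold from 12 to 9 nucleotides adds the trinucleotide tract to the result, which is the kind of parameter dependence that makes counts from different searches hard to compare.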
21

Kuderna, Lukas 1989. "Application of genome assembly methods to human and non-human primate genomics." Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/668648.

Abstract:
Genomic analyses are at the center of contemporary biology. These studies heavily rely on reference genome assemblies, yet those are typically highly fragmented. Having accurate representations of complex genomes, or parts thereof, is crucial to study human and primate evolution and disease. Here, we develop and apply new sequencing strategies and technologies to improve reference assemblies. We first explore the combinatorial potential of different datasets to generate a highly improved reference for the chimpanzee, a crucial species for the study of human origins. We are able to close 77% of the over 159,000 remaining gaps in the previous iteration of this species’ assembly and increase continuity by more than 750%. We then go on to develop a workflow to assemble the first human Y chromosome of African ancestry, using native flow-sorted chromosomes sequenced on a Nanopore device. We are able to assemble the Y chromosome to a reference grade quality and achieve unprecedented sequence resolution across structurally complex regions. These results open new avenues for comparative studies including the chimpanzee genome or human Y chromosomes.
Genomic analyses are at the center of contemporary biology. These studies depend heavily on the assembly of reference genomes, although these are generally highly fragmented. Having accurate representations of complex genomes, or parts of them, is crucial for studying disease and evolution in humans and primates. In the following studies, we develop and apply new sequencing strategies and technologies to improve reference assemblies. First, we explore the potential of combining different datasets to generate a substantially improved reference for the chimpanzee, a crucial species for the study of human origins. We are able to close 77% of the more than 159,000 gaps present in the previous iteration of this species' assembly, and to increase continuity by more than 750%. Next, we develop a protocol to assemble the first human Y chromosome of African ancestry, using native chromosomes isolated by flow cytometry and sequenced on a Nanopore device. In this way, we manage to assemble the Y chromosome to reference quality, with unprecedented sequence resolution in structurally complex regions. These results open new avenues for comparative studies that include the chimpanzee genome or human Y chromosomes.
22

Bano, Fouzia. "Towards single cell genomics and proteomics: new methods in nanoscale surface biochemistry." Doctoral thesis, SISSA, 2009. http://hdl.handle.net/20.500.11767/4754.

23

Michino, Mayako. "Developing new computational methods for characterization ORFS with unknown function." Thesis, Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/25208.

24

Alonso, Arnald. "Bioinformatics methods for the genomics and metabolomics analysis of immune-mediated inflammatory diseases." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/320191.

Abstract:
During the last decade, genomics has been widely used to characterize the molecular basis of common diseases. Genome-wide association studies (GWAS) have been highly successful in characterizing the genetic variation that influences human traits including the susceptibility to common diseases. In metabolomics, recent improvements of analytical technologies have enabled the analysis of complete metabolomic profiles. Using this approach, high-throughput metabolomics studies have already demonstrated a high potential for the discovery of disease biomarkers. The use of powerful high-throughput measurement technologies has resulted in the generation of large datasets of biological variation. In order to extract relevant biological information from these data, highly specialized bioinformatics methods are required. This thesis is focused on the development of new methodological tools to improve the processing of genomics and metabolomics high-throughput data. These new tools have been used in the analysis framework of the Immune-Mediated Inflammatory Diseases (IMIDs) Consortium. The IMID Consortium is a large Spanish network of biomedical researchers on autoimmune diseases, which holds one of the largest collections of biological samples from this group of diseases, as well as healthy controls. The first analysis tool that has been developed is a computationally efficient algorithm for simultaneous genotyping of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) using microarray data. This bioinformatics tool, called GStream, integrates the genotyping of both types of genomic variants into a single processing pipeline. We demonstrate that the developed algorithms provide a significant increase in genotyping accuracy and call rate when compared to previous algorithms. 
Using GStream, researchers performing large-scale GWASs will not only benefit from the combined and fast genotyping of SNPs and CNVs but, more importantly, will also improve the accuracy and therefore the statistical power of their studies. The second tool that was developed during this thesis was FOCUS, a bioinformatics framework that provides a complete data analysis workflow for high-throughput metabolomics studies based on one-dimensional nuclear magnetic resonance (NMR). The FOCUS workflow includes quality control, peak alignment, peak picking and metabolite identification. The algorithms included in FOCUS were designed to overcome several technical challenges that can dramatically affect the quality of the results. FOCUS allows users to easily obtain high-quality NMR feature matrices, which are ready for chemometric analysis, as well as metabolite identification scores for each peak that greatly simplify the biological interpretation of the results. When tested against previous NMR data processing methodologies, FOCUS clearly showed a superior performance, even in datasets with high levels of spectral misalignment. The final research work included in this thesis is a GWAS in Crohn's disease (CD) clinical phenotypes. CD is the most prevalent chronic inflammatory disease of the bowel, and is characterized by segmental and transmural inflammation of the gastrointestinal tract. CD is a highly heterogeneous disease, with patients showing different degrees of severity. The identification of the genetic basis associated with disease severity is therefore a major objective in CD translational research. The present PhD thesis includes the first GWAS of clinically relevant phenotypes in CD. A total of 17 phenotypes associated with different clinical complications were analyzed. In this study, we identified new genetic regions significantly associated with complicated disease course, disease location, mild disease course, and erythema nodosum. 
These findings are of high relevance since they show the existence of a genetic component for disease heterogeneity that is independent of the genetic variation associated with susceptibility to CD.
Over the last decade, genomics has played a key role in characterizing the molecular basis of complex diseases. Genome-wide association studies (GWAS) have made it possible to characterize the genetic regions that influence human phenotypes such as susceptibility to complex diseases. In metabolomics, improvements in analytical technologies have driven the acquisition of metabolomic profiles in large sample cohorts. The resulting studies have also shown great potential for identifying biomarkers useful in human diseases. The application of high-throughput technologies generates large datasets of biological variation, and extracting the relevant information requires powerful bioinformatics tools. This thesis focuses on the development of new methods to improve and speed up the processing of high-throughput genomic and metabolomic data, and on their subsequent implementation as bioinformatics applications. These applications have been incorporated into the analysis workflow of the IMID (immune-mediated inflammatory diseases) Consortium. This consortium is a Spanish network of biomedical researchers with a common interest in the study of autoimmune diseases, and it holds one of the most extensive sample collections from patients with these diseases. The first bioinformatics tool implemented consists of a set of algorithms that integrate the genotyping of single nucleotide polymorphisms and copy number variations from genotyping microarray data. This tool, called GStream, efficiently incorporates the entire analysis workflow required for genotyping in GWAS. The algorithms developed have been shown to significantly improve genotyping accuracy and to increase the number of genetic variants identified compared with previous methodologies. 
Using this tool therefore makes it possible to expand the number of genetic variants analyzed, significantly increasing the statistical power of GWAS genetic studies. The second tool developed was FOCUS. It is an integrated bioinformatics tool that includes all the stages of processing nuclear magnetic resonance spectra for metabolomics studies. The analysis workflow includes quality control, alignment and quantification of spectral peaks, and identification of the metabolites associated with the quantified peaks. All the algorithms were designed to correct the biases that considerably limit the quality of the results and that are among the technical challenges of current metabolomics. FOCUS produces a high-quality numerical matrix ready for chemometric analysis and generates identification scores that simplify the biological interpretation of the results. FOCUS achieved significantly better performance than previous methodologies. This thesis concludes with the first GWAS of clinical phenotypes of Crohn's disease. This IMID is the most prevalent inflammatory bowel disease and is highly heterogeneous, with patients showing very different degrees of severity. Identifying genetic variants associated with the phenotypes of this disease is therefore one of the most relevant objectives for translational research. A total of 17 phenotypes were analyzed using discovery and validation cohorts in order to identify and replicate risk loci associated with each of them. The results of the study made it possible to identify, for the first time, genetic regions associated with the course of the disease and with its location. 
These results are highly relevant because they not only identified new biological pathways associated with clinical phenotypes but also demonstrate, for the first time, the existence of a genetic component of heterogeneity in Crohn's disease that is independent of the genetic variation associated with the risk of developing the disease.
25

Brockmann, Christoph. "NMR protein structure determination in a structural genomics context developments, methods and applications /." [S.l.] : [s.n.], 2005. http://www.diss.fu-berlin.de/2006/219/index.html.

26

Bohnert, Regina [Verfasser], and Gunnar [Akademischer Betreuer] Rätsch. "Computational Methods for High-Throughput Genomics and Transcriptomics / Regina Bohnert ; Betreuer: Gunnar Rätsch." Tübingen : Universitätsbibliothek Tübingen, 2011. http://d-nb.info/1162699280/34.

27

Ferber, Kyle L. "Methods for Predicting an Ordinal Response with High-Throughput Genomic Data." VCU Scholars Compass, 2016. http://scholarscompass.vcu.edu/etd/4585.

Abstract:
Multigenic diagnostic and prognostic tools can be derived for ordinal clinical outcomes using data from high-throughput genomic experiments. A challenge in this setting is that the number of predictors is much greater than the sample size, so traditional ordinal response modeling techniques must be exchanged for more specialized approaches. Existing methods perform well on some datasets, but there is room for improvement in terms of variable selection and predictive accuracy. Therefore, we extended an impressive binary response modeling technique, Feature Augmentation via Nonparametrics and Selection, to the ordinal response setting. Through simulation studies and analyses of high-throughput genomic datasets, we showed that our Ordinal FANS method is sensitive and specific when discriminating between important and unimportant features from the high-dimensional feature space and is highly competitive in terms of predictive accuracy. Discrete survival time is another example of an ordinal response. For many illnesses and chronic conditions, it is impossible to record the precise date and time of disease onset or relapse. Further, the HIPAA Privacy Rule prevents recording of protected health information, which includes all elements of dates (except year), so in the absence of a “limited dataset,” the date of diagnosis or date of death is not available for calculating overall survival. Thus, we developed a method that is suitable for modeling high-dimensional discrete survival time data and assessed its performance by conducting a simulation study and by predicting the discrete survival times of acute myeloid leukemia patients using a high-dimensional dataset.
28

Cresswell, Kellen Garrison. "Spectral methods for the detection and characterization of Topologically Associated Domains." VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/6100.

Abstract:
The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops which is relatively stable across cell-lines and even across species. These TADs dynamically reorganize during development of disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for identification of hierarchies. Additionally, there are no publicly available tools for comparison of TADs across datasets. These tools are necessary to conduct large-scale genome-wide analysis and comparison of 3D structure. To address the challenge of TAD identification, we developed a novel sliding window-based spectral clustering framework that uses gaps between consecutive eigenvectors for TAD boundary identification. Our method, implemented in an R package, SpectralTAD, has automatic parameter selection, is robust to sequencing depth, resolution and sparsity of Hi-C data, and detects hierarchical, biologically relevant TADs. SpectralTAD outperforms four state-of-the-art TAD callers in simulated and experimental settings. We demonstrate that TAD boundaries shared among multiple levels of the TAD hierarchy were more enriched in classical boundary marks and more conserved across cell lines and tissues. SpectralTAD is available at http://bioconductor.org/packages/SpectralTAD/. To address the problem of TAD comparison, we developed TADCompare. TADCompare is based on a spectral clustering-derived measure called the eigenvector gap, which enables a loci-by-loci comparison of TAD boundary differences between datasets. 
Using this measure, we introduce methods for identifying differential and consensus TAD boundaries and tracking TAD boundary changes over time. We further propose a novel framework for the systematic classification of TAD boundary changes. Colocalization- and gene enrichment analysis of different types of TAD boundary changes revealed distinct biological functionality associated with them. TADCompare is available on https://github.com/dozmorovlab/TADCompare.
29

Leader, Debbie. "Methods for incorporating biological information into the statistical analysis of gene expression microarray data." Thesis, University of Auckland, 2009. http://hdl.handle.net/2292/5609.

Abstract:
Microarray technology has made it possible for researchers to simultaneously measure the expression levels of tens of thousands of genes. It is believed that most human diseases and biological phenomena occur through the interaction of groups of genes that are functionally related. To investigate the feasibility of incorporating functional information and/or constraints (based on biological and technical needs) into the classification process, two approaches were examined in this thesis. The first of these approaches investigated the effect of incorporating a pre-filter into the gene selection step of the classifier construction process. Both simulated and real microarray datasets were used to assess the utility of this approach. The pre-filter was based on an early method for determining if a gene had undergone a biologically relevant level of differential expression between two classes. The genes retained by the pre-filter were ranked using one of five standard statistical ranking methods and the most highly ranked were used to construct a predictive classifier. To generate the simulated data, a selection of different parametric and non-parametric techniques was employed. The results from these analyses showed that when the constraints that the pre-filter contains were placed on the classification analysis, the predictive performance of the classifiers was similar to that obtained when the pre-filter was not used. The second approach explored the feasibility of incorporating sets of functionally related genes into the classification process. Three publicly available datasets obtained from studies into breast cancer were used to assess the utility of this approach. A summary of each gene-set was derived by reducing the dimensionality of each gene-set via the use of Principal Co-ordinates Analysis. 
The reduced gene-sets were then ranked based on their ability to distinguish between the two classes (via Hotelling's T²), and those most highly ranked were used to construct a classifier via logistic regression. The results of the analyses undertaken for this approach showed that it was possible to incorporate functional information into the classification process whilst maintaining an equivalent (if not higher) level of predictive performance, as well as improving the biological interpretability of the classifier.
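The gene-set ranking step described in this abstract can be illustrated with a minimal sketch. It assumes each gene-set has already been reduced to a few coordinates per sample (the thesis uses Principal Co-ordinates Analysis for that step, which is not reproduced here) and scores each set with the standard two-sample Hotelling's T² statistic. The function names and the input layout are hypothetical, not taken from the thesis.

```python
from statistics import mean

def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def _scatter(rows, centre):
    """Sum of outer products of the centred rows (a p x p matrix)."""
    p = len(centre)
    return [[sum((r[i] - centre[i]) * (r[j] - centre[j]) for r in rows)
             for j in range(p)] for i in range(p)]

def hotelling_t2(x, y):
    """Two-sample Hotelling's T^2; x and y are lists of p-dimensional rows."""
    n1, n2, p = len(x), len(y), len(x[0])
    mx = [mean(r[i] for r in x) for i in range(p)]
    my = [mean(r[i] for r in y) for i in range(p)]
    sx, sy = _scatter(x, mx), _scatter(y, my)
    # Pooled covariance; a singular matrix raises ZeroDivisionError in _solve.
    pooled = [[(sx[i][j] + sy[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(mx, my)]
    w = _solve(pooled, diff)
    return n1 * n2 / (n1 + n2) * sum(d * v for d, v in zip(diff, w))

def rank_gene_sets(summaries, labels):
    """summaries: {set name: per-sample coordinate rows}; labels: 0/1 per sample.

    Returns set names ordered from most to least class-separating.
    """
    scores = {}
    for name, rows in summaries.items():
        class0 = [r for r, lab in zip(rows, labels) if lab == 0]
        class1 = [r for r, lab in zip(rows, labels) if lab == 1]
        scores[name] = hotelling_t2(class0, class1)
    return sorted(scores, key=scores.get, reverse=True)
```

The top-ranked sets from `rank_gene_sets` would then feed a classifier such as logistic regression, as the abstract describes.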
30

Kural, Deniz. "Methods for Inter- and Intra-Species Genomics for the Detection of Variation and Function." Thesis, Boston College, 2014. http://hdl.handle.net/2345/bc-ir:104053.

Abstract:
Thesis advisor: Gabor T. Marth
This thesis concerns the development of methods for comparing genomes. Chapter 2 is a comparative genomics investigation of coding regions across multiple species. Regions of the genome coding for proteins show higher conservation than non-coding regions. Furthermore, we show that a portion of coding regions is conserved beyond the requirements of protein conservation, supporting functions such as microRNA binding and splicing enhancement, which provide a non-coding functional impetus for conservation. In Chapter 3, we focus on the detection and characterization of a particular type of structural variation: mobile element insertions (MEIs). While there are many types of mobile elements in the human genome, three of these are active and cause most of the MEI variation observed in humans: ALU, L1 and SVA elements. We detect variation across 1000 Genomes Pilot populations caused by these elements, assemble ALU elements to single-nucleotide resolution, and determine the actively copying species of this element. We have developed a variety of algorithmic approaches to MEI detection and present them here. Chapter 4 outlines an approach to remedy reference bias by incorporating variation data into the reference. In particular, we construct a pan-genome reference, demonstrated concretely by resolving ALU regions, and develop new alignment software to align against this enriched reference structure.
Thesis (PhD) — Boston College, 2014
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
31

Feldhahn, Magdalena [Verfasser], and Oliver [Akademischer Betreuer] Kohlbacher. "Computational Methods for Personalized Cancer Therapy Based on Genomics Data / Magdalena Feldhahn ; Betreuer: Oliver Kohlbacher." Tübingen : Universitätsbibliothek Tübingen, 2013. http://d-nb.info/1162844434/34.

32

Chaisson, Mark. "Combinatorial methods in computational genomics: mammalian phylogenetics using microinversions and fragment assembly with short reads." Diss., University of California, San Diego, 2008. http://wwwlib.umi.com/cr/ucsd/fullcit?p3337222.

Abstract:
Thesis (Ph. D.)--University of California, San Diego, 2008.
Title from first page of PDF file (viewed February 6, 2009). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 151-161).
33

Lee, Yiu-fai, and 李耀暉. "Analysis for segmental sharing and linkage disequilibrium: a genomewide association study on myopia." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43912217.

34

Liu, Xinan. "NOVEL COMPUTATIONAL METHODS FOR SEQUENCING DATA ANALYSIS: MAPPING, QUERY, AND CLASSIFICATION." UKnowledge, 2018. https://uknowledge.uky.edu/cs_etds/63.

Abstract:
Over the past decade, the evolution of next-generation sequencing technology has considerably advanced genomics research. As a consequence, fast and accurate computational methods are needed for analyzing the large datasets arising in different applications. The research presented in this dissertation focuses on three areas: RNA-seq read mapping, large-scale data query, and metagenomics sequence classification. A critical step of RNA-seq data analysis is to map the RNA-seq reads onto a reference genome. This dissertation presents a novel splice alignment tool, MapSplice3. It achieves high read alignment and base mapping yields and is able to detect splice junctions, gene fusions, and circular RNAs comprehensively at the same time. Based on MapSplice3, we further develop a novel lightweight approach called iMapSplice that enables personalized mRNA transcriptional profiling. The huge amount of RNA-seq data shared through public datasets provides an invaluable resource for researchers to test hypotheses by reusing existing datasets. To meet the need for efficient querying of large-scale sequencing data, a novel method called SeqOthello has been developed. It efficiently queries sequence k-mers against large-scale datasets and determines whether the given sequence is present. Metagenomics studies often generate tens of millions of reads to capture the presence of microbial organisms, so efficient and accurate algorithms are in high demand. In this dissertation, we introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequences. It supports efficient query of a taxon using its k-mer signatures.
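The idea of answering a sequence query through its k-mers can be sketched with an ordinary dictionary. This is only an illustration of k-mer presence querying: the abstract does not describe SeqOthello's internal data structure, so the plain `dict` index, the function names and the `min_fraction` threshold below are all assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(samples, k):
    """samples: {sample id: list of sequences} -> {k-mer: set of sample ids}."""
    index = {}
    for sample_id, sequences in samples.items():
        for sequence in sequences:
            for kmer in kmers(sequence, k):
                index.setdefault(kmer, set()).add(sample_id)
    return index

def query(index, seq, k, min_fraction=1.0):
    """Sample ids that contain at least min_fraction of the query's k-mers."""
    wanted = kmers(seq, k)
    if not wanted:  # query shorter than k
        return set()
    hits = {}
    for kmer in wanted:
        for sample_id in index.get(kmer, ()):
            hits[sample_id] = hits.get(sample_id, 0) + 1
    return {s for s, n in hits.items() if n / len(wanted) >= min_fraction}
```

A dictionary of sets grows quickly with the number of samples, which is the kind of cost a compact, purpose-built index is meant to avoid.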
35

Wirta, Valtteri. "Mining the transcriptome - methods and applications." Doctoral thesis, Stockholm : School of Biotechnology, Royal Institute of Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4115.

36

Huang, Yan. "NOVEL COMPUTATIONAL METHODS FOR TRANSCRIPT RECONSTRUCTION AND QUANTIFICATION USING RNA-SEQ DATA." UKnowledge, 2015. http://uknowledge.uky.edu/cs_etds/28.

Abstract:
The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under a particular condition, such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We have proposed two methods directly addressing this challenge. First, we developed a novel method, MultiSplice, to accurately estimate the abundance of well-annotated transcripts. Second, driven by the desire to detect novel isoforms, we designed a max-flow-min-cost algorithm named Astroid for simultaneously discovering the presence and quantities of all possible transcripts in the transcriptome. We further extend an ab initio pipeline of transcriptome analysis to large-scale datasets, which may contain hundreds of samples. The effectiveness of the proposed methods is supported by a series of simulation studies, and their application to real datasets suggests a promising opportunity for reconstructing the mRNA transcriptome, which is critical for revealing variations among cells (e.g. disease vs. normal).
37

Stokes, Matthew Oliver. "Comparative genomics of blaCTX-M plasmids from veterinary and human 'Escherichia coli' and methods for their identification and differentiation." Thesis, Kingston University, 2014. http://eprints.kingston.ac.uk/29889/.

Abstract:
The blaCTX-M gene confers resistance to penicillins and cephalosporins and encodes what is now the most widely disseminated plasmid-mediated Extended Spectrum beta-lactamase (ESBL). Plasmids harbouring blaCTX-M have been recovered from both human and animal isolates, with increasing evidence for transmission between hosts, which is a major public health concern. The aim of this study was to investigate the relationship of blaCTX-M plasmids from UK human and veterinary E. coli isolates with plasmids previously sequenced around the world, and to develop molecular markers to identify and differentiate plasmids. Molecular markers were first established as a suitable method for identifying plasmids by studying the prevalence of IncK pCT-like plasmids, which were found to be associated with 30% of CTX-M-14 producers in the UK, with plasmids mobilising the gene between unrelated isolates from cattle, turkeys and humans. Seven blaCTX-M plasmids from E. coli, belonging to four incompatibility groups, were isolated, fully sequenced and annotated. The only sequenced plasmid of human origin was pH19, the first IncZ blaCTX-M-14 plasmid to be sequenced. The plasmids sequenced from animals included pSAM7 from cattle, the first IncX4 plasmid to be sequenced with blaCTX-M-14b in a novel transposition unit. Plasmid pCH01 was isolated from a chicken isolate, harboured blaCTX-M-3, and is the first IncA/C group CTX-M plasmid to be sequenced. The four IncI1γ blaCTX-M-1 plasmids from chicken (pCH02 and pCH03), cattle (pCT01) and turkey (pT01) are the first ST3 and IncI1γ blaCTX-M-1 plasmids to be sequenced. Comparative analysis found that UK plasmids shared approximately 70-99% sequence coverage with previously published sequences from different hosts and bacterial species around the world. 
This demonstrated that plasmids in the UK were closely related to plasmids found elsewhere, and no genetic characteristics were identified that would prevent these plasmids from existing in either human or animal isolates; the main differences were observed in the inserted resistance regions. In all plasmids, blaCTX-M was associated with ISEcp1, including novel blaCTX-M-14b and blaCTX-M-3 transposition units in pSAM7 and pCH01, respectively. In 6/7 plasmids, the ISEcp1-blaCTX-M unit was not associated with any other resistance regions, having inserted as a separate event. Molecular markers capable of both identifying and differentiating plasmids belonging to the same incompatibility group were designed from the comparative analysis between plasmids. Five groups were identified for IncX4, eight for IncZ, B and K, 12 for IncI1γ and 14 for IncA/C. Markers were used in screening of field isolates to identify similar plasmids, and novel combinations not previously identified in silico were observed. These markers represent a new non-sequencing-based tool to identify and characterise plasmids further, benefitting the study of plasmids and their epidemiology.
38

Cubuk, Cankut. "Modeling Functional Modules Using Statistical and Machine Learning Methods." Doctoral thesis, Universitat Politècnica de València, 2020. http://hdl.handle.net/10251/156175.

Abstract:
Understanding the aspects of cell functionality that account for disease or drug action mechanisms is the main challenge for precision medicine. In spite of the increasing availability of genomic and transcriptomic data, there is still a gap between the detection of perturbations in gene expression and the understanding of their contribution to the molecular mechanisms that ultimately account for the phenotype studied. Over the last decade, different computational and mathematical models have been proposed for pathway analysis. However, they do not take into account the dynamic mechanisms contained in pathways, as represented by their layout and the interactions between genes and proteins. In this thesis, I present two slightly different mathematical models that integrate human transcriptomic data with prior knowledge of signalling and metabolic pathways to estimate Mechanistic Pathway Activities (MPAs). MPAs are continuous, individual-level values that can be used with machine learning and statistical methods to determine biomarkers for the early diagnosis and subtype classification of diseases, and also to suggest potential therapeutic targets for individualized therapeutic interventions. The overall objective is to develop new and advanced systems biology approaches for proposing functional hypotheses that help us to understand and interpret the complex mechanisms of diseases. These mechanisms are crucial for robust personalized drug treatments and for predicting clinical outcomes. First, I contributed to the development of a method designed to extract elementary sub-pathways from a signalling pathway and to estimate their activity. Second, this algorithm was adapted to metabolic modules and implemented as a web tool. Third, the method was used to reveal a pan-cancer metabolic landscape. In this study, I analyzed the metabolic module profile of 25 different cancer types, and the method was also validated using different computational and experimental approaches. Each method developed in this thesis was benchmarked against similar existing methods, evaluated for its sensitivity and specificity, experimentally validated where possible, and used to predict clinical outcomes of different cancer types. The research described in this thesis and the results obtained were published in different systems biology and cancer-related peer-reviewed journals and also in national newspapers.
Cubuk, C. (2020). Modeling Functional Modules Using Statistical and Machine Learning Methods [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/156175
39

Weiss, Bruno. "Genômica comparativa de Microcystis aeruginosa (Cyanobacteria: Chroococcales), com ênfase em genes envolvidos com síntese de produtos naturais." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/11/11138/tde-15082017-173620/.

Abstract:
The wide metabolic diversity of cyanobacteria is associated not only with their importance in biogeochemical cycles, but also with their global distribution. This diversity also underlies the ability of these organisms to produce a wide variety of substances with unusual structures and activities of interest to humans. Microcystis is a cyanobacterial genus recognized as a producer of more than two hundred natural products, including cyanotoxins. Microcystis aeruginosa is a species frequently found in cyanobacterial blooms, raising concerns about its ecological influence, especially in freshwater bodies used for human consumption. The objective of this work was therefore to survey, through genomic analyses, the diversity and quantity of secondary metabolites that the species can produce, as well as variables that can potentially interfere with the computational analyses, by searching for patterns in the species and comparing 18 strains from all continents. A total of 235 clusters related to secondary metabolism, categorized into 12 classes according to the structure of their products, were found in the 18 strains, evidencing the richness of such clusters in this species. Of these clusters, the most abundant belong to the categories of Terpenes, Hybrids, Bacteriocins and NRPS. Among the NRPS clusters, none was common to all strains. The number of clusters per strain ranged from 6 to 21, and the number of product categories ranged from 4 to 10, showing a heterogeneous distribution of clusters and predicted metabolite types. This heterogeneous distribution was detailed for a better understanding of the pattern found in the species. Of the NRPS clusters, the three most frequent were selected for a detailed analysis of their structure and sequence: aeruginosin (15 strains), microcystin (11 strains), and micropeptin (15 strains). The micropeptin cluster found in strains SPC777, TAIHU98 and PCC 9806 was widely dissimilar to the reference, potentially indicating an identification error caused by the antiSMASH platform used to locate the clusters. Genomic collinearity analyses showed very low synteny among the genomes of the strains under analysis, suggesting frequent genomic rearrangement events. Pangenome analyses show that more genomes of this species are needed to estimate the total number of different genes the species may possess, which is of interest for future searches for secondary metabolites. Core-genome analyses point to a reliable estimate of 1,944 genes common to all genomes of this species, corresponding to between 30% and 50% of the genes in each strain. Statistical analyses point to different degrees of non-linear interference of the number of contiguous sequences (contigs) in the observation of patterns in other genomic characteristics, suggesting caution about expectations regarding secondary metabolism for strains whose assembly exceeds an approximate upper limit of 100 contigs.
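The core-genome and pangenome figures in this abstract rest on simple set operations over per-strain gene content, sketched below. The sketch assumes genes have already been grouped into shared gene-family identifiers across strains, a step the abstract does not describe. The function name and the toy data are hypothetical.

```python
def core_and_pan(genomes):
    """genomes: {strain name: set of gene-family ids}.

    Returns (core, pan, core_share), where core_share maps each strain to the
    fraction of its gene families that belong to the core genome.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # families present in every strain
    pan = set.union(*gene_sets)          # families present in any strain
    core_share = {strain: len(core) / len(genes)
                  for strain, genes in genomes.items()}
    return core, pan, core_share

# Toy data (invented for illustration, not taken from the thesis).
toy = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g3", "g6", "g7"},
}
core, pan, share = core_and_pan(toy)
```

If the pangenome keeps growing as strains are added, more genomes are needed before the species' total gene repertoire can be estimated, which is the situation the abstract reports.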
40

Haddon, Andrew L. "Evaluation of Some Statistical Methods for the Identification of Differentially Expressed Genes." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/1913.

Abstract:
Microarray platforms have been around for many years, and while new technologies are on the rise in laboratories, microarrays are still prevalent. Many methods have been proposed and modified for analysing microarray data to identify differentially expressed (DE) genes. However, the most popular methods, such as Significance Analysis of Microarrays (SAM), samroc, fold change, and rank product, are far from perfect. Which method is most powerful depends on the characteristics of the sample and the distribution of the gene expressions. The most widely used method is usually SAM or samroc, but when the data tend to be skewed, the power of these methods decreases. Because the median is a better measure of central tendency than the mean when the data are skewed, the test statistics of the SAM and fold change methods are modified in this thesis. This study shows that the median-modified fold change method improves the power in many cases when identifying DE genes if the data follow a lognormal distribution.
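The motivation for replacing the mean with the median can be shown with a small sketch. The abstract does not give the exact form of the modified statistics, so the function below is an assumed, simplified fold change: the difference of group centres on log2-scale expression values, with the centre function swappable. The name `log_fold_change` and the sample values are hypothetical.

```python
from statistics import mean, median

def log_fold_change(control, treated, centre=mean):
    """Log fold change as the difference of group centres.

    control, treated: log2 expression values for one gene.
    centre: statistics.mean for the usual version, statistics.median for a
    median-based variant.
    """
    return centre(treated) - centre(control)

# One extreme value in the control group (invented numbers).
control = [5.0, 5.1, 4.9, 5.0, 9.0]
treated = [5.0, 5.1, 4.9, 5.0, 5.0]

mean_fc = log_fold_change(control, treated, centre=mean)
median_fc = log_fold_change(control, treated, centre=median)
```

Here the mean-based value is pulled to about -0.8 by the single outlier, while the median-based value stays at 0, which is the robustness to skew that the abstract appeals to.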
41

Fontseré, Alemany Clàudia 1992. "Genomic analysis of wild and captive chimpanzee populations from non-invasive samples using target capture methods." Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/670317.

Abstract:
Wild chimpanzee populations are considered to be under threat of extinction due to the damaging consequences of human impact on their natural habitat and of illegal trade. Conservation genomics is an emerging field that has the potential to guide conservation efforts not only in the wild (in situ) but also outside the natural range (ex situ). In this thesis, we have explored to what extent target capture methods on specific genomic regions can provide insights into chimpanzee genetic diversity in captive and wild populations. Specifically, we have characterized the ancestry and inbreeding of 136 European captive chimpanzees to aid their management in captivity, and inferred the origin of 31 individuals confiscated from illegal trade by sequencing ancestry-informative SNPs. We have also examined molecular strategies to maximize library complexity in target capture from fecal samples so that the methods can be applied in large-scale genomic studies. Finally, we have captured chromosome 21 from 828 fecal samples collected across the entire extant chimpanzee range. As a result of our high-density sampling scheme, we have found strong evidence of population stratification in chimpanzees and discovered new local genetic diversity that is linked to geographic origin. With this newly generated dataset and fine-grained geogenetic map, we have implemented a strategy for the geolocalization of chimpanzees, which has a direct conservation application.
42

Eiderbrant, Kristina. "Development of quantitative PCR methods for diagnosis of bacterial vaginosis and vaginal yeast infection." Thesis, Linköpings universitet, Institutionen för klinisk och experimentell medicin, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-68269.

Abstract:
Vaginitis is a vaginal infection that affects many women all over the world. The disorder can cause problems such as abnormal vaginal discharge, itching and redness. The two most common causes of vaginitis are bacterial vaginosis and Candida vaginitis. The prevalence of bacterial vaginosis in Sweden is around 10-20%, and approximately 75% of all women will suffer from a vaginal yeast infection at least once in their lifetime. The clinical symptoms of vaginal infections are not specific, and the methods for diagnosing bacterial vaginosis and Candida vaginitis are subjective and dependent on the acuity of the clinician. Due to the lack of standardized and objective diagnostic tools, misdiagnosis and consequently incorrect treatment may occur. As vaginal infections and their symptoms greatly affect women's quality of life, and vaginitis has been associated with serious public health consequences, it is essential to diagnose and treat the conditions correctly. Hence, there is a great need for better methods of diagnosing these conditions. The aim of this master's thesis was to develop quantitative species-specific real-time PCR assays for diagnosing the two most common causes of vaginitis, i.e. bacterial vaginosis and Candida vaginitis. Potential markers for bacterial vaginosis (Atopobium vaginae, BVAB2, Gardnerella vaginalis, Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus jensenii, Lactobacillus iners, Megasphaera type 1, Megasphaera type 2, Mobiluncus curtisii, Mobiluncus mulieris and Leptotrichia/Sneathia species) and Candida vaginitis (Candida albicans, Candida glabrata, Candida parapsilosis and Candida tropicalis) were chosen. Primers and probes were designed and tested on reference strains and vaginal samples. Single- and multiplex PCR reactions were successfully optimized with the designed oligonucleotides. 
Furthermore, standard curves with excellent linearity were created and covered more than five orders of magnitude. These developed quantitative species-specific real-time PCR assays will, in a prospective medical validation, quantify 300 vaginal samples from women visiting the RFSU Clinic in Stockholm.
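The standard curves mentioned in this abstract relate the threshold cycle (Ct) to the logarithm of the template quantity. A minimal sketch of how such a curve could be fitted and used for quantification follows; the dilution-series values and function names are hypothetical illustrations, not data or code from the thesis. The amplification efficiency is derived from the slope with the commonly used relation E = 10^(-1/slope) - 1.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared, efficiency), where efficiency
    is 10 ** (-1 / slope) - 1 (1.0 means perfect doubling per cycle).
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r_squared, efficiency


def quantify(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series spanning five orders of magnitude.
copies = [1e2, 1e3, 1e4, 1e5, 1e6, 1e7]
cts = [33.1, 29.8, 26.4, 23.1, 19.7, 16.4]

slope, intercept, r2, eff = fit_standard_curve(copies, cts)
print(f"slope={slope:.3f} intercept={intercept:.2f} R^2={r2:.4f} efficiency={eff:.1%}")
print(f"estimated copies at Ct 25: {quantify(25.0, slope, intercept):.0f}")
```

An R² close to 1 across the dilution series is what "excellent linearity" would correspond to in this sketch.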
43

Abbott, Diana Lee. "Conditional linkage methods--searching for modifier genes in a large Amish pedigree with known Von Willebrand disease major gene modification." Diss., University of Iowa, 2009. https://ir.uiowa.edu/etd/223.

Abstract:
Von Willebrand Disease (VWD) is the most common bleeding disorder. In addition to known major genes, genetic modifiers, such as ABO blood group, affect quantitative outcome measures for VWD. The data consist of an 854-member Amish pedigree with established linkage of VWD to a locus within the Von Willebrand Factor (VWF) gene on chromosome 12. The DNA sequence of the causative mutation is known. Phenotypic information and genotypic data consisting of VWF mutation status and a genome screen of markers are available for 385 pedigree members. Genetic modifiers of the VWF mutation are investigated using known and new conditional linkage methods that search for modifier genes of a major gene with known mutation. The MCMC-based program LOKI was used to conduct multipoint linkage analysis of VWD outcome measures while controlling for the VWF mutation. Adjustment for the mutation did not eliminate the linkage signal on chromosome 12 in the same location as the VWF mutation. Evidence for QTLs was also found on six other chromosomes. Smod, a score statistic that detects evidence of a genetic modifier conditional on linkage to a major gene, was developed for sib pair data. To limit the modifier gene main effect, Smod was developed so that variance due to the modifier locus is bounded above by the variance of the interaction between major gene and modifier gene. The performance of Smod was compared to other published score statistics. Power to detect linkage to the modifier locus depended on major gene and modifier gene risk allele frequencies, relative contribution of the major gene main effect to the interaction effect, and the upper bound on the modifier gene main effect. The Amish pedigree was broken up into sib pair data and analyzed using Smod and other score statistics. Using these statistics, the strongest evidence for QTLs for VWD was also found on chromosome 12 in the region of the VWF mutation. 
Combined with the LOKI results, further analysis will help determine if intragenic modification is occurring or if linkage disequilibrium between the mutation and analyzed markers is driving results.
44

Jean, Géraldine. "In silico methods for genome rearrangement analysis : from identification of common markers to ancestral reconstruction." Thesis, Bordeaux 1, 2008. http://www.theses.fr/2008BOR13704/document.

Abstract:
The growing number of fully sequenced genomes makes it increasingly effective to study evolutionary mechanisms by comparing contemporary genomes. One of the main problems lies in reconstructing plausible ancestral genome architectures in order to generate hypotheses about both the history of existing genomes and the mechanisms of their formation. Ancestral reconstruction methods do not necessarily converge on the same results, but all are based on the same three steps: identifying common markers in contemporary genomes, building comparative maps of the genomes, and reconciling these maps using the maximum parsimony criterion. The large amount of data to be analyzed requires automated processing, and solving these problems poses formidable computational challenges. Refining existing mathematical models and tools by adding strong biological constraints makes the resulting hypotheses more biologically realistic. In this thesis, we propose a new method for identifying common markers in evolutionarily distant species. We then apply to the reconstructed comparative maps a new method for reconstructing ancestral architectures, based on the adjacencies between the computed markers and the genomic distances between contemporary genomes. Finally, after correcting the existing algorithm for determining an optimal sequence of rearrangements that occurred during the evolution of existing genomes from their common ancestor, we propose a new tool called VIRAGE that provides animated visualization of rearrangement scenarios between species.
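Adjacencies between markers, which this abstract names as one basis of the reconstruction, can be illustrated with a small sketch. The code below is not the thesis's method: it only shows, under the common convention of signed markers with a head and a tail extremity, how adjacencies of a linear marker order can be extracted and how the adjacencies lost between two orders (breakpoints) can be counted. The function names and example genomes are hypothetical.

```python
def extremities(marker):
    """Return the (left, right) extremities of a signed marker.

    A marker read in the forward direction goes tail -> head;
    a reversed marker (negative sign) goes head -> tail.
    """
    gene = abs(marker)
    if marker > 0:
        return (gene, "t"), (gene, "h")
    return (gene, "h"), (gene, "t")


def adjacencies(genome):
    """Set of adjacencies of a linear genome given as signed integers."""
    result = set()
    for x, y in zip(genome, genome[1:]):
        _, right_of_x = extremities(x)
        left_of_y, _ = extremities(y)
        # An adjacency is an unordered pair of touching extremities.
        result.add(frozenset((right_of_x, left_of_y)))
    return result


def breakpoints(genome_a, genome_b):
    """Number of adjacencies of genome_a that are absent from genome_b."""
    return len(adjacencies(genome_a) - adjacencies(genome_b))


# Hypothetical example: genome_b differs from genome_a by one inversion
# of the segment (2, 3), which breaks two adjacencies.
genome_a = [1, 2, 3, 4]
genome_b = [1, -3, -2, 4]
print(breakpoints(genome_a, genome_b))
```

Because adjacencies are unordered pairs of extremities, reading a whole genome in the opposite direction leaves its adjacency set unchanged, which is why this representation is convenient for comparing marker orders.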
45

Finotello, Francesca. "Computational methods for the analysis of gene expression from RNA sequencing data." Doctoral thesis, Università degli studi di Padova, 2014. http://hdl.handle.net/11577/3423789.

Abstract:
In every living organism, the entirety of its hereditary information is encoded, in the form of DNA, in the so-called genome. The genome consists of both genes and non-coding sequences and contains all the information needed to determine the properties and functions of each single cell. Cells can access and translate specific instructions of this code through gene expression, namely by selectively switching a particular set of genes on and off. Through gene expression, the information encoded in the active genes is transcribed into RNAs. This set of RNAs reflects the current state of a cell and can reveal pathological mechanisms underlying diseases. In recent years, a novel methodology for RNA sequencing, called RNA-seq, has been replacing microarrays for the study of gene expression. The sequencing framework of RNA-seq makes it possible to investigate at high resolution all the RNA species present in a sample, characterizing their sequences and quantifying their abundances at the same time. In practice, millions of short sequences, called reads, are sequenced from random positions of the input RNAs. These reads can then be computationally mapped onto a reference genome to reveal a transcriptional map, where the number of reads aligned to each gene, called counts, gives a measure of its level of expression. At first glance, this scheme may seem very simple, but the implementation of the whole analysis workflow is in fact complex and not well defined. So far, many computational methods have been proposed to perform the different steps of RNA-seq data analysis, but a unified processing pipeline is still lacking. The aim of my Ph.D. research project was the implementation of a robust computational pipeline for RNA-seq data analysis, from data pre-processing to differential expression detection. The definition of the different analysis modules was carried out through several steps.
First, we drafted a basic analysis framework through the study of RNA-seq data features and the dissection of data models and state-of-the-art algorithmic strategies. Then, we focused on count bias, which is one of the most challenging aspects of RNA-seq data analysis. We demonstrated that some biases affecting counts can be effectively corrected with current normalization methods, while others, like length bias, cannot be completely removed without introducing additional systematic errors. Thus, we defined a novel approach to compute RNA-seq counts, which strongly reduces length bias prior to normalization and is robust to the upstream processing steps. Finally, we defined the complete analysis pipeline using the best-performing methods and optimized some specific processing steps to enable correct expression estimates even in the presence of high-similarity genomic sequences. The implemented analysis pipeline was applied to a real case study to identify the genes involved in the pathogenesis of spinal muscular atrophy (SMA) from RNA-seq data of patients and healthy controls. SMA is a degenerative neuromuscular disease that has no cure and represents one of the major genetic causes of infant mortality. We identified a set of genes related to skeletal muscle and connective tissue disorders whose patterns of differential expression correlate with phenotype and may underlie protective mechanisms against SMA progression. Some putative positive targets identified by this analysis are currently under biological validation since they might improve diagnostic screening and therapy. To lay the basis for future research, which will focus on the optimization of the processing pipeline and on its extension to the analysis of dynamic expression data, we designed two time-series RNA-seq data sets: a real one and a simulated one.
The experimental and sequencing design of the real data set, as well as the modelling of the synthetic data, have been an integral part of the Ph.D. activity. Overall, this thesis considers each step of the RNA-seq data processing and provides some valuable guidelines in a fast-evolving research field that, up to now, has prevented the establishment of a stable and standardized analysis scheme.
The genetic heritage of every living organism is encoded, in the form of DNA, in the genome. The genome consists of genes and non-coding sequences and contains all the information needed for the organism's cells to function correctly. Cells can access specific instructions of this code through a process called gene expression, that is, by switching a particular set of genes on or off and transcribing the necessary information into RNA. The set of transcribed RNAs therefore characterizes a precise cellular state and can provide important information about the mechanisms involved in the pathogenesis of a disease. Recently, a methodology for RNA sequencing, called RNA-seq, has been rapidly replacing microarrays in the study of gene expression. Thanks to the properties of the sequencing technologies on which it is based, RNA-seq makes it possible to measure the number of RNAs present in a sample and, at the same time, to "read" their exact sequence. In practice, sequencing produces millions of sequences, called "reads", which are short strings read from random positions of the input RNAs. The reads must then be mapped by an algorithm onto a reference genome to reconstruct a transcriptional map, in which the number of reads aligned to each gene gives a digital measure (called the "count") of its expression level. Although this procedure may seem very simple at first glance, the overall analysis scheme is in fact very complex and not well defined. In recent years, several methods have been developed for each processing stage, but a standardized pipeline for RNA-seq data analysis has not yet been defined. The main objective of my doctoral project was the development of a computational pipeline for RNA-seq data analysis, from pre-processing to the measurement of differential gene expression.
The different processing modules were defined and implemented through a series of successive steps. Initially, we considered and redefined methods and models for describing and processing the data, so as to establish a preliminary analysis scheme. Next, we looked more closely at one of the most problematic aspects of RNA-seq data analysis: the correction of biases in the counts. We showed that some of these biases can be corrected effectively with current normalization techniques, while others, such as length bias, cannot be completely removed without introducing further systematic errors. We therefore defined and tested a new approach to computing counts that minimizes bias even before any normalization is applied. Finally, we implemented the complete analysis pipeline using the most robust and accurate algorithms selected in the previous phases, and optimized some steps to ensure accurate gene expression estimates even in the presence of highly similar genes. The implemented pipeline was then applied to a real case study to identify the genes involved in the pathogenesis of spinal muscular atrophy (SMA). SMA is a degenerative neuromuscular disease that is one of the main genetic causes of infant death and for which neither a cure nor an effective treatment is currently available. Our analysis identified a set of genes linked to other connective tissue and musculoskeletal diseases whose differential expression patterns correlate with phenotype, and which could therefore represent protective mechanisms able to counteract the symptoms of SMA.
Some of these putative targets are being validated, since they could lead to effective tools for diagnostic screening and treatment of this disease. Future goals concern the optimization of the pipeline defined in this thesis and its extension to the analysis of dynamic data from time-series RNA-seq. To this end, we designed two time-series data sets, one real and one simulated. The experimental and sequencing design of the real data set, as well as the modelling of the simulated data, were an integral part of the research carried out during the doctorate. The rapid and constant evolution of methods for RNA-seq data analysis has so far prevented the definition of a standardized analysis scheme and the resolution of problems related to various aspects of processing, such as normalization. In this context, the pipeline defined in this thesis and, more broadly, the topics discussed in each chapter touch on all aspects of RNA-seq data analysis and provide guidelines useful for defining an effective and robust computational approach.
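To make the notion of length bias in read counts concrete, the sketch below shows a conventional length-aware normalization, transcripts per million (TPM). This is a standard technique swapped in for illustration only; it is not the novel counting approach developed in the thesis, which the abstract says reduces length bias before normalization. Gene names, counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths.

    counts:     dict mapping gene -> number of reads aligned to it
    lengths_bp: dict mapping gene -> gene (or transcript) length in bases
    """
    # Reads per base: longer genes collect more reads at equal expression,
    # so dividing by length puts genes on a comparable scale.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Rescale so that the values of one sample sum to one million.
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


# Hypothetical sample: geneB is twice as long as geneA and has twice the
# reads, so both end up with the same length-normalized expression.
counts = {"geneA": 100, "geneB": 200, "geneC": 50}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
for gene, value in tpm(counts, lengths).items():
    print(gene, round(value, 1))
```

Dividing by length after counting is exactly the kind of post hoc correction whose limits the abstract discusses, so the sketch should be read as background rather than as a recommended pipeline step.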
46

Zhong, Cuncong. "Computational Methods for Comparative Non-coding RNA Analysis: From Structural Motif Identification to Genome-wide Functional Classification." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5894.

Abstract:
Non-coding RNA (ncRNA) plays critical functional roles in biological systems, such as regulation, catalysis, and modification. Non-coding RNAs exert their functions through their specific structures, which makes a thorough understanding of these structures a key step towards their complete functional annotation. In this dissertation, we cover a suite of computational methods for the comparison of ncRNA secondary and 3D structures, and their applications to ncRNA molecular structural annotation and genome-wide functional survey. Specifically, we have contributed the following five computational methods. First, we have developed an alignment algorithm to compare RNA structural motifs, which are recurrent RNA 3D structural fragments. Second, we have improved upon the previous alignment algorithm by incorporating base-stacking information and devised a new branch-and-bound algorithm. Third, we have developed a clustering pipeline for RNA structural motif classification using the above alignment methods. Fourth, we have generalized the clustering pipeline to a genome-wide analysis of RNA secondary structures. Finally, we have devised an ultra-fast alignment algorithm for RNA secondary structure by using the sparse dynamic programming technique. A large number of novel RNA structural motif instances and ncRNA elements have been discovered throughout these studies. We anticipate that these computational methods will significantly facilitate the analysis of ncRNA structures in the future.
Ph.D. (Doctorate), Computer Science, Engineering and Computer Science
47

Mosquera, Mayo José Luís. "Methods and Models for the Analysis of Biological Significance Based on High-Throughput Data." Doctoral thesis, Universitat de Barcelona, 2014. http://hdl.handle.net/10803/286465.

Abstract:
The advent of high-throughput technologies has generated a huge quantity of omics data. The results of these experiments are usually long lists of genes that can be used as biomarkers. A major challenge for researchers is to attribute a biological interpretation or significance to these lists of potential biomarkers, by using biological information stored in bioinformatics resources such as the Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG), or by combining them with other types of omics data. This dissertation had two main objectives: first, to study the mathematical properties of two types of semantic similarity measures for exploring GO categories, and second, to classify and study the evolution of GO tools for enrichment analysis. The first measure considered was a semantic similarity measure proposed by Lord et al. It is a node-based approach grounded in graph theory. The second was a group of pseudo-distances proposed by Joslyn et al. These are edge-based approaches grounded in the algebraic viewpoint of Partially Ordered Set (POSET) theory. To reach our objectives, we first reviewed and described the main methods of graph theory and POSET theory. This allowed us to realize that there are two ways of mapping objects (e.g. genes) to the terms of an ontology (e.g. GO). The first formulation, called the Object-Ontology Complex (OOC), was proposed by Carey in order to perform statistical computations. The second formulation, called the POSET Ontology (POSO), was introduced by Joslyn et al. To classify the GO tools for enrichment analysis, the first 26 GO tools available on the website of The GO Consortium were surveyed. This left us with a list of 205 features that were used to build a Standard Functionalities Set. Based on these functionalities, the 26 GO tools were classified according to their capabilities.
The study of the evolution of GO tools was based on monitoring these 26 GO tools. The statistical analysis consisted of descriptive statistics, an inferential analysis and a multivariate analysis. With regard to the first objective, we have seen that Lord's measure is the same as Resnik's measure, which was published earlier. We observed a certain level of analogy between the OOC and POSO formalizations for mapping objects such as genes to the terms of an ontology. A property and a corollary for calculating semantic similarity measures from node-based approaches, from a matrix point of view, have been proposed. It has been proved that Lord's measure and Joslyn's measure can be redefined in terms of a metric distance. An R package called sims has been developed for computing semantic similarity measures between terms of an arbitrary ontology and for comparing semantic similarity profiles based on the GO terms associated with two lists of genes. Based on the classification of the GO programs, a web-based tool called SerbGO, devoted to selecting and comparing GO tools, was developed. The statistical analysis of the evolution of GO tools suggested that the promoters have introduced improvements over time, but no clear models of GO tools have been detected. According to the results of the statistical analysis, an ontology called DeGOT was built to provide a structured vocabulary for developers who are introducing improvements to existing GO tools for enrichment analysis or designing a new program. DeGOT can be used to support queries and the comparison of SerbGO results.
The emergence of high-throughput technologies has generated a huge quantity of omics data. The results of these experiments are long lists of genes that can be used as biomarkers. One of the great challenges for experimental researchers is to attribute a biological interpretation or significance to these potential biomarkers, either by extracting the biological information stored in resources such as the Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG), or by combining them with other omics data. The objectives of the thesis were: first, to study the mathematical properties of two types of semantic similarity measures for exploring GO categories, and second, to classify and study the evolution of GO tools for enrichment analysis. The first semantic similarity measure considered, proposed by Lord et al., was grounded in graph theory, and the second was a group of pseudo-distances, proposed by Joslyn et al., grounded in the theory of Partially Ordered Sets (POSETs). The study of GO tools was based on the first 26 tools available on the website of The GO Consortium. Lord et al.'s measure was found to be the same as Resnik's previously published measure. An analogy was observed in the way genes are mapped to the GO via graphs and/or via POSETs. A property and a corollary were proposed that allow the first semantic similarity measure to be computed in matrix form. Both measures were shown to be associated with a metric distance. An R package called sims was developed that computes semantic similarities for an arbitrary ontology and compares GO semantic similarity profiles. A Standard Functionalities Set was proposed for classifying GO tools, and a web application called SerbGO was developed for selecting and comparing GO tools.
The statistical study revealed that the promoters of GO tools have introduced improvements over time, but no well-defined models were detected. An ontology called DeGOT was developed that provides developers with a vocabulary for introducing improvements to the tools or designing a new one.
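The node-based semantic similarity discussed in this abstract can be illustrated with Resnik's measure, in which the similarity of two terms is the information content of their most informative common ancestor. The sketch below is not the sims package: the toy ontology, the annotation counts and all function names are hypothetical, and it assumes a tree-shaped ontology in which each gene is annotated to exactly one term, so that summing counts over descendants does not double-count genes.

```python
import math

# Hypothetical toy ontology: each term maps to its list of parents.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_ANNOTATIONS = {
    "root": 0,
    "metabolism": 2,
    "signaling": 4,
    "lipid_metabolism": 1,
    "sugar_metabolism": 1,
}


def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -ln p(t), with p(t) the share of genes at t or below it."""
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_ANNOTATIONS.items():
        for ancestor in ancestors(term):
            cumulative[ancestor] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(term, 0.0) for term in common)


ic = information_content()
print(resnik("lipid_metabolism", "sugar_metabolism", ic))
print(resnik("lipid_metabolism", "signaling", ic))
```

Two terms that share only the root have similarity zero, because the root covers every annotated gene and therefore carries no information.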
48

Sahadevan, Sudeep [Verfasser]. "Application of knowledge discovery and data mining methods in livestock genomics for hypothesis generation and identification of biomarker candidates influencing meat quality traits in pigs / Sudeep Sahadevan." Bonn : Universitäts- und Landesbibliothek Bonn, 2014. http://d-nb.info/1077268890/34.

49

Abecassis, Judith. "Statistical methods for deciphering intra-tumor heterogeneity : challenges and opportunities for cancer clinical management." Thesis, Université Paris sciences et lettres, 2020. http://www.theses.fr/2020UPSLM065.

Abstract:
Obtaining the repertoire of mutated cancer genes has been decisive for our understanding of tumorigenesis. However, efforts to characterize cancers at the genetic level are not sufficient to predict patient survival or response to treatment, which is essential for improving patient care. This failure is partly attributed to the evolutionary nature of cancers. Indeed, like any biological population able to acquire heritable changes, tumor cells are subject to natural selection and genetic drift, resulting in a mosaic structure in which several subclones with different genomes and properties coexist. This has important consequences for anti-cancer treatments, since these subpopulations can be sensitive or resistant to different therapies, and new resistant phenotypes can continue to appear as the disease progresses. A large number of mathematical or statistical methods have been developed to detect and measure intra-tumor heterogeneity (ITH), but no systematic evaluation of their performance and potential clinical application has been carried out. Our first contribution was therefore a study of existing approaches to detecting intra-tumor heterogeneity, to make it easier to navigate the ideas underlying these approaches. We also proposed a framework for analyzing the robustness of these approaches and their potential use for patient stratification. This in-depth survey also allowed us to identify a type of data not yet exploited for reconstructing intra-tumor heterogeneity, and our second contribution aims to fill this gap.
Indeed, beyond the observed frequency of a somatic mutation in a tumor sample, which makes it possible to distinguish several clones, the nucleotide context of a mutation reveals the causal, unobservable mutational processes. We show, with both simulated and real data, that these two aspects of tumor evolution can be modeled jointly. In conclusion, we highlight the need to strengthen the integration of data of multiple types or origins to fully exploit the potential of tumor evolution in the clinical management of cancer.
Accessing the repertoire of cancer somatic alterations has been instrumental in our current understanding of carcinogenesis. However, efforts in genomic characterization of cancers are not sufficient to predict a patient's outcome or response to therapy, which is key to informing their clinical management. This failure is partly attributed to the evolutionary aspect of cancers. Indeed, like any biological population able to acquire heritable transformations, tumor cells are shaped by natural selection and genetic drift, resulting in a mosaic structure, where several subclones with distinct genomes and properties coexist. This has important implications for cancer treatment, as those subpopulations can be sensitive or resistant to different therapies, and new resistant phenotypes can keep emerging as the disease progresses further. A large number of mathematical or statistical methods have been developed to detect and quantify intra-tumor heterogeneity (ITH), but no systematic evaluation of their performance and potential for clinical application has been performed. Our first contribution consists of a survey of existing approaches to deciphering ITH, which makes it easier to navigate the different underlying ideas. We have also proposed a framework to assess the robustness of those approaches, and their potential for use in patient stratification. This survey has allowed us to identify an unexploited type of data in the process of ITH reconstruction, and our second contribution remedies this shortfall. Indeed, besides the observed prevalences of somatic mutations within a tumor sample, which allow us to distinguish several clones, the nucleotide context of those mutations reveals the unknown causative mutational processes. We illustrate on both simulated and real data the opportunity to jointly model those two aspects of tumor evolution.
In conclusion, we highlight the need to reinforce data integration from several sources or samples to harness the potential of tumor evolution for cancer clinical management.
50

Stephens, Alex J. "The development of rapid genotyping methods for methicillin-resistant Staphylococcus aureus." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/20172/1/Alexander_Stephens_Thesis.pdf.

Abstract:
Methicillin-resistant Staphylococcus aureus (MRSA) is an important human pathogen that is endemic in hospitals all over the world. It has more recently emerged as a serious threat to the general public in the form of community-acquired MRSA. MRSA has been implicated in a wide variety of diseases, ranging from skin infections and food poisoning to more severe and potentially fatal conditions, including endocarditis, septicaemia and necrotising pneumonia. Treatment of MRSA disease is complicated and can be unsuccessful due to the bacterium's remarkable ability to develop antibiotic resistance. The considerable economic and public health burden imposed by MRSA has fuelled attempts by researchers to understand the evolution of virulent and antibiotic-resistant strains and thereby improve epidemiological management strategies. Central to MRSA transmission management strategies is the implementation of active surveillance programs, through which the unique genetic fingerprint, or genotype, of each strain can be identified. Despite numerous advances in MRSA genotyping methodology, there remains a need for a rapid, reproducible, cost-effective method that is capable of producing a high level of genotype discrimination, whilst being suitable for high-throughput use. Consequently, the fundamental aim of this thesis was to develop a novel MRSA genotyping strategy incorporating these benefits. This thesis explored the possibility that more efficient genotyping strategies could be developed through careful identification, and then simple interrogation, of multiple, unlinked DNA loci that exhibit progressively increasing mutation rates. The baseline component of the MRSA genotyping strategy described in this thesis is the allele-specific real-time PCR interrogation of slowly evolving core single nucleotide polymorphisms (SNPs).
The genotyping SNP set was identified previously from the Multi-locus sequence typing (MLST) sequence database using an in-house software package named Minimum SNPs. As discussed in Chapter Three, the genotyping utility of the SNP set was validated on 107 diverse Australian MRSA isolates, which were largely clustered into groups of related strains as defined by MLST. To increase the resolution of the SNP genotyping method, a selection of binary virulence genes and antimicrobial resistance plasmids were tested that were successful at sub typing the SNP groups. A comprehensive MRSA genotyping strategy requires characterisation of the clonal background as well as interrogation of the hypervariable Staphylococcal Cassette Chromosome mec (SCCmec) that carries the β-lactam resistance gene, mecA. SCCmec genotyping defines the MRSA lineages; however, current SCCmec genotyping methods have struggled to handle the increasing number of SCCmec elements resulting from a recent explosion of comparative genomic analyses. Chapter Four of this thesis collates the known SCCmec binary marker diversity and demonstrates the ability of Minimum SNPs to identify systematically a minimal set of binary markers capable of generating maximum genotyping resolution. A number of binary targets were identified that indeed permit high resolution genotyping of the SCCmec element. Furthermore, the SCCmec genotyping targets are amenable for combinatorial use with the MLST genotyping SNPs and therefore are suitable as the second component of the MRSA genotyping strategy. To increase genotyping resolution of the slowly evolving MLST SNPs and the SCCmec binary markers, the analysis of a hypervariable repeat region was required. Sequence analysis of the Staphylococcal protein A (spa) repeat region has been conducted frequently with great success. Chapter Five describes the characterisation of the tandem repeats in the spa gene using real-time PCR and high resolution melting (HRM) analysis. 
Since the melting rate and precise point of dissociation of double stranded DNA is dependent on the size and sequence of the PCR amplicon, the HRM method was used successfully to identify 20 of 22 spa sequence types, without the need for DNA sequencing. The accumulation of comparative genomic information has allowed the systematic identification of key MRSA genomic polymorphisms to genotype MRSA efficiently. If implemented in its entirety, the strategy described in this thesis would produce efficient and deep-rooted genotypes. For example, an unknown MRSA isolate would be positioned within the MLST defined population structure, categorised based on its SCCmec lineage, then subtyped based on the polymorphic spa repeat region. Overall, by combining the genotyping methods described here, an integrated and novel MRSA genotyping strategy results that is efficacious for both long and short term investigations. Furthermore, an additional benefit is that each component can be performed easily and cost-effectively on a standard real-time PCR platform.
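The idea of selecting a minimal set of markers that gives maximum genotyping resolution can be sketched with a simple greedy search. The code below is not the Minimum SNPs software described in the abstract. It assumes, as an illustration, that resolving power is measured with Simpson's index of diversity, and that each isolate is represented by a tuple of alleles at the candidate markers. All names and data are hypothetical.

```python
from collections import Counter


def simpsons_index(profiles):
    """Simpson's index of diversity: the probability that two isolates
    drawn at random without replacement have different profiles."""
    n = len(profiles)
    if n < 2:
        return 0.0
    group_sizes = Counter(profiles).values()
    return 1.0 - sum(c * (c - 1) for c in group_sizes) / (n * (n - 1))


def greedy_markers(isolates, max_markers):
    """Greedily add the marker that most increases discrimination.

    isolates: list of equal-length tuples, one allele per candidate marker.
    Returns (chosen marker indices, Simpson's index of the chosen set).
    """
    chosen = []
    best_d = 0.0
    n_markers = len(isolates[0])
    while len(chosen) < max_markers:
        best_marker = None
        for marker in range(n_markers):
            if marker in chosen:
                continue
            trial = chosen + [marker]
            profiles = [tuple(iso[i] for i in trial) for iso in isolates]
            d = simpsons_index(profiles)
            if d > best_d:  # keep only strict improvements
                best_marker, best_d = marker, d
        if best_marker is None:  # no remaining marker adds resolution
            break
        chosen.append(best_marker)
    return chosen, best_d


# Hypothetical isolates typed at three candidate markers; the third
# marker is identical in all isolates and therefore uninformative.
isolates = [
    ("A", "C", "0"),
    ("A", "T", "0"),
    ("G", "C", "0"),
    ("G", "T", "0"),
]
print(greedy_markers(isolates, max_markers=3))
```

A greedy search does not guarantee the smallest possible marker set, but it shows why combining a few complementary markers can separate isolates that no single marker distinguishes.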