Academic literature on the topic 'RNA-seq Data Analysis'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'RNA-seq Data Analysis.'


Dissertations / Theses on the topic "RNA-seq Data Analysis"

1

Wang, Qi. "Integrative Data Analysis of Microarray and RNA-seq." Diss., North Dakota State University, 2018. https://hdl.handle.net/10365/29968.

Abstract:
Background: Microarray and RNA sequencing (RNA-seq) have been two commonly used high-throughput technologies for gene expression profiling over the past decades. For global gene expression studies, both techniques are expensive, and each has its unique advantages and limitations. Integrative analysis of these two types of data would provide increased statistical power, reduced cost, and complementary technical advantages. However, the completely different mechanisms of the two high-throughput techniques make the two types of data highly incompatible. Methods: Based on the degrees of compatibility, the genes are grouped into different clusters using a novel clustering algorithm, called Boundary Shift Partition (BSP). For each cluster, a linear model is fitted to the data and the number of differentially expressed genes (DEGs) is calculated by running a two-sample t-test on the residuals. The optimal number of clusters can be determined using a selection criterion that penalizes the number of parameters used for model fitting. The method was evaluated using data simulated from various distributions and was compared with the conventional K-means clustering method, Hartigan-Wong’s algorithm. The BSP algorithm was applied to microarray and RNA-seq data obtained from the embryonic heart tissues of wild type mice and Tbx5 mice. The raw data went through multiple preprocessing steps including data transformation, quantile normalization, linear modeling, principal component analysis and probe alignment. The differentially expressed genes between wild type and Tbx5 are identified using the BSP algorithm. Results: The accuracies of the BSP algorithm for the simulation data are higher than those of Hartigan-Wong’s algorithm for the cases with smaller standard deviations across the five different underlying distributions. The BSP algorithm can find the correct number of clusters using the selection criterion.
The BSP method identifies 584 differentially expressed genes between the wild type and Tbx5 mice. A core gene network developed from the differentially expressed genes showed a set of key genes that were known to be important for heart development. Conclusion: The BSP algorithm is an efficient and robust classification method to integrate the data obtained from microarray and RNA-seq.
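The step "fit a linear model, then run a two-sample t-test on the residuals" can be illustrated with a minimal sketch. This is not the BSP algorithm and does not reproduce the thesis's model: the choice of a single straight line relating the two platforms, the function names, and the toy numbers are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y, intercept, slope):
    """Observed y minus the value predicted by the fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def two_sample_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical toy data: microarray (x) and RNA-seq (y) values for six samples,
# the first three from one condition and the last three from another.
x = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5]
y = [2.1, 4.0, 6.2, 4.4, 6.3, 8.6]

intercept, slope = fit_line(x, y)
res = residuals(x, y, intercept, slope)
t_stat = two_sample_t(res[:3], res[3:])
print(f"slope={slope:.3f}, t={t_stat:.3f}")
```

A p-value would additionally require the t distribution with `na + nb - 2` degrees of freedom, which is omitted here to keep the sketch within the standard library.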
2

Stupnikov, Aleksei. "Statistical models for RNA-seq data analysis of cancer." Thesis, Queen's University Belfast, 2017. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.728670.

Abstract:
In our research we addressed several major points related to RNA-seq-based models for cancer. The first chapter reviews various genomics technologies from the pre-NGS era and the most commonly used NGS platforms, as well as recently developed methods. From here the main concepts of differential expression for SAGE technology and RNA-seq were considered, going on to discuss several of the most widely used methods in the field. In the third chapter we formulated the biological problem, that is, the reproducibility and robustness of RNA-seq differential expression analysis, and made some general observations on the count distributions of cancer-related RNA-seq data as well as the impact of sequencing depth alterations on the data. In chapter five we employed this robustness approach to rank the performance of existing differential gene expression (DGE) models and studied the effects of subsampling, in terms of library size and number of samples, on the outcome of a DGE analysis. In addition, in this chapter we introduced samExploreR, an R package that allows one to implement the sequencing depth altering simulations quickly and efficiently. Building on this work we applied the concept of subsampling to Quadratic, a candidate compound discovery framework based on connectivity mapping, and explored its robustness and reproducibility for various datasets. Finally, in chapter seven we explored how integrating information from different RNA-seq based approaches may affect the resulting outcome of the analysis and studied the robustness of those methods. The approaches adapted in this body of work allowed us to introduce the procedure of subsampling as a quality control measure that can allow an inference of quality when applied to datasets in research and clinical procedures.
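The idea of simulating a lower sequencing depth by subsampling can be sketched as binomial thinning of a count table. This is an illustrative assumption about how such a simulation might look, not the interface or the procedure of samExploreR; the function name, the gene names, and the counts are hypothetical.

```python
import random


def subsample_counts(counts, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    `counts` maps a gene name to its read count; the result has the same
    keys with thinned counts, mimicking a shallower sequencing run.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so that repeated runs are reproducible
    return {
        gene: sum(1 for _ in range(n) if rng.random() < fraction)
        for gene, n in counts.items()
    }


# Hypothetical count table for three genes.
counts = {"geneA": 1000, "geneB": 50, "geneC": 0}
half_depth = subsample_counts(counts, 0.5)
print(half_depth)
```

Running a differential expression analysis on several such thinned tables, with different seeds and fractions, is one way to check how stable its output is as depth decreases.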
3

Huang, Yuanhua. "Structured Bayesian methods for splicing analysis in RNA-seq data." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31328.

Abstract:
In most eukaryotes, alternative splicing is an important regulatory mechanism of gene expression that results in a single gene coding for multiple protein isoforms, thus greatly increasing the diversity of the proteome. RNA-seq is widely used for genome-wide splicing isoform quantification, and several effective and powerful methods have been developed for splicing analysis with RNA-seq data. However, quantification remains problematic for genes with low coverage or a large number of isoforms. These difficulties may in principle be ameliorated by exploiting correlations encoded in structured data sources. This thesis contributes to the development of Bayesian methods for splicing analysis by leveraging additional information in multiple datasets with structured prior distributions. First, we developed DICEseq, the first isoform quantification method tailored to time-series RNA-seq experiments. DICEseq explicitly models the correlations between experiments at different time points to aid the quantification of isoforms across experiments. Numerical experiments on both simulated and real datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Second, we developed BRIE (Bayesian Regression for Isoform Estimation), a Bayesian hierarchical model which resolves the difficulties in splicing analysis in single-cell RNA-seq (scRNA-seq) data by learning an informative prior distribution from sequence features. This method combines quantification and imputation for splicing analysis in a Bayesian way, which is particularly useful for scRNA-seq data because of its extremely low coverage and high technical noise.
We validated BRIE on several scRNA-seq data sets, showing that BRIE yields reproducible estimates of exon inclusion ratios in single cells. Third, we provided an effective tool that uses the Bayes factor to sensitively detect differential splicing between different single cells. When applying BRIE to a few real datasets, we found interesting heterogeneity patterns in splicing events across cell populations, for example alternative exons in DNMT3B. In summary, this thesis proposes structured Bayesian methods to integrate multiple datasets to improve splicing analysis and study its biological functions.
4

Wang, Xiao. "Computational Modeling for Differential Analysis of RNA-seq and Methylation data." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/72271.

Abstract:
Computational systems biology is an inter-disciplinary field that aims to develop computational approaches for a system-level understanding of biological systems. Advances in high-throughput biotechnology offer broad scope and high resolution in multiple disciplines. However, it is still a major challenge to extract biologically meaningful information from the overwhelming amount of data generated from biological systems. Effective computational approaches are urgently needed to reveal the functional components. Thus, in this dissertation work, we aim to develop computational approaches for differential analysis of RNA-seq and methylation data to detect aberrant events associated with cancers. We develop a novel Bayesian approach, BayesIso, to identify differentially expressed isoforms from RNA-seq data. BayesIso features a joint model of the variability of RNA-seq data and the differential state of isoforms. BayesIso not only accounts for the variability of RNA-seq data but also incorporates the differential states of isoforms as hidden variables for differential analysis. The differential states of isoforms are estimated jointly with other model parameters through a sampling process, providing improved performance in detecting isoforms that are less differentially expressed. We propose to develop a novel probabilistic approach, DM-BLD, in a Bayesian framework to identify differentially methylated genes. The DM-BLD approach features a hierarchical model, built upon Markov random field models, to capture both the local dependency of measured loci and the dependency of methylation change. A Gibbs sampling procedure is designed to estimate the posterior distribution of the methylation change of CpG sites. Then, the differential methylation score of a gene is calculated from the estimated methylation changes of the involved CpG sites, and the significance of genes is assessed by permutation-based statistical tests.
We have demonstrated the advantage of the proposed Bayesian approaches over conventional methods for differential analysis of RNA-seq data and methylation data. The joint estimation of the posterior distributions of the variables and model parameters using a sampling procedure has demonstrated an advantage in detecting isoforms or methylated genes with smaller differences. The applications to breast cancer data shed light on the molecular mechanisms underlying breast cancer recurrence, aiming to identify new molecular targets for breast cancer treatment.
Ph. D.
5

Hu, Yin. "A NOVEL COMPUTATIONAL FRAMEWORK FOR TRANSCRIPTOME ANALYSIS WITH RNA-SEQ DATA." UKnowledge, 2013. http://uknowledge.uky.edu/cs_etds/17.

Abstract:
The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. In order to address the current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected alternative splicing variants, which circumvents the need for full transcript reconstruction and quantification. Beyond the scope of classical group-wise analysis, a clustering scheme is further described for mining prominent consistency among samples in transcription, breaking the restriction of presumed grouping. The performance of the framework has been demonstrated by a series of simulation studies and real datasets, including the Cancer Genome Atlas (TCGA) breast cancer analysis. The successful applications suggest an unprecedented opportunity to use differential transcription analysis to reveal variations in the mRNA transcriptome in response to cellular differentiation or the effects of diseases.
6

Turro, Ernest. "Statistical methods for gene expression analysis using microarray and RNA-Seq data." Thesis, Imperial College London, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.534964.

7

Abuelqumsan, Mustafa. "Assessment of supervised classification methods for the analysis of RNA-seq data." Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0582/document.

Abstract:
Over the past decade, “Next Generation Sequencing” (NGS) technologies have made it possible to characterize genomic sequences at an unprecedented pace. Many studies have focused on human genetic diversity and on the transcriptome (the part of the genome transcribed into ribonucleic acid). Indeed, different tissues of the body express different genes at different moments, enabling cell differentiation and functional response to environmental changes. Since many diseases affect gene expression, transcriptome profiles can be used for medical purposes (diagnosis and prognosis). A wide variety of advanced statistical and machine learning methods have been proposed to address the general problem of classifying individuals according to multiple variables (e.g. transcription level of thousands of genes in hundreds of samples). During my thesis, I led a comparative assessment of machine learning methods and their parameters, to optimize the accuracy of sample classification based on RNA-seq transcriptome profiles.
8

Wartmann, Hannes. "Bias Invariant RNA-Seq Data Annotation and Liver Diseases Microbiome Analysis." Hamburg: Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2021. http://d-nb.info/1235244083/34.

9

Johnson, Kristen. "Software for Estimation of Human Transcriptome Isoform Expression Using RNA-Seq Data." ScholarWorks@UNO, 2012. http://scholarworks.uno.edu/td/1448.

Abstract:
The goal of this thesis research was to develop software for transcriptome quantification from RNA-Seq data that is capable of handling multireads and quantifying isoforms on a more global level. Current software available for these purposes uses various forms of parameter alteration in order to work with multireads. Many programs also still analyze isoforms per gene or per researcher-determined cluster. By doing so, the effects of multireads are diminished or possibly wrongly represented. To address this issue, two programs, GWIE and ChromIE, were developed based on a simple iterative EM-like algorithm with no parameter manipulation. These programs are used to produce accurate isoform expression levels.
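How an iterative EM-style procedure can share multireads between isoforms is shown in the minimal sketch below. It is a generic textbook-style EM on read compatibility sets, assuming equal isoform lengths and error-free alignments; it is not the algorithm implemented in GWIE or ChromIE, and the function name and toy reads are hypothetical.

```python
def em_multireads(reads, n_iter=100):
    """Estimate relative isoform abundances from read compatibility sets.

    `reads` is a list of sets; each set holds the isoforms one read maps to.
    A read mapping to several isoforms (a multiread) is split between them
    in proportion to the current abundance estimates.
    """
    isoforms = sorted(set().union(*reads))
    # Start from equal abundances for every isoform seen in the data.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(n_iter):
        # E-step: fractional assignment of each read to its isoforms.
        expected = {iso: 0.0 for iso in isoforms}
        for compatible in reads:
            total = sum(theta[iso] for iso in compatible)
            for iso in compatible:
                expected[iso] += theta[iso] / total
        # M-step: abundances are the normalized expected read counts.
        theta = {iso: expected[iso] / len(reads) for iso in isoforms}
    return theta


# Hypothetical toy input: two reads unique to A, one unique to B, one multiread.
reads = [{"A"}, {"A"}, {"A", "B"}, {"B"}]
abundances = em_multireads(reads)
print({iso: round(value, 3) for iso, value in abundances.items()})
```

In this toy case the multiread ends up assigned mostly to isoform A, because A also has more uniquely mapped reads.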
10

Graf, Alexander. "Analysis of genome activation in early bovine embryos by bioinformatic evaluation of RNA-Seq data." Diss., Ludwig-Maximilians-Universität München, 2015. http://nbn-resolving.de/urn:nbn:de:bvb:19-179386.

Abstract:
During maternal-to-embryonic transition, control of embryonic development gradually switches from maternal RNAs and proteins stored in the oocyte to gene products generated after embryonic genome activation. Detailed insight into the onset of embryonic transcription is obscured by the presence of maternal transcripts, and to date there is no systematic study addressing the activation of specific genes during several stages of early bovine embryo development. Using the bovine model system, comparative analyses of RNA-seq data sets were performed. The sequencing libraries had been constructed starting with germinal vesicle (GV) and metaphase II (MII) oocytes and embryos at the four-cell, eight-cell, 16-cell and blastocyst stage. The embryos had been generated in vitro by fertilization of Bos taurus taurus oocytes with sperm of a Bos taurus indicus sire. In total, approximately 13,000 RNA species could be identified in oocytes and in each embryonic stage. The number of identified differentially abundant transcripts increased in the course of development from roughly 100 to several thousand, with a sharp rise at the eight-cell stage. A bioinformatic approach was developed to capture maternally delivered and de novo synthesized RNA species separately. It sensitively identified actively transcribed genes even though comparative analyses failed because of the huge amount of RNA provided by the oocyte. Actively transcribed RNA species could be identified for approximately 8,000 genes, the majority of them at the eight-cell stage. This finding indicated that the majority of all RNA species provided by oocytes was de novo transcribed during early embryonic development. Furthermore, it could be shown that the de novo transcription of larger genes was initiated later in embryonic development than that of smaller ones. A procedure was established to identify Bos t. indicus-specific SNPs in RNA-Seq datasets, which identified more than 60,000 SNPs occurring in 20% of all annotated genes. A major part of these SNPs could be detected at the eight-cell stage. This procedure provides a way to capture and study allele-specific transcription during early embryonic development. The described bioinformatic approaches were used to study major genome activation, an important step in the maternal-to-embryonic transition. More than 4,000 genes were de novo transcribed during major genome activation, which was found to occur at the eight-cell stage. These genes were functionally related to transcription, translation and their regulation. In summary, this thesis created and applied a powerful tool set for bioinformatic dissection of processes occurring during development of early bovine embryos and provided unprecedented insights into major genome activation.