Dissertations / Theses on the topic 'Next-generation sequencing RNA-Seq'

Consult the top 31 dissertations / theses for your research on the topic 'Next-generation sequencing RNA-Seq.'

1

Busby, Michele Anne. "Measuring Gene Expression With Next Generation Sequencing Technology." Thesis, Boston College, 2012. http://hdl.handle.net/2345/3145.

Abstract:
Thesis advisor: Gabor Marth
While a PhD student in Dr. Gabor Marth's laboratory, I have had primary responsibility for two projects focused on using RNA-Seq to measure differential gene expression. In the first project we used RNA-Seq to identify differentially expressed genes in four yeast species and I analyzed the findings in terms of the evolution of gene expression. In this experiment, gene expression was measured using two biological replicates of each species of yeast. While we had several interesting biological findings, during the analysis we dealt with several statistical issues that were caused by the experiment's low number of replicates. The cost of sequencing has decreased rapidly since this experiment's design and many of these statistical issues can now practically be avoided by sequencing a greater number of samples. However, there is little guidance in the literature as to how to intelligently design an RNA-Seq experiment in terms of the number of replicates that are required and how deeply each replicate must be sequenced. My second project, therefore, was to develop Scotty, a web-based program that allows users to perform power analysis for RNA-Seq experiments. The yeast project resulted in a highly accessed first author publication in BMC Genomics in 2011. I have structured my dissertation as follows: The first chapter, entitled General Issues in RNA-Seq, is intended to synthesize the themes and issues of RNA-Seq statistical analysis that were common to both papers. In this section, I have discussed the main findings from the two papers as they relate to analyzing RNA-Seq data. Like the Scotty application, this section is designed to be "used" by wet-lab biologists who have a limited background in statistics. While some background in statistics would be required to fully understand the following chapters, the essence of this background can be gained by reading this first chapter. 
The second and third chapters contain the two papers that resulted from the two RNA-Seq projects. Each chapter contains both the original manuscript and the original supplementary methods and data section. Finally, I include brief summaries of my contributions to the two papers on which I was a middle author. The first was a functional analysis of the genomic regions affected by mobile element insertions as a part of Chip Stewart's paper with the 1000 Genomes Consortium. This paper was published in PLoS Genetics. The second was a cluster analysis of microarray gene expression in Toxoplasma gondii, which was included as part of Alexander Lorestani et al.'s paper, "Targeted proteomic dissection of Toxoplasma cytoskeleton sub-compartments using MORN1". This paper is currently under review. The yeast project was a collaborative effort between Jesse Gray, Michael Springer, and Allen Costa at Harvard Medical School, Jeffery Chuang here at Boston College, and members of the Marth lab. Jesse Gray conceived of the project. While I wrote the draft for the manuscript, many people, particularly Gabor Marth, provided substantial guidance on the actual text. I conceived of and implemented Scotty and wrote its manuscript with only editorial assistance from my co-authors. I produced all figures for the two manuscripts. Chip Stewart provided extensive guidance and mentorship to me on all aspects of my statistical analyses for both projects.
Thesis (PhD) — Boston College, 2012
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
2

Innocenti, Nicolas. "Data Analysis and Next Generation Sequencing : Applications in Microbiology." Doctoral thesis, KTH, Beräkningsbiologi, CB, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-173219.

Abstract:
Next Generation Sequencing (NGS) is a new technology that has revolutionized the way we study living organisms. Where previously only a few genes could be studied at a time through targeted direct probing, NGS offers the possibility to perform measurements for a whole genome at once. The drawback is that the amount of data generated in the process is large, and extracting useful information from it requires new methods to process and analyze it. The main contribution of this thesis is the development of a novel experimental method coined tagRNA-seq, combining 5’tagRACE, a previously developed technique, with RNA-sequencing technology. Briefly, tagRNA-seq makes it possible to identify the 5’ ends of RNAs in bacteria and directly probe for their type, primary or processed, by ligating short RNA sequences, the tags, to the beginnings of RNA molecules. We used the method to directly probe for transcription start and processing sites in two bacterial species, Escherichia coli and Enterococcus faecalis. It was also used to study polyadenylation in E. coli, where the ability to identify processed RNA molecules proved useful for separating direct and indirect regulatory effects of this mechanism. We also demonstrate how data from tagRNA-seq experiments can be used to increase confidence in the discovery of anti-sense transcripts in bacteria. Analyses of RNA-seq data obtained in the context of these experiments revealed subtle artifacts in the coverage signal towards gene ends, which we were able to explain and quantify based on Kolmogorov’s broken stick model. We also discovered evidence for circularization of a few RNA transcripts, both in our own data sets and in publicly available data. Designing the tags used in tagRNA-seq led us to the problem of words absent from a text. We focus on a particular subset of these, the minimal absent words (MAWs), and develop a theory providing a complete description of their size distribution in random text.
We also show that MAWs in genomes from viruses and living organisms almost always exhibit a behavior different from random texts in the tail of the distribution, and that MAWs from this tail are closely related to sequences present in the genome that preferentially appear in regions with important regulatory functions. Finally, and independently from tagRNA-seq, we propose a new approach to the problem of bacterial community reconstruction in metagenomics, based on techniques from compressed sensing. We provide a novel algorithm competing with state-of-the-art techniques in the field.
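The notion of a minimal absent word mentioned in this abstract can be illustrated with a small brute-force sketch. It assumes the usual definition (a word that does not occur in the text although all of its proper factors do); the function name, the default alphabet, and the `max_len` cut-off are illustrative choices and are not taken from the thesis.

```python
def minimal_absent_words(text, alphabet="ACGT", max_len=6):
    """Brute-force minimal absent words (MAWs) of `text` up to `max_len` letters.

    Assumed definition: a word is a MAW if it is absent from the text
    while every proper factor of it is present.
    """
    # Collect every factor (substring) of the text up to max_len, plus the empty word.
    factors = {""}
    n = len(text)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            factors.add(text[i:j])

    maws = set()
    # A letter that never occurs is a MAW of length 1.
    for a in alphabet:
        if a not in factors:
            maws.add(a)
    # For longer words a+u+b, it is enough that a+u and u+b occur
    # while a+u+b itself does not.
    for u in factors:
        if len(u) + 2 > max_len:
            continue
        for a in alphabet:
            if a + u not in factors:
                continue
            for b in alphabet:
                if u + b in factors and a + u + b not in factors:
                    maws.add(a + u + b)
    return sorted(maws, key=lambda w: (len(w), w))


# Toy example on a two-letter alphabet.
print(minimal_absent_words("AAC", alphabet="AC"))  # ['CA', 'CC', 'AAA']
```

This enumeration is quadratic in the text length and only practical for short sequences; genome-scale work would need an index-based method, which this sketch does not attempt.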

3

Espírito, Ana Cláudia Pereira. "Saccharomycotin transcriptomics by next-generation sequencing." Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/15677.

Abstract:
Master's in Molecular Biomedicine
The non-standard decoding of the CUG codon in Candida cylindracea raises a number of questions about the evolutionary process of this organism and of other species of the Candida clade for which the codon is ambiguous. In order to find some answers we studied the transcriptome of C. cylindracea, comparing its behavior with that of Saccharomyces cerevisiae (standard decoder) and Candida albicans (ambiguous decoder). The transcriptome characterization was performed using RNA-seq. This approach has several advantages over microarrays and its application is booming. TopHat and Cufflinks were the software used to build the protocol that allowed for gene quantification. About 95% of the reads were mapped to the genome. A total of 3693 genes were analyzed, of which 1338 had a non-standard start codon (TTG/CTG), and the percentage of expressed genes was 99.4%. Most genes have intermediate levels of expression, some have little or no expression and a minority is highly expressed. The distribution profile of the CUG codon differs between the three species, but it can be significantly associated with gene expression levels: genes with fewer CUGs are the most highly expressed. However, CUG content is not related to the conservation level: more and less conserved genes have, on average, an equal number of CUGs. The most conserved genes are the most expressed. The lipase genes corroborate the results obtained for most genes of C. cylindracea, since they are very rich in CUGs and poorly conserved. The reduced number of CUG codons observed in highly expressed genes may be due to an insufficient number of tRNA genes to cope with more CUGs without compromising translational efficiency. The enrichment analysis confirmed that the most conserved genes are associated with basic functions such as translation, pathogenesis and metabolism. Within this set, genes with more or fewer CUGs seem to have different functions. The key issues on the evolutionary phenomenon remain unclear.
However, the results are consistent with previous observations and yield several conclusions that should be taken into consideration in future analyses, since this was the first time that such a study had been conducted.
The non-standard decoding of the CUG codon in Candida cylindracea raises a series of questions about the evolutionary process of this organism and of other species of the Candida clade for which the codon is ambiguous. To find some answers, the transcriptome of C. cylindracea was studied and its behaviour compared with that of Saccharomyces cerevisiae (standard decoder) and Candida albicans (ambiguous decoder). The transcriptome was characterized using RNA-seq. This methodology has several advantages over microarrays and its use is expanding rapidly. TopHat and Cufflinks were the software tools used to build the protocol that enabled gene quantification. About 95% of the reads aligned to the genome. A total of 3693 genes were analysed, 1338 of which had a non-standard start codon (TTG/CTG), and the percentage of the genome expressed was 99.4%. Most genes have intermediate expression levels, some show little or no expression, and a minority is highly expressed. The distribution profile of the CUG codon across the three species is very different, but it can be significantly associated with expression levels: genes with fewer CUGs are the most highly expressed. However, CUG content is not related to the level of conservation: more and less conserved genes have, on average, the same number of CUGs. The most conserved genes are the most expressed. The lipase genes corroborate the results obtained for C. cylindracea genes in general, being very rich in CUGs and not at all conserved. The reduced number of CUG codons observed in highly expressed genes may be due to an insufficient number of tRNA genes to cope with more CUGs without compromising translation efficiency.
The enrichment analysis confirmed that the most conserved genes are associated with basic functions such as translation, pathogenesis and metabolism. Within this set, genes with more and fewer CUGs appear to have different functions. The key questions about the evolutionary phenomenon remain to be clarified. Nevertheless, the results are compatible with previous observations, and several conclusions are presented that should be taken into consideration in future analyses, since this was the first time a study of this kind had been carried out.
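The comparison of CUG content between genes described in this abstract rests on counting in-frame codons. A minimal sketch of that counting step follows; it assumes coding sequences are given as DNA strings starting at the first codon position (so CUG appears as CTG), and the function name and example sequence are hypothetical rather than taken from the thesis.

```python
def count_codon(cds, codon="CTG"):
    """Count in-frame occurrences of `codon` in a coding DNA sequence.

    Assumes `cds` starts at the first base of the first codon;
    trailing bases that do not form a full codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    return sum(1 for i in range(0, usable, 3) if cds[i:i + 3] == codon)


# Hypothetical toy sequence: ATG CTG AAA CTG TAA
print(count_codon("ATGCTGAAACTGTAA"))  # 2
# An out-of-frame CTG is not counted: ACT | GA
print(count_codon("ACTGA"))  # 0
```

Counting strictly in frame matters here, because an out-of-frame CTG is not decoded as a CUG codon.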
4

Wan, Mohamad Nazarie Wan Fahmi Bin. "Network-based visualisation and analysis of next-generation sequencing (NGS) data." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28923.

Abstract:
Next-generation sequencing (NGS) technologies have revolutionised research into the nature and diversity of genomes and transcriptomes. Since the initial description of these technology platforms over a decade ago, massively parallel RNA sequencing (RNA-seq) has driven many advances in the characterization and quantification of transcriptomes. RNA-seq is a powerful gene expression profiling technology that enables transcript discovery and provides a far more precise measure of the levels of transcripts and their isoforms than other methods, e.g. microarrays. However, the analysis of RNA-seq data remains a significant challenge for many biologists. The data generated is large and the tools for its assembly, analysis and visualisation are still under development. Assemblies of reads can be inspected using tools such as the Integrative Genomics Viewer (IGV), where visualisation of results involves ‘stacking’ the reads onto a reference genome. Whilst sufficient for many needs, when the underlying variance of the genome or transcript assemblies is complex, this visualisation method can be limiting; errors in assembly can be difficult to spot and visualisation of splicing events may be challenging. Data visualisation is increasingly recognised as an essential component of genomic and transcriptomic data analysis, enabling large and complex datasets to be better understood. An approach that has been gaining traction in biological research is based on the application of network visualisation and analysis methods. Networks consist of nodes connected by edges (lines), where nodes usually represent entities and edges the relationships between them. These are now widely used for plotting experimentally or computationally derived relationships between genes and proteins. The overall aim of this PhD project was to explore the use of network-based visualisation in the analysis and interpretation of RNA-seq data.
In chapter 2, I describe the development of a data pipeline that has been designed to go from ‘raw’ RNA-seq data to a file format which supports data visualisation as a ‘DNA assembly graph’. In DNA assembly graphs, nodes represent sequence reads and edges denote a homology between reads above a defined threshold. Following the mapping of reads to a reference sequence and defining which reads map to a given locus, pairwise sequence alignments are performed between reads using MegaBLAST. This provides a weighted similarity score that is used to define edges between reads. Visualisation of the resulting networks is then carried out using BioLayout Express3D, which can render large networks in 3-D, thereby allowing a better appreciation of the often-complex network structure. This pipeline has formed the basis for my subsequent work on exploring and analysing alternative splicing in human RNA-seq data. In the second half of this chapter, I provide a series of tutorials aimed at different types of users, allowing them to perform such analyses. The first tutorial is aimed at computational novices who might want to generate networks using a web browser and pre-prepared data. Other tutorials are designed for use by more advanced users who can access the code for the pipeline through GitHub or via an Amazon Machine Image (AMI). In chapter 3, the utility of network-based visualisations of RNA-seq data is explored using data processed through the pipeline described in chapter 2. The aim of the work described in this chapter was to better understand the basic principles and challenges associated with network visualisation of RNA-seq data, in particular how it could be used to visualise transcript structure and splice variation. These analyses were performed on data generated from four samples of human fibroblasts taken at different time points during their entry into cell division.
One of the first challenges encountered was the fact that the existing network layout algorithm (Fruchterman-Reingold) implemented within BioLayout Express3D did not result in an optimal layout of the unusual graph structures produced by these analyses. Following the implementation of the more advanced layout algorithm FMMM within the tool, network structure could be far better appreciated. Using this layout method, the majority of genes sequenced to an adequate depth assemble into networks with a linear ‘corkscrew’ appearance and, when representing single-isoform transcripts, add little to existing views of these data. However, in a small number of cases (~5%), the networks generated from transcripts expressed in human fibroblasts possess more complex structures, with ‘loops’, ‘knots’ and multiple ends being observed. In a majority of cases examined, these loops were associated with alternative splicing events, a fact confirmed by RT-PCR analyses. Other DNA assembly networks representing the mRNAs for genes such as MKI67 showed knot-like structures, which were found to be due to the presence of repetitive sequence within an exon of the gene. In another case, CENPO, the unusual structure observed was due to reads derived from the overlapping gene ADCY3, present on the opposite strand, being wrongly mapped to CENPO. Finally, I explored the use of a network reduction strategy as an approach to visualising highly expressed genes such as GAPDH and TUBA1C. Having successfully demonstrated the utility of networks in analysing transcript isoforms in data derived from a single cell type, I set out to explore its utility in analysing transcript variation in tissue data, where multiple isoforms expressed by different cells within the tissue might be present in a given sample. In chapter 4, I explore the analysis of transcript variation in an RNA-seq dataset derived from human tissue.
The first half of this chapter describes the quality control of these data, again using a network-based approach but this time based on the correlation in expression between genes and samples. Of the 95 samples derived from 27 human tissues, 77 passed the quality control. A network comprising 6,109 nodes (genes) and 1,091,477 edges (correlations) was constructed using a correlation threshold of r ≥ 0.9 and then clustered. Subsequently, the profile and gene content of each cluster were examined and enrichment of GO terms was analysed. In the second half of this chapter, the aim was to detect and analyse alternative splicing events between different tissues using the rMATS tool. Using a false-discovery rate (FDR) cut-off of < 0.01, I found that in comparisons of brain vs. heart, brain vs. liver and heart vs. liver, the program reported 4,992, 4,804 and 3,990 splicing events, respectively. Of these events, only 78 (in 52 genes) had an exon inclusion level of more than 50% and an expression level of more than 30 FPKM. To further explore the sometimes-complex structure of transcript diversity derived from tissue, RNA-seq assembly networks for KLC1, SORBS2, GUK1, and TPM1 were explored. Each of these networks showed different types of alternative splicing events, and it was sometimes difficult to determine the isoforms expressed between tissues using other approaches. For instance, there is an issue in visualising the read assembly of long genes such as KLC1 and SORBS2 using Sashimi plots or even Vials, simply because of the number of exons and the size of their genomic loci. In the case of GUK1, tissue-specific isoform expression was observed when a network of three tissues was combined. Arguably the most complex analysis is the network of TPM1, where the uniquification step was employed for this highly expressed gene.
In chapter 5, I perform usability testing of the NGS Graph Generator web application and of visualising RNA-seq assemblies as networks using BioLayout Express3D. This test was important to ensure that the application is well received and utilised by users.
Almost all participants in this usability test agreed that the application would encourage biologists to visualise and understand alternative splicing alongside existing tools. The participants agreed that Sashimi plots are rather difficult to view and interpret and may miss interesting features. However, participants also identified areas where the application needs improvement, such as the ability to analyse large networks in a short time and side-by-side analysis of networks with Sashimi plots and Ensembl. Additional information about the network would be necessary to improve the understanding of alternative splicing. In conclusion, this work demonstrates the utility of network visualisation of RNA-seq data, where the unusual structure of these networks can be used to identify issues in assembly, repetitive sequences within transcripts and splice variation. As such, this approach has the potential to significantly improve our understanding of transcript complexity. Overall, this thesis demonstrates that network-based visualisation provides a new and complementary approach to characterising alternative splicing from RNA-seq data and has the potential to be useful for the analysis and interpretation of other kinds of sequencing data.
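The quality-control network described for chapter 4 links genes whose expression profiles correlate at r ≥ 0.9. The thresholding idea can be sketched as follows; this is only an illustration under stated assumptions (Pearson correlation, expression given as a dictionary of per-sample values, hypothetical gene names and function names) and is not the BioLayout Express3D implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(vx * vy)


def correlation_network(expression, threshold=0.9):
    """Return edges (gene1, gene2, r) for every gene pair with r >= threshold.

    `expression` maps a gene name to its expression values across samples.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical toy data: geneA and geneB rise together, geneC falls.
toy = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
}
for g1, g2, r in correlation_network(toy):
    print(g1, g2, round(r, 3))
```

The all-pairs loop is quadratic in the number of genes, so a network of the size reported in the abstract would call for a vectorised or otherwise optimised implementation.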
5

Khuder, Basil. "Human Genome and Transcriptome Analysis with Next-Generation Sequencing." University of Toledo Health Science Campus / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=mco1501886695490104.

6

BERETTA, STEFANO. "Algorithms for next generation sequencing data analysis." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2013. http://hdl.handle.net/10281/42355.

Abstract:
Two of the main bioinformatics fields that have been influenced by the introduction of Next-Generation Sequencing (NGS) techniques are transcriptomics and metagenomics. The adoption of these new methods to sequence DNA/RNA molecules has drastically changed both the kind and the amount of data produced. The effect is that the algorithms and tools developed for traditional data cannot be applied to NGS data. For this reason, in this thesis we address two central problems in these two fields: transcriptomics and metagenomics. The first concerns the characterization of Alternative Splicing (AS) events starting from NGS sequences coming from transcripts (called RNA-Seq reads). To this aim we have modeled the structure of a gene, with respect to the AS variations occurring in it, by using a graph representation (called a splicing graph). More specifically, we have identified the conditions for the correct reconstruction of the splicing graph, starting from RNA-Seq data, and we have developed an algorithm for its construction. Moreover, our method is able to correctly reconstruct the splicing graph even when the input RNA-Seq reads do not respect the identified conditions. Finally, we have performed an experimental analysis of our procedure in order to validate the obtained results. The second problem we address in this thesis is the assignment of NGS reads, coming from a metagenomic sample, to a reference taxonomic tree, in order to assess the composition of the sample and classify the unknown micro-organisms in it. This is done by aligning the reads to the taxonomic tree and then choosing (when there is more than one valid match) the node that best represents the read. This choice is based on the calculation of a Penalty Score (PS) function for all the nodes descending from the lowest common ancestor of the valid matches in the tree.
We have developed an optimal algorithm for the computation of the PS function, based on the so-called skeleton tree, which improves the performance of the taxonomic assignment procedure. We have also implemented the method using more efficient data structures than those used in the previous version of the procedure. Finally, we have offered the possibility to switch among different taxonomies by developing a method to map trees and translate the input alignments.
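The taxonomic assignment described in this abstract starts from the lowest common ancestor (LCA) of a read's valid matches. A minimal sketch of that first step is given below; it assumes the taxonomy is stored as a child-to-parent dictionary, uses a simplified toy taxonomy, and does not reproduce the Penalty Score function or the skeleton tree, whose details are not given in the abstract.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes, parent):
    """Return the deepest node that is an ancestor of (or equal to) every node.

    `parent` maps each non-root node to its parent; returns None for no input.
    """
    # Root-first paths, so that shared ancestors line up position by position.
    paths = [path_to_root(n, parent)[::-1] for n in nodes]
    lca = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


# Simplified toy taxonomy (child -> parent), for illustration only.
taxonomy = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Enterococcus": "Firmicutes",
}

# A read matching two genera in the same phylum is placed at that phylum.
print(lowest_common_ancestor(["Escherichia", "Salmonella"], taxonomy))
# A read matching genera in different phyla falls back to their shared ancestor.
print(lowest_common_ancestor(["Escherichia", "Enterococcus"], taxonomy))
```

In the procedure the abstract describes, the final node is then chosen among the descendants of this LCA using the penalty score, a step this sketch leaves out.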
7

Xu, Guorong. "RNA CoMPASS: RNA Comprehensive Multi-Processor Analysis System for Sequencing." ScholarWorks@UNO, 2012. http://scholarworks.uno.edu/td/1531.

Abstract:
The main theme of this dissertation is to develop a distributed computational pipeline for processing next-generation RNA sequencing (RNA-seq) data. RNA-seq experiments generate hundreds of millions of short reads for each DNA/RNA sample. There are many existing bioinformatics tools developed for the analysis and visualization of this data, but very large studies present computational and organizational challenges that are difficult to overcome manually. We designed a comprehensive pipeline for the analysis of RNA sequencing which leverages many existing tools and parallel computing technology to facilitate the analysis of extremely large studies. RNA CoMPASS provides a web-based graphical user interface and distributed computational pipeline including endogenous transcriptome quantification and additionally the investigation of exogenous sequences.
8

Christodoulou, Danos C. "Methods for comprehensive transcriptome analysis using next-generation sequencing and application in hypertrophic cardiomyopathy." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:10749.

Abstract:
Characterization of the RNA transcriptome by next-generation sequencing can produce an unprecedented yield of information that provides novel biologic insights. I describe four approaches for sequencing different aspects of the transcriptome and provide computational tools to analyze the resulting data. Methods that query the dynamic range of gene expression, low expressing transcripts, micro RNA levels, and start-site usage of transcripts are described.
9

Xu, Guorong. "Computational Pipeline for Human Transcriptome Quantification Using RNA-seq Data." ScholarWorks@UNO, 2011. http://scholarworks.uno.edu/td/343.

Abstract:
The main theme of this thesis research is concerned with developing a computational pipeline for processing Next-generation RNA sequencing (RNA-seq) data. RNA-seq experiments generate tens of millions of short reads for each DNA/RNA sample. The alignment of a large volume of short reads to a reference genome is a key step in NGS data analysis. Although storing alignment information in the Sequence Alignment/Map (SAM) or Binary SAM (BAM) format is now standard, biomedical researchers still have difficulty accessing useful information. In order to assist biomedical researchers to conveniently access essential information from NGS data files in SAM/BAM format, we have developed a Graphical User Interface (GUI) software tool named SAMMate to pipeline human transcriptome quantification. SAMMate allows researchers to easily process NGS data files in SAM/BAM format and is compatible with both single-end and paired-end sequencing technologies. It also allows researchers to accurately calculate gene expression abundance scores.
10

Harrison, Nicole Rezac. "Using next-generation sequencing technologies to develop new molecular markers for the leaf rust resistance gene Lr16." Thesis, Kansas State University, 2014. http://hdl.handle.net/2097/17662.

Abstract:
Master of Science
Department of Plant Pathology
John P. Fellers
Allan K. Fritz
Leaf rust is caused by Puccinia triticina and is one of the most widespread diseases of wheat worldwide. Breeding for resistance is one of the most effective methods of control. Lr16 is a leaf rust resistance gene that provides partial resistance at the seedling stage. One objective of this study was to use RNA-seq and in silico subtraction to develop new resistance gene analog (RGA) markers linked to Lr16. RNA was isolated from the susceptible wheat cultivar Thatcher (Tc) and the resistant Thatcher isolines TcLr10, TcLr16, and TcLr21. Using in silico subtraction, Tc isoline ESTs that did not align to the Tc reference were assembled into contigs and analyzed using BLAST. Primers were designed from 137 resistance gene analog sequences not found in Tc. A population of 260 F₂ lines derived from a cross between the rust-susceptible cultivar Chinese Spring (CS) and a Thatcher isoline containing Lr16 (TcLr16) was developed for mapping these markers. Two RGA markers, XRGA266585 and XRGA22128, were identified that mapped 1.1 cM and 23.8 cM from Lr16, respectively. Three SSR markers, Xwmc764, Xwmc661, and Xbarc35, mapped between these two RGA markers at distances of 4.1 cM, 10.7 cM, and 16.1 cM from Lr16, respectively. Another objective of this study was to use genotyping-by-sequencing (GBS) to develop single nucleotide polymorphism (SNP) markers closely linked to Lr16. DNA from 22 resistant and 22 susceptible F₂ plants from a cross between CS and TcLr16 was used for GBS analysis. A total of 39 Kompetitive Allele Specific PCR (KASP) markers were designed from SNPs identified using the UNEAK and Tassel pipelines. The KASP marker XSNP16_TP1456 mapped 0.7 cM proximal to Lr16 in a Tc × TcLr16 population consisting of 129 F₂ plants. These results indicate that both techniques are viable methods to develop new molecular markers.
RNA-seq and in silico subtraction were successfully used to develop two new RGA markers linked to Lr16, one of which was more closely linked than known SSR markers. GBS was also successfully used on an F₂ population to develop a KASP marker that is the most closely linked marker to Lr16 to date.
11

Risso, Davide. "Simultaneous inference for RNA-Seq data." Doctoral thesis, Università degli studi di Padova, 2012. http://hdl.handle.net/11577/3421731.

Abstract:
In the last few years, RNA-Seq has become a popular choice for high-throughput studies of gene expression, revealing its potential to overtake microarrays and become the new standard for transcriptional profiling. At the gene level, RNA-Seq yields counts rather than continuous measures of expression, leading to the need for novel methods to deal with count data in high-dimensional problems. In this thesis, we aim to shed light on the problems related to the exploration and modeling of RNA-Seq data. In particular, we introduce simple and effective ways to summarize and visualize the data; we define a novel algorithm for the clustering of RNA-Seq data; and we implement simple normalization strategies to deal with technology-related biases. Finally, we present a hierarchical Bayesian approach to the modeling of RNA-Seq data. The model accounts for differences in sequencing depth, as well as for overdispersion, automatically accounting for different types of normalization.
In recent years, massively parallel RNA sequencing (RNA-Seq) has become a common choice for gene expression studies. This technique has the potential to overtake microarrays as the standard technique for studying transcriptional profiles. At the gene level, RNA-Seq data take the form of counts, unlike microarrays, which estimate expression on a continuous scale. This creates the need to develop new methods and models for the analysis of count data in high-dimensional problems. This thesis addresses several problems related to the exploration and modeling of RNA-Seq data. In particular, methods are introduced for visualizing and numerically summarizing the data. In addition, a new algorithm for clustering the data is defined, along with some normalization strategies aimed at removing the biases specific to this technology. Finally, a hierarchical Bayesian model is defined to model expression in RNA-Seq data and to test for possible differences across experimental conditions. The model takes sequencing depth and overdispersion into account and automatically handles different types of normalization.
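Because RNA-Seq yields counts whose magnitude depends on how deeply each sample was sequenced, raw counts are not directly comparable between samples. The sketch below shows the simplest depth correction, counts per million, purely as an illustration of the problem; it is an assumed stand-in and not the normalization strategies or the hierarchical Bayesian model developed in the thesis, and the function and sample names are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw gene counts by each sample's total count (library size).

    `counts` maps a sample name to a dict of gene -> raw read count.
    Returns the same structure with values expressed per million reads.
    """
    cpm = {}
    for sample, gene_counts in counts.items():
        depth = sum(gene_counts.values())  # total reads assigned in this sample
        cpm[sample] = {
            gene: (c * 1e6 / depth if depth else 0.0)
            for gene, c in gene_counts.items()
        }
    return cpm


# Hypothetical toy data: sample2 was sequenced ten times deeper than sample1.
raw = {
    "sample1": {"gene1": 10, "gene2": 30},
    "sample2": {"gene1": 100, "gene2": 300},
}
scaled = counts_per_million(raw)
print(scaled["sample1"]["gene1"], scaled["sample2"]["gene1"])
```

After scaling, the two toy samples agree even though their raw counts differ tenfold. Total-count scaling is known to be sensitive to a few very highly expressed genes, which is one reason more robust normalization strategies are studied.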
12

Choudhry, Hani. "Genome-wide analysis of the hypoxic breast cancer transcriptome using next generation sequencing." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:9a66b553-a66c-4164-a854-5881be65ca45.

Abstract:
Hypoxia pathways are associated with the pathogenesis of both ischaemic and neoplastic diseases. In response to hypoxia the transcription factor hypoxia-inducible factor (HIF) induces the expression of hundreds of genes with diverse functions. These enable cells to adapt to low oxygen availability. To date, pan-genomic analyses of these transcriptional responses have focussed on protein-coding genes and microRNAs. However, the role of other classes of non-coding RNAs, in particular lncRNAs, in the hypoxia response is largely uncharacterised. My thesis aimed to improve understanding of the transcriptional regulation of the non-coding transcriptome in hypoxia. I performed an integrated genomic analysis of both non-coding and coding transcripts by massively parallel sequencing. This was interfaced with pan-genomic analyses of DNase hypersensitivity and HIF, H3K4me3 and RNApol2 binding in hypoxic cells. These analyses have revealed that hypoxia profoundly regulated all RNA classes. snRNAs and tRNAs are globally downregulated in hypoxia, whilst miRNAs, mRNAs and lncRNAs are both up- and downregulated with an overall trend towards slight upregulation. In addition, a significant number of previously non-annotated (and largely hypoxia-upregulated) transcripts were identified, including novel intergenic transcripts and natural antisense transcripts. HIF bound close to genes for mRNAs, miRNAs and lncRNAs that were upregulated by hypoxia, but was excluded from binding at genes for RNA classes that showed global downregulation. This suggests that HIF acts as a transcriptional activator (but not repressor) of lncRNAs as well as mRNAs and miRNAs. Consistent with direct regulation by HIF, many of these hypoxia-inducible, HIF-binding lncRNAs were downregulated following HIF knockdown.
Analysis of RNApol2 binding and DNase HSS signals at HIF transcriptional target genes indicated that HIF-dependent transcriptional activation occurs through release of RNApol2 that is pre-bound to open promoters of lncRNAs as well as mRNAs. In these datasets, NEAT1 was the most hypoxia-upregulated, HIF-targeted lncRNA in MCF-7 cells and, despite binding of both HIF-1 and HIF-2 isoforms at its promoter, was selectively regulated by HIF-2 alone. Furthermore, NEAT1 was induced by hypoxia in a wide range of breast cancer cell lines and in hypoxic xenograft models. Functionally, NEAT1 is required for the assembly of nuclear paraspeckle structures. Increased nuclear paraspeckle formation was observed in hypoxia and was dependent on both NEAT1 and HIF-2. Knockdown of hypoxia-induced NEAT1 significantly reduced cell proliferation and survival and induced apoptosis. Finally, high expression of NEAT1 correlated with poor clinical outcome in a large cohort of breast cancer patients. These findings extend the role of the hypoxic transcriptional response in cancer into the spectrum of non-coding transcripts and provide new insights into molecular roles of hypoxia-regulated lncRNAs, which may provide the basis for novel therapeutic targets in the future.
13

Nguyen, Viet Tuan. "An evaluation of potential candidate genes involved in salinity tolerance in striped catfish (Pangasianodon hypopthalmus) using an RNA-SEQ approach." Thesis, Queensland University of Technology, 2015. https://eprints.qut.edu.au/84924/4/Viet_Tuan_Nguyen_Thesis.pdf.

Abstract:
The project investigated the molecular response of Tra catfish (Pangasianodon hypophthalmus) to elevated salinity conditions. We employed a next-generation sequencing platform to evaluate differential expression profiles of key genes under two salinity conditions. The results can form the basis for further studies to confirm the functional roles of specific genes that influence salinity tolerance in the target species and, more broadly, in other freshwater teleost fishes. Ultimately, the approach can contribute to developing superior culture stocks of the target species.
14

Finotello, Francesca. "Computational methods for the analysis of gene expression from RNA sequencing data." Doctoral thesis, Università degli studi di Padova, 2014. http://hdl.handle.net/11577/3423789.

Abstract:
In every living organism, the entirety of its hereditary information is encoded, in the form of DNA, through the so-called genome. The genome consists of both genes and non-coding sequences and contains all the information needed to determine the properties and functions of each single cell. Cells can access and translate specific instructions of this code through gene expression, namely by selectively switching on and off a particular set of genes. Thanks to gene expression, the information encoded in the active genes is transcribed into RNAs. This set of RNAs reflects the current state of a cell and can reveal pathological mechanisms underlying diseases. In recent years, a novel methodology for RNA sequencing, called RNA-seq, has been replacing microarrays for the study of gene expression. The sequencing framework of RNA-seq makes it possible to investigate, at high resolution, all the RNA species present in a sample, characterizing their sequences and quantifying their abundances at the same time. In practice, millions of short sequences, called reads, are sequenced from random positions of the input RNAs. These reads can then be computationally mapped to a reference genome to reveal a transcriptional map, where the number of reads aligned to each gene, called counts, gives a measure of its level of expression. At first glance, this scheme may seem very simple, but the implementation of the whole analysis workflow is in fact complex and not well defined. So far, many computational methods have been proposed to perform the different steps of RNA-seq data analysis, but a unified processing pipeline is still lacking. The aim of my Ph.D. research project was the implementation of a robust computational pipeline for RNA-seq data analysis, from data pre-processing to differential expression detection. The definition of the different analysis modules was carried out through several steps.
First, we drafted a basic analysis framework through the study of RNA-seq data features and the dissection of data models and state-of-the-art algorithmic strategies. Then, we focused on count bias, which is one of the most challenging aspects of RNA-seq data analysis. We demonstrated that some biases affecting counts can be effectively corrected with current normalization methods, while others, like length bias, cannot be completely removed without introducing additional systematic errors. Thus, we defined a novel approach to compute RNA-seq counts, which strongly reduces length bias prior to normalization and is robust to the upstream processing steps. Finally, we defined the complete analysis pipeline considering the best-performing methods and optimized some specific processing steps to enable correct expression estimates even in the presence of high-similarity genomic sequences. The implemented analysis pipeline was applied to a real case study to identify the genes involved in the pathogenesis of spinal muscular atrophy (SMA) from RNA-seq data of patients and healthy controls. SMA is a degenerative neuromuscular disease that has no cure and represents one of the major genetic causes of infant mortality. We identified a set of genes related to skeletal muscle and connective tissue disorders whose patterns of differential expression correlate with phenotype and may underlie protective mechanisms against SMA progression. Some putative positive targets identified by this analysis are currently under biological validation since they might improve diagnostic screening and therapy. To lay the groundwork for future research, which will focus on optimizing the processing pipeline and extending it to the analysis of dynamic expression data, we designed two time-series RNA-seq data sets: a real one and a simulated one.
The experimental and sequencing design of the real data set, as well as the modelling of the synthetic data, have been an integral part of the Ph.D. activity. Overall, this thesis considers each step of the RNA-seq data processing and provides some valuable guidelines in a fast-evolving research field that, up to now, has prevented the establishment of a stable and standardized analysis scheme.
The genetic heritage of every living organism is encoded, in the form of DNA, in the genome. The genome consists of genes and non-coding sequences and holds all the information needed for the correct functioning of the organism's cells. Cells can access specific instructions of this code through a process called gene expression, that is, by switching a particular set of genes on or off and transcribing the necessary information into RNA. The set of transcribed RNAs therefore characterizes a precise cellular state and can provide important information on the mechanisms involved in the pathogenesis of a disease. Recently, a methodology for RNA sequencing, called RNA-seq, has been rapidly replacing microarrays in the study of gene expression. Thanks to the properties of the sequencing technologies on which it is based, RNA-seq makes it possible to measure the number of RNAs present in a sample and, at the same time, to "read" their exact sequence. In practice, sequencing produces millions of sequences, called "reads", which are short strings read from random positions of the input RNAs. The reads must then be mapped by an algorithm to a reference genome in order to reconstruct a transcriptional map, in which the number of reads aligned to each gene gives a digital measure (called a "count") of its expression level. Although this procedure may seem very simple at first glance, the full analysis scheme is in fact very complex and not well defined. In recent years, several methods have been developed for each of the processing stages, but a standardized pipeline for RNA-seq data analysis has not yet been defined. The main objective of my doctoral project was the development of a computational pipeline for the analysis of RNA-seq data, from pre-processing to the measurement of differential gene expression.
The different processing modules were defined and implemented through a series of successive steps. Initially, we considered and redefined methods and models for describing and processing the data, so as to establish a preliminary analysis scheme. We then looked more closely at one of the most problematic aspects of RNA-seq data analysis: the correction of the biases present in the counts. We showed that some of these biases can be corrected effectively with current normalization techniques, while others, for example length bias, cannot be completely removed without introducing further systematic errors. We therefore defined and tested a new approach to computing counts that minimizes bias even before any normalization is applied. Finally, we implemented the complete analysis pipeline using the most robust and accurate algorithms selected in the previous stages, and optimized some steps to ensure accurate estimates of gene expression even in the presence of highly similar genes. The implemented pipeline was then applied to a real case study to identify the genes involved in the pathogenesis of spinal muscular atrophy (SMA). SMA is a degenerative neuromuscular disease that is one of the main genetic causes of infant death and for which neither a cure nor an effective treatment is currently available. With our analysis we identified a set of genes linked to other connective tissue and musculoskeletal diseases whose patterns of differential expression correlate with phenotype, and which could therefore represent protective mechanisms able to counteract the symptoms of SMA.
Some of these putative targets are undergoing validation, since they could lead to the development of effective tools for the diagnostic screening and treatment of this disease. Future objectives concern the optimization of the pipeline defined in this thesis and its extension to the analysis of dynamic data from time-series RNA-seq. To this end, we defined the design of two time-series data sets, one real and one simulated. Planning the experimental and sequencing design of the real data set, as well as modeling the simulated data, was an integral part of the research carried out during the doctorate. The rapid and constant evolution of methods for RNA-seq data analysis has so far prevented the definition of a standardized analysis scheme and the resolution of issues related to various aspects of processing, such as normalization. In this context, the pipeline defined in this thesis and, more broadly, the topics discussed in each chapter touch on all the different aspects of RNA-seq data analysis and provide guidelines useful for defining an effective and robust computational approach.
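The length bias mentioned in this abstract arises because, at equal abundance, a longer gene yields more reads. A minimal sketch of a common length scaling (a TPM-style calculation) follows; it is not the novel counting approach the thesis proposes, and the function name and toy values are assumptions chosen for illustration.

```python
def transcripts_per_million(counts, lengths_bp):
    """Divide each count by its gene length, then rescale so the
    length-normalized rates sum to one million."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1_000_000 for r in rates]

# Hypothetical toy data: gene B is twice as long as gene A and collects
# twice as many reads, so both have the same reads-per-base rate.
counts = [1000, 2000]
lengths_bp = [1000, 2000]
print(transcripts_per_million(counts, lengths_bp))  # [500000.0, 500000.0]
```

On raw counts gene B looks twice as expressed as gene A; after length scaling the two are equal. As the abstract notes, corrections of this kind can introduce systematic errors of their own.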
15

SADEGHI, DEHCHESHMEH IMAN. "THE GENETIC OVERLAP BETWEEN NEUROPSYCHIATRIC DISORDERS: A META-ANALYSIS OF NEXT GENERATION SEQUENCING DATA." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/793613.

Abstract:
Neurodegenerative and neuropsychiatric disorders (NDD-NPDs) are multifactorial, polygenic and complex behavioral phenotypes caused by brain abnormalities. Most genetic studies have focused on understanding the genetic component of specific brain diseases. Several brain diseases also show similar clinical and pathological symptoms. In recent years, multiple studies have used next generation sequencing (NGS) technologies such as RNA sequencing (RNA-Seq) to investigate the molecular signatures of brain diseases. However, many studies have focused only on a particular disease and a limited number of brain regions. By using data from a broad range of cortical regions across multiple brain diseases, we can dig deeper into the molecular basis of neurological diseases. The main aim of this thesis was to characterize the transcriptome of cortical brain regions across neurological disorders. We focused our research efforts on highlighting cross-disease shared molecular signatures, and exploring co-expression networks and cell-type-specific patterns for NDD-NPDs. By processing and analyzing RNA-Seq data using a set of computational tools and statistical tests, we performed transcriptomic profiling of brain samples from eight groups of patients, with Alzheimer's disease (AD), Parkinson's disease (PD), Progressive Supranuclear Palsy (PSP), Pathological Aging (PA), Autism Spectrum Disorder (ASD), Schizophrenia (SZ), Major Depressive Disorder (MDD), and Bipolar Disorder (BP), in comparison with 2,078 brain samples from matched control subjects. In this thesis, we provide a transcriptomic framework to understand the molecular architecture of NPDs and NDDs through their shared and specific gene expression in the brain.
16

Mittal, Vinay K. "Detection and characterization of gene-fusions in breast and ovarian cancer using high-throughput sequencing." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/54014.

Abstract:
Gene-fusions are a prevalent class of genetic variants that are often employed as cancer biomarkers and therapeutic targets. In recent years, high-throughput sequencing of the cellular genome and transcriptome has emerged as a promising approach for the investigation of gene-fusions at the DNA and RNA level. However, the large volumes of sequencing data and the complexity of gene-fusion structures present unique computational challenges. This dissertation describes research that first addresses the bioinformatics challenges associated with the analysis of massive volumes of sequencing data by developing a bioinformatics pipeline and more applied integrated computational workflows. Application of high-throughput sequencing and the proposed bioinformatics approaches to the study of breast and ovarian cancer reveals unexpectedly complex structures of gene-fusions and their functional significance in the onset and progression of cancer. Integrative analysis of gene-fusions at the DNA and RNA level shows the key importance of the regulation of gene-fusions at the transcription level in cancer.
17

Bukhari, Ghadeer, and Wenheng Zhang. "INDEPENDENT ORIGINATION OF FLORAL ZYGOMORPHY, A PREDICTED ADAPTIVE RESPONSE TO POLLINATORS: DEVELOPMENTAL AND GENETIC MECHANISMS." VCU Scholars Compass, 2016. http://scholarscompass.vcu.edu/etd/4482.

Abstract:
Observations of floral development indicate that floral organ initiation in pentapetalous flowers more commonly results in a medially positioned abaxial petal (MAB) than in a medially positioned adaxial petal (MAD), where the medial plane is defined by the stem and the bract during early floral development. It was proposed that the dominant MAB petal initiation might impose a developmental constraint that leads to the evolution of limited patterns of floral zygomorphy in Asteridae, a family in which the floral zygomorphy develops along the medial plane and results in a central ventral (CV) petal in mature flowers. Here, I investigate whether the pattern of floral organ initiation may limit which patterns of floral zygomorphy can evolve in pentapetalous angiosperms. I analyzed floral diagrams representing 405 species in 330 genera of pentapetalous angiosperms to reconstruct the evolution of floral organ initiation and the evolution of developmental processes that give rise to floral zygomorphy in a phylogenetic framework. Results indicate that MAB petal initiation is the most common; it occupies 86.2% of diversity and represents the ancestral state of floral organ initiation in pentapetalous angiosperms. The MAD petal initiation evolved 28 times independently from the ancestral MAB petal initiation. Among the 34 independent originations of floral zygomorphy, 76.5% of these clades represent MAB petal initiation, among which only 47% of the clades result in a CV petal in mature flowers. The discrepancy is explained by the existence of developmental processes that result in floral zygomorphy along oblique planes of floral symmetry in addition to along the medial plane.
Findings suggest that although early floral organ initiation plays a constraining role in the evolution of patterns of floral zygomorphy, the constraint diverges among phylogenetically distantly related groups, allowing the independent origination of floral zygomorphy through distinct developmental processes in pentapetalous angiosperms. In an additional study, I examined the butterfly-like flowers of Schizanthus, which are adapted to pollination by bees, hummingbirds, and moths. I investigated the genetic basis of the zygomorphic corolla, for which development is key to the explosive pollen release mechanism found in the species of Schizanthus adapted to bee pollinators. I examined differential gene expression profiles across the zygomorphic corolla of Schizanthus pinnatus, a bee-pollinated species, by analyzing RNA transcriptome sequencing (RNA-seq) data. The data indicated that CYC2 is not expressed in the zygomorphic corolla of Sc. pinnatus, suggesting that CYC2 is not involved in the development of floral zygomorphy in Schizanthus (Solanaceae). The data also indicated that a number of genes are differentially expressed across the corolla.
18

Lee, Jiyoung. "Computational Analysis of Gene Expression Regulation from Cross Species Comparison to Single Cell Resolution." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99878.

Abstract:
Gene expression regulation is dynamic and specific to various factors such as developmental stages, environmental conditions, and stimulation of pathogens. Nowadays, a tremendous number of transcriptome data sets are available from diverse species. This trend enables us to perform comparative transcriptome analysis that identifies conserved or diverged gene expression responses across species using transcriptome data. The goal of this dissertation is to develop and apply approaches of comparative transcriptomics to transfer knowledge from model species to non-model species with the hope that such an approach can contribute to the improvement of crop yield and human health. First, we presented a comprehensive method to identify cross-species modules between two plant species. We adapted the unsupervised network-based module finding method to identify conserved patterns of co-expression and functional conservation between Arabidopsis, a model species, and soybean, a crop species. Second, we compared drought-responsive genes across Arabidopsis, soybean, rice, corn, and Populus in order to explore the genomic characteristics that are conserved under drought stress across species. We identified hundreds of common gene families and conserved regulatory motifs between monocots and dicots. We also presented a BLS-based clustering method which takes into account evolutionary relationships among species to identify conserved co-expression genes. Last, we analyzed single-cell RNA-seq data from monocytes to attempt to understand the regulatory mechanism of the innate immune system under low-grade inflammation. We identified novel subpopulations of cells treated with lipopolysaccharide (LPS) that show distinct expression patterns from pro-inflammatory genes. The data revealed that a promising therapeutic reagent, sodium 4-phenylbutyrate, masked the effect of LPS.
We inferred the existence of specific cellular transitions under different treatments and prioritized important motifs that modulate the transitions using feature selection by a random forest method. There has been a transition in genomics research from bulk RNA-seq to single-cell RNA-seq, and scRNA-seq has become a widely used approach for transcriptome analysis. With the experience we gained by analyzing scRNA-seq data, we plan to conduct comparative single-cell transcriptome analysis across multiple species.
Doctor of Philosophy
All cells in an organism have the same set of genes, but there are different cell types, tissues, and organs with different functions as the organism ages or under different conditions. Gene expression regulation is one mechanism that modulates complex, dynamic, and specific changes in tissues or cell types in any living organism. Understanding gene regulation is of fundamental importance in biology. With the rapid advancement of sequencing technologies, there is a tremendous amount of gene expression data (transcriptome) from individual species in public repositories. However, major studies have been reported from several model species, and research on non-model species has relied on comparison results with a few model species. Comparative transcriptome analysis across species will help us to transfer knowledge from model species to non-model species, and such knowledge transfer can contribute to the improvement of crop yields and human health. The focus of my dissertation is to develop and apply approaches for comparative transcriptome analysis that can help us better understand what makes each species unique or special, and what kinds of common functions across species have been passed down from ancestors (evolutionarily conserved functions). Three research chapters are presented in this dissertation. First, we developed a method to identify groups of genes that are commonly co-expressed in two species. We chose seed development data from soybean with the hope of contributing to crop improvement. Second, we compared gene expression data across five plant species including soybean, rice, and corn to provide new perspectives about crop plants. We chose drought stress to identify conserved functions and regulatory factors across species since drought stress is one of the major stresses that negatively impact agricultural production. We also proposed a method that groups genes with evolutionary relationships from an unlimited number of species.
Third, we analyzed single-cell RNA-seq data from mouse monocytes to understand the regulatory mechanism of the innate immune system under low-grade inflammation. We observed how innate immune cells respond to inflammation that could cause no symptoms but persist for a long period of time. Also, we reported an effect of a promising therapeutic reagent (sodium 4-phenylbutyrate) on chronic inflammatory diseases. The third project will be extended to comparative single-cell transcriptome analysis with multiple species.
19

CROCI, OTTAVIO. "GENOMIC LANDSCAPE AND TRANSCRIPTIONAL REGULATION BY YAP AND MYC IN THE LIVER." Doctoral thesis, Università degli Studi di Milano, 2018. http://hdl.handle.net/2434/556194.

Abstract:
This thesis is divided into three sections; the main project is described in the first part, while additional projects are developed in two appendixes. In the main project we studied YAP, the downstream effector of the Hippo pathway, a transcriptional co-factor that plays a fundamental role in de-differentiation, cell proliferation and transformation. While its upstream regulation has been extensively studied, its role as a transcriptional co-factor is still poorly understood. We show that YAP supports the transcriptional responses of the Myc oncogene to promote cell proliferation and transformation; when both YAP and Myc are overexpressed, YAP is recruited to genomic sites pre-marked by Myc, TEAD and active chromatin and potentiates the expression of cell cycle genes regulated by Myc. In addition, we show that YAP promotes cell de-differentiation by antagonizing in cis the expression of liver-specific genes controlled by the master regulator HNF4A, thus providing a mechanism for how YAP can revert the phenotype of a differentiated hepatocyte into a progenitor cell. In the first appendix we explain the mechanism of BRD4 inhibition, a promising strategy for the treatment of Myc-driven tumors. The efficacy of this strategy relies on the control of transcriptional elongation mediated by BRD4 on gene promoters, independently of the downregulation of the Myc oncogene. Although the inhibition of BRD4 causes its genome-wide displacement from promoters, the effects on transcription are restricted to a subset of sensitive genes. This specificity arises because most genes compensate for the drop in elongation caused by BRD4 inhibition with further recruitment of RNA Pol2 on promoters and so maintain proficient mRNA transcription, whereas vulnerable genes cannot mount these compensatory effects, because RNA Pol2 recruitment on their promoters is already maximized. Our results show how the impairment of elongation genome-wide can affect specific transcriptional programs.
In the second appendix we describe a new web application, Chrokit, aimed at analyzing genomic data in a fast and intuitive way. Chrokit handles a set of genomic regions of interest and performs several tasks on them, such as selecting particular subsets, computing overlaps and interactively visualizing read enrichment of specific chromatin features. The application is multiplatform and can be run on dedicated servers to maximize computational power and provide accessibility to multiple users simultaneously.
20

Wang, Biao. "Development and Application of Genomic Resources in Non-model Bird Species." Doctoral thesis, Uppsala, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-183645.

Abstract:
Understanding the genetic basis of biological processes is a fundamental component of modern ecology and evolutionary biology studies. With the recent advent of next generation sequencing (NGS) technologies, it is now possible to perform large genome and transcriptome projects for ecologically important non-model species. In this thesis, I focused on the development and application of genomic resources of two non-model bird species, the black grouse (Tetrao tetrix) and the great snipe (Gallinago media). Using the chicken genome as a reference, I developed a reference guided NGS pipeline to assemble the complete draft genome of black grouse. The draft genome has a good coverage of the main 29 chromosomes of the chicken genome. The genome was used to develop a vast number of genetic markers. Comparing this genome with that of other species, I identified the genomic regions which were important for the lineage specific evolution of black grouse. I also sequenced and characterised the spleen transcriptome of the black grouse. I identified and validated a large number of gene-based microsatellite markers from the transcriptome and identified and confirmed the expression of immune related genes. Using a similar RNA-Seq approach, I also sequenced the blood transcriptomes of 14 great snipe males with different mating success. I identified genes and single nucleotide polymorphisms (SNPs) which might be related to male mating success in this species, both in terms of gene expression levels and genetic variation structure. For the immunologically important major histocompatibility complex (MHC) gene region of black grouse, I constructed a fosmid library and used it to sequence the complete core MHC region of this species. This resource allowed me to perform a comprehensive comparative genomics analysis of the galliform MHC, by which I found that some genes in this region were affected by selective forces. 
I was also able to develop a single locus genotyping protocol for the duplicated MHC BLB (class IIB) genes and found that the two black grouse BLB loci followed different evolutionary trajectories. This thesis sets an example of developing genomic resources in non-model species and applying them to address questions relevant to ecology and evolutionary biology.
21

Kruse, Colin Peter Singer. "Data-Enabled Approach to Characterize Dynamic Regulatory Pathways in Two Kingdoms." Ohio University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1573746719306039.

22

Grimaldi, Alexis. "Interactions croisées entre hormones thyroïdiennes et glucocorticoïdes durant la métamorphose de Xenopus tropicalis." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA11T020/document.

Abstract:
Amphibian metamorphosis is the rapid and irreversible process by which an aquatic tadpole transforms into an air-breathing frog. This ecological transition, reminiscent of the perinatal period in mammals, is accompanied by spectacular changes (diet, locomotor organs, respiratory system...). These morphological and physiological modifications require a concerted response of different tissues to a hormonal signal, the thyroid hormones (TH), leading them toward sometimes opposite fates: apoptosis (in the tail), proliferation (in the limbs), and remodeling (in the intestines and the central nervous system). However, synchronizing the response of the different tissues relies on other hormonal signals, notably glucocorticoids (GC). The latter are also the main mediators of the stress response. The endocrine processes of metamorphosis and the stress response are tightly coupled. GC can thus act as an interface that integrates environmental signals into regulatory networks. During my doctorate, I analyzed the transcriptomes of the hindlimb buds and the tail epidermis of Xenopus tropicalis tadpoles given a single treatment with TH and/or GC. Comparing these two tissues made it possible to characterize the diversity of expression profiles of TH and GC target genes. Several major results emerge. First, the diversity of interaction profiles between these two pathways is limited, and most types of profile are common to both tissues. Independently of the tissue, some profiles are characteristic of specific biological functions such as extracellular matrix remodeling and the immune system. The genes involved in these functions shared by the two tissues are, however, different.
Finally, several factors involved in DNA methylation are regulated by both hormones.
Amphibian metamorphosis is the rapid and irreversible process during which an aquatic tadpole transforms into an air breathing adult frog. This ecological transition, reminiscent of the mammalian perinatal period, comes with spectacular changes (diet, locmotor organs, respiratory system...). These morphological and physiological modifications necessitate the properly timed response to a single hormonal signal, the thyroid hormones (TH), in various tissues to lead them to sometimes opposite fates : apoptosis (in the tail), cell prolifération and differenciation (in the limbs) and remodeling (in the intestine and the central nervous system).However, TH do not act alone. In particular, glucocorticoids (GC) play important roles during this process. They also are the main mediator of the stress response. Endocrine processes of the metamorphosis and the stress response are deeply intertwined. GC can thus act as an interface to integrate environmental inputs into regulatory networks.During my doctorate, I analyzed the possible transcriptional crosstalks between TH and GC in two larval tissues : the tailfin (TF) and the hindlimb buds (HLB). Comparing these two tissues allowed me to caracterize the diversity of TH and GC target gene expression profiles. This resulted in several major results. First, the diversity of the profiles of crosstalk between these two pathways is limited, and the majority of the types of profiles is common to both tissues. Next, independently ofthe tissues, some profiles are caracteristic of spécific biological functions such as extracellular matrix remodeling and the immune system. Yet, the genes involved in these shared functions are different between the TF and the HLB. Finally, several factors involved in DNA methylation are subject to a crosstalk between the two hormones
23

Hrazdilová, Ivana. "Analýza dat ze sekvenování příští generace ke studiu aktivity transposonů v nádorových buňkách." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-220061.

Abstract:
The theoretical part of this diploma thesis briefly characterizes human mobile elements (transposons), which represent nearly 50% of the human genome. It provides a basic classification of transposons and describes the types of transposons present in the human genome, as well as their mobilization, activation and regulation mechanisms. The work also deals with the domestication of transposons, describes the ways in which transposable elements (TE) contribute to DNA damage, and summarizes the diseases caused by the mutagenic activity of transposons in the human genome. The theoretical part concludes with a description of next-generation sequencing (NGS) technologies. In the practical part, data from an RNA-seq experiment were analyzed in order to compare transposon activity in normal and cancer cells from prostate and colorectal tissues. In addition to publicly available sophisticated tools (TopHat), new scripts were created to analyze these data. The results show that cancer cells exhibit overexpression of transposons. This corresponds with published results and suggests a connection between transposon activation and cancer development.
24

Castillo-Pérez, Karina. "Étude de l'expression différentielle du génome en relation avec la détermination du sexe chez le palmier dattier (Phoenix dactylifera L.)." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS054.

Abstract:
Unraveling the molecular mechanisms involved in sex determination in flowering plants is of outstanding basic and applied interest. Studies on dioecious species have associated processes such as cell death, the biosynthesis of hormones such as ethylene, and the regulation of gene expression by small RNAs and transcription factors with the development of unisexual flowers. The determinants controlling sex in plants are, however, still largely unknown. The date palm, Phoenix dactylifera L., is a dioecious species in which sexual dimorphism is observed very early in flower development. A reference transcriptome including male and female expression data was constructed to gain insight into this process. Differentially expressed genes (DEGs) were then identified between males and females at the early stages of flower development, when the first morphological differences between the sexes appear. Gene ontology (GO) enrichment analysis of the DEGs revealed biological processes shared between males and females that are involved in reproductive development and response to stimulus, indicating that the same processes may call on different genes during early flower development depending on sex. This analysis also suggested that the development of male flowers requires specific biological processes involved in cellular regulation and gene expression. Furthermore, two female DEGs, an S-adenosylmethionine synthase (related to DNA methylation) and a Flap endonuclease (related to DNA metabolism), and one male DEG, a transposable element, were found in non-recombining regions of the date palm genome. This study provides the first global analysis of the biological processes associated with the acquisition of sexual dimorphism. It also contributes to the understanding of sex determination in date palm and, more widely, to knowledge of this process in dioecious species.
25

Wolff, Alexander. "Analysis of expression profile and gene variation via development of methods for Next Generation Sequencing data." Thesis, 2018. http://hdl.handle.net/11858/00-1735-0000-002E-E517-9.

26

Shomroni, Orr. "Development of algorithms and next-generation sequencing data workflows for the analysis of gene regulatory networks." Doctoral thesis, 2017. http://hdl.handle.net/11858/00-1735-0000-0023-3E0C-8.

27

Park, Daechan. "Genome-wide approaches to explore transcriptional regulation in eukaryotes." Thesis, 2014. http://hdl.handle.net/2152/30443.

Abstract:
Transcriptional regulation is a complicated process controlled by numerous factors such as transcription factors (TFs), chromatin remodeling enzymes, nucleosomes, post-transcriptional machineries, and cis-acting DNA sequences. I explored the complex transcriptional regulation in eukaryotes through three distinct studies to comprehensively understand functional genomics at various steps. Although a variety of high-throughput approaches have been developed to understand this complex system on a genome-wide scale with high resolution, the lack of accurate and comprehensive annotation of transcription start sites (TSS) and polyadenylation sites (PAS) has hindered precise analyses even in Saccharomyces cerevisiae, one of the simplest eukaryotes. We developed Simultaneous Mapping Of RNA Ends by sequencing (SMORE-seq) and identified the strongest TSS and PAS of over 90% of yeast genes with single-nucleotide resolution. Owing to the high accuracy of the TSS identified by SMORE-seq, we detected 150 possibly mis-annotated genes that have a TSS downstream of the annotated start codon. Furthermore, SMORE-seq showed that 5’-capped non-coding RNAs were highly transcribed divergently from TATA-less promoters in wild-type cells under normal conditions. Mapping of DNA-protein interactions is essential to understanding the role of TFs in transcriptional regulation. ChIP-seq is the most widely used method for this purpose. However, careful attention has not been given to the technical bias reflected in final target calling, which arises from the many experimental steps of ChIP-seq, including fixation and shearing of chromatin, immunoprecipitation, sequencing library construction, and computational analysis. While analyzing large-scale ChIP-seq data, we observed that unrelated proteins appeared to bind to the gene bodies of highly transcribed genes across datasets. Control experiments including input, IgG ChIP in untagged cells, and the Golgi factor Mnn10 ChIP also showed strong binding at the same loci, indicating that the signals were derived from bias devoid of biological meaning. In addition, the appearance of nucleosomal periodicity in ChIP-seq data for proteins localizing to gene bodies is another bias that can be misread as interactions with nucleosomes. We alleviated these biases by correcting data with proper negative controls, but the biases could not be completely removed. Therefore, caution is warranted in interpreting the results from ChIP-seq. Nucleosome positioning is another critical mechanism of transcriptional regulation. Global mapping of nucleosome occupancy in S. cerevisiae strains deleted for chromatin remodeling complexes has elucidated the role of these complexes on a genome-wide scale. In this study, loss of chromodomain helicase DNA binding protein 1 (Chd1) resulted in severe disorganization of nucleosome positioning. Despite the difficulties of performing ChIP-seq for chromatin remodeling complexes due to their transient and dynamic localization on chromatin, we successfully mapped the genome-wide occupancy of Chd1 and quantitatively showed that Chd1 co-localizes with early transcription elongation factors, but not late transcription elongation factors. Interestingly, Chd1 occupancy was independent of the methylation levels at H3K36, indicating the necessity of a new working model describing Chd1 localization.
28

Radovich, Milan. "DECODING THE TRANSCRIPTIONAL LANDSCAPE OF TRIPLE-NEGATIVE BREAST CANCER USING NEXT GENERATION WHOLE TRANSCRIPTOME SEQUENCING." Thesis, 2012. http://hdl.handle.net/1805/2745.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Triple-negative breast cancers (TNBCs) are negative for the expression of estrogen (ER), progesterone (PR), and HER-2 receptors. TNBC accounts for 15% of all breast cancers and results in disproportionately higher mortality compared to ER & HER2-positive tumours. Moreover, there is a paucity of therapies for this subtype of breast cancer, resulting primarily from an inadequate understanding of the transcriptional differences that differentiate TNBC from normal breast. To this end, we embarked on a comprehensive examination of the transcriptomes of TNBCs and normal breast tissues using next-generation whole transcriptome sequencing (RNA-Seq). By comparing RNA-seq data from these tissues, we report the presence of differentially expressed coding and non-coding genes, novel transcribed regions, and mutations not previously reported in breast cancer. From these data we have identified two major themes. First, BRCA1 mutations are well known to be associated with development of TNBC. From these data we have identified many genes that work in concert with BRCA1 and are dysregulated, suggesting a role of BRCA1-associated genes in sporadic TNBC. In addition, we observe a mutational profile in genes also associated with BRCA1 and DNA repair that lends more evidence to its role. Second, we demonstrate that microdissected normal epithelium may be an optimal comparator when searching for novel therapeutic targets for TNBC. Previous studies have used other controls such as reduction mammoplasties, adjacent normal tissue, or other breast cancer subtypes, which may be sub-optimal and have led to identifying ineffective therapeutic targets. Our data suggest that the comparison of microdissected ductal epithelium to TNBC can identify potential therapeutic targets that may lead to better clinical efficacy. In summation, with these data, we provide a detailed transcriptional landscape of TNBC and normal breast that we believe will lead to a better understanding of this complex disease.
29

Gerasimov, Ekaterina. "Analysis of NGS Data from Immune Response and Viral Samples." 2017. http://scholarworks.gsu.edu/cs_diss/127.

Abstract:
This thesis is devoted to designing and applying advanced algorithmic and statistical tools for the analysis of NGS data related to cancer and infectious diseases. The NGS data under investigation are obtained either from host samples or from viral variants. Recently, random peptide phage display libraries (RPPDL) have been applied to studies of the host's antibody response to different diseases. We study the human antibody response to breast cancer and the mouse antibody response to Lyme disease by sequencing whole antibody repertoire profiles, which are represented by RPPDL. Alternatively, instead of sequencing the immune response, NGS can be applied directly to a viral population within an infected host. Specifically, we analyze the following RNA viruses: the human immunodeficiency virus (HIV) and the infectious bronchitis virus (IBV). Sequencing of RNA viruses is challenging because a high mutation rate produces many variants within a population. Our results show that NGS helps to understand RNA viruses and to explore their interaction with infected hosts. NGS also helps to analyze the immune response to different diseases and to trace how the immune response changes across disease stages.
30

Temate, Tiagueu Yvette Charly B., and Tiagueu Yvette C. B. Temate. "Methods for Differential Analysis of Gene Expression and Metabolic Pathway Activity." 2016. http://scholarworks.gsu.edu/cs_diss/102.

Abstract:
RNA-Seq is an increasingly popular approach to transcriptome profiling that uses the capabilities of next-generation sequencing technologies and provides better measurement of the levels of transcripts and their isoforms. In this thesis, we apply an RNA-Seq protocol and transcriptome quantification to estimate gene expression and pathway activity levels. We present a novel method, called IsoDE, for differential gene expression analysis based on bootstrapping. For the first version of IsoDE, we compared the tool against four existing methods (Fisher's exact test, GFOLD, edgeR and Cuffdiff) on RNA-Seq datasets generated using three different sequencing technologies, both with and without replicates. We also introduce the second version of IsoDE, which runs 10 times faster than the first implementation owing to in-memory processing applied to the underlying tool that estimates gene expression frequencies, and which includes further optimization of the analysis. The second part of this thesis presents a set of tools for the differential analysis of metabolic pathways from RNA-Seq data. Metabolic pathways are series of chemical reactions occurring within a cell. We focus on two main problems in the differential analysis of metabolic pathways, namely, differential analysis of their inferred activity level and of their estimated abundance. We validate our approaches through differential expression analysis at the transcript and gene levels and also through real-time quantitative PCR experiments. In part Four, we present the different packages created or updated in the course of this study. We conclude with our future work plans for further improving IsoDE 2.0.
31

Aldana, Juan Andres. "Resistance mechanisms to Didymascella thujina (Durand) Maire in Thuja plicata Donn ex D. Don, Thuja standishii (Gord.) Carrière and Thuja standishii x plicata." Thesis, 2018. https://dspace.library.uvic.ca//handle/1828/10058.

Abstract:
Plants and microorganisms interact with each other constantly, with some interactions being mutually beneficial and others being detrimental to the plants. The features of the organisms involved in such interactions will determine the characteristics of individual pathosystems. Plants respond readily to pathogen attacks, regardless of the pathosystem; furthermore, variation in the resistance to pathogens within species is common and well documented in many plant species. The variability in pathogen resistance is at the core of genetic improvement programs for disease resistance. True resistance to pathogens in plants is a genetically determined and complex trait that can involve both constitutive and induced mechanisms at different levels of organization. The complexity of this phenomenon makes the study of compatible plant - pathogen interactions challenging, and typically, disease resistance studies focus on specific aspects of a pathosystem, such as field resistance, anatomical or physiological features of resistant plants, or molecular mechanisms of resistance. The Thuja sp. - Didymascella thujina (E.J. Durand) Maire interaction is an important pathosystem in western North America, which has been studied for more than five decades. Western redcedar (Thuja plicata Donn ex D. Don) is very susceptible to cedar leaf blight (D. thujina), a biotroph that affects the tree at all stages, although seedlings are the most sensitive to the pathogen. The characteristics of the Thuja sp. - D. thujina interaction, the wealth of information on the pathosystem and the excellent Thuja sp. genetic resources available from the British Columbia Ministry of Forests, Lands, Natural Resource Operations and Rural Development make this interaction an ideal system to advance the study of disease resistance mechanisms in conifers. This Doctoral project presents a comprehensive investigation of the constitutive and induced resistance mechanisms against D. thujina in T. plicata, Thuja standishii (Gord.) Carrière and a Thuja standishii x plicata hybrid at the phenotypic and gene expression levels, undertaken with the objective of exploring the resistance mechanisms against the biotroph in these conifers. The project also aimed to establish base knowledge for the future development of markers for marker-assisted breeding of T. plicata. The investigations included a combination of histological, chemical and next generation sequencing (NGS) methodologies. NGS data were analyzed, in addition to the traditional clustering analyses, with cutting edge machine learning methods, including grade of membership analysis, dynamic topic modelling and stability selection analysis. The studies were progressively more controlled to narrow the focus on the resistance mechanisms to D. thujina in Thuja sp. Histological characteristics related to D. thujina resistance in Thuja sp. were studied first, along with the relationship between climate of origin and disease resistance. The virulence of D. thujina was also documented early in this project. Chemical and gene expression constitutive and induced responses to D. thujina infection in T. plicata seedlings were studied next. T. plicata clonal lines were then comprehensively studied to shed light on the mechanisms behind known physiologically determined resistance. A holistic investigation of the resistance mechanisms to D. thujina in T. standishii, T. plicata and a T. standishii x plicata hybrid explored the possibility of a gene-for-gene resistance model. Thirty-five T. plicata families were screened during the four field seasons carried out between 2012 and 2015, totalling more than 1,400 seedlings scored for D. thujina severity. Thirteen of those families were used in the five studies performed during the program, along with two T. plicata seedling lines self-pollinated for five generations and three T. plicata clonal lines. One T. standishii clonal line, and one T. standishii x plicata clone were also investigated during the program. A total of 16 histological and anatomical characteristics were studied in more than 750 samples, and more than 270 foliar samples were analyzed for 60 chemical and nutritional compounds. Almost one million transcriptomic sequences in four individually assembled reference transcriptomes were examined during the program. The results of the project support the variability in the resistance to D. thujina in T. plicata, as well as the higher resistance to the pathogen in plants originating from cooler and wetter environments. The data collected also depicted the existence of age-related resistance in T. plicata, and confirmed the full resistance to the disease in T. standishii. Western redcedar plants resistant and susceptible to D. thujina showed constitutive differences at the phenotypic and gene expression levels. Resistant T. plicata seedlings had thicker cuticles, constitutively higher concentrations of sabinene, alpha-thujene, and higher levels of expression of NBS-LRR disease resistance proteins. Resistant clones of T. plicata and T. standishii had higher expression levels of bark storage proteins and of dirigent proteins. Plants from all ages, species and resistance classes studied that were infected with D. thujina showed the accumulation of aluminum in the foliage, and increased levels of sequences involved in cell wall reinforcement. Additional responses to D. thujina infection in T. plicata seedlings included the downregulation of some secondary metabolic pathways, whereas pathogenesis-related proteins were upregulated in clonal lines of T. plicata. The comprehensive approach used here to study the Thuja sp. - D. thujina pathosystem could be applied to other compatible plant-pathogen interactions.
Graduate
2020-08-31