


Dissertations / Theses on the topic 'RNA bioinformatics'

Author: Grafiati

Published: 4 June 2021

Last updated: 1 February 2022


Consult the top 50 dissertations / theses for your research on the topic 'RNA bioinformatics.'


1

Mathew, Sumi. "A method to identify the non-coding RNA gene for U1 RNA in species in which it has not yet been found." Thesis, University of Skövde, School of Humanities and Informatics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-37.


Abstract:

Background: Non-coding RNAs are RNA molecules that do not code for proteins but play structural, catalytic or regulatory roles in the organisms in which they are found. These RNAs generally conserve their secondary structure more than their primary sequence. Protein-coding genes can be located using sequence signals such as promoters, terminators, and start and stop codons. This does not work for non-coding RNAs, because these signals are only weakly conserved in them, which makes finding non-coding RNA genes more challenging. A protocol was therefore devised to identify U1 RNA in species not previously known to have it.

Results: Covariance models are sufficient to identify non-coding RNAs, but they are very slow, so a filtering step is needed beforehand to reduce the search space. For this filtering step, the protocol for identifying U1 RNA genes uses RNABOB, a pattern matcher that can conduct secondary structure pattern searches. The descriptor for RNABOB is generated automatically, in such a way that it can also represent the bulges and interior loops in RNA helices. The protocol is compared with the Rfam and Weinberg & Ruzzo approaches, and it identified new U1 RNA homologues in the Apicomplexan group, where U1 RNA had not previously been found.

Conclusions: The method has been used to identify the gene for U1 RNA in certain species in which it had not been detected previously. The identified genes may be analyzed further with wet laboratory techniques to confirm their existence.


2

Liu, Tsunglin. "Physics and bioinformatics of RNA." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1141407392.


3

Huang, Xiaolan. "BIOINFORMATICS INVESTIGATION OF RNA PSEUDOKNOTS." OpenSIUC, 2017. https://opensiuc.lib.siu.edu/dissertations/1463.


Abstract:

Pseudoknots are a special kind of RNA structure that plays functional roles in a wide variety of biological processes. Pseudoknots are best known as the stimulatory structures in two translational recoding events: −1 programmed ribosomal frameshifting (−1 PRF) and stop codon readthrough. In this dissertation, three large-scale bioinformatics investigations were carried out on the roles of pseudoknots in the −1 PRF and stop codon readthrough recoding mechanisms in viral and human mRNAs. To meet the specific needs of these investigations, a new algorithm and method for the detection of RNA pseudoknots was developed. The new approach differs from all existing pseudoknot detection programs in that it can identify all potential pseudoknots in any given RNA sequence, with no length limitation, in a time-efficient manner. This capability is essential for large-scale applications in which large datasets of long RNA sequences are analyzed. The algorithm and method were implemented, in different variants, in three large-scale sequence analysis investigations. The three datasets of mRNA sequences are: 1) full-length genomic mRNA sequences of all animal viruses known or expected to use the −1 PRF and stop codon readthrough recoding mechanisms for viral protein production; 2) full-length genomic mRNA sequences of more than 4,000 different strains of human immunodeficiency virus type 1 (HIV-1); 3) more than 34,000 full-length human mRNA sequences. Results from systematic sequence analysis of these three datasets demonstrate the usefulness and robustness of the newly developed pseudoknot detection approach. A large number of previously unknown potential pseudoknots were detected in the viral and human mRNA sequences under investigation. Post-detection analysis leads to new mechanistic insights into, and hypotheses about, pseudoknot-dependent translational recoding. Some unifying themes of RNA pseudoknot structures in general are also uncovered. The results provide a solid basis for further experimental and bioinformatics studies.
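
As a small illustration of the term used in this abstract, a pseudoknot is commonly defined through crossing base pairs: two pairs (i, j) and (k, l) cross when i < k < j < l. The sketch below only checks a given list of base pairs for such crossings. It is not the detection algorithm developed in the dissertation, which works directly on sequences; the function names and the example pair lists are hypothetical.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return all pairs of base pairs (i, j), (k, l) with i < k < j < l."""
    # Normalize each base pair so that the 5' index comes first, then sort.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k; the pairs cross when k opens inside (i, j)
        # and l closes outside it.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(pairs):
    """True if at least two base pairs in the list cross each other."""
    return bool(crossing_pairs(pairs))


# Hypothetical examples: a nested hairpin and an H-type pseudoknot.
hairpin = [(0, 9), (1, 8), (2, 7)]
h_type = [(0, 10), (1, 9), (5, 15), (6, 14)]

print(has_pseudoknot(hairpin))
print(has_pseudoknot(h_type))
```

The pairwise check is quadratic in the number of base pairs, which is fine for illustration but would not scale to the dataset sizes described in the abstract.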


4

Freyhult, Eva. "New techniques for analysing RNA structure." Uppsala, 2004. http://www.math.uu.se/research/pub/Freyhult1.pdf.


5

Michalik, Juraj. "Non-redundant sampling in RNA Bioinformatics." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLX009/document.


Abstract:

Sampling methods are central to many algorithmic methods in structural RNA bioinformatics, where they are routinely used to identify important structural models, provide summarized pictures of folding landscapes, or approximate quantities of interest at thermodynamic equilibrium. In all of these examples, redundancy within sampled sets is uninformative and computationally wasteful, limiting the scope of application of existing methods. In this thesis, we introduce the concept of non-redundant sampling and explore its applications and consequences in RNA bioinformatics. We begin by formally introducing the concept of non-redundant sampling and demonstrate that any algorithm sampling from the Boltzmann distribution can be modified into a non-redundant variant. Its implementation relies on a specific data structure and a modification of the stochastic backtrack to return a set of unique structures, with the same complexity. We then show a practical example by implementing the non-redundant principle in a combinatorial algorithm that samples locally optimal structures. We use this tool to study RNA kinetics by modeling the folding landscapes generated from sets of locally optimal structures. These structures act as kinetic traps that influence the outcome of RNA kinetics, so taking them into account is essential. Empirical results show that the landscapes generated from non-redundant samples are closer to reality than those obtained by classic approaches. We then address the problem of efficiently computing statistical estimates from non-redundant sample sets. The absence of redundancy means that the naive estimator, obtained by averaging quantities observed in a sample, is erroneous. However, we establish a non-trivial unbiased estimator specific to a set of unique Boltzmann-distributed secondary structures. We show that the non-redundant sampling estimator performs better than its naive counterpart in most cases, specifically where most of the search space is covered by the sampling. Finally, we introduce a sampling algorithm, along with its non-redundant counterpart, for secondary structures featuring simple-type pseudoknots. Pseudoknots are typically omitted for reasons of complexity, yet many of them have biological relevance. We begin by proposing a dynamic programming scheme that enumerates all recursive pseudoknots consisting of two crossing helices, possibly containing unpaired bases. This scheme generalizes the one proposed by Reeder and Giegerich, chosen for its low time and space complexities. We then explain how to adapt this decomposition into a statistical sampling algorithm for simple pseudoknots. We present preliminary results and discuss extensions of the non-redundant principle in this context. The work presented in this thesis not only opens the door to kinetic analysis of longer RNA sequences, but also to more detailed structural analysis of RNAs in general. Non-redundant sampling can be applied to analyze search spaces for combinatorial problems amenable to statistical sampling, including virtually any problem solved by dynamic programming. Non-redundant sampling principles are robust and typically easy to implement, as demonstrated by the inclusion of non-redundant sampling in recent versions of the popular Vienna package.
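
The idea of non-redundant sampling can be sketched on a toy ensemble. The example below draws structures with probability proportional to their Boltzmann weight exp(-E/RT) and excludes each structure once it has been drawn, renormalizing over what remains. This is only an illustration under strong assumptions: it enumerates the ensemble explicitly, whereas the thesis modifies the stochastic backtrack of a dynamic programming algorithm with a dedicated data structure. The energies, structure strings, and function names are hypothetical.

```python
import math
import random

RT = 0.6163  # approximate value of R*T in kcal/mol at 37 °C


def boltzmann_weights(energies, rt=RT):
    """Map each structure to its unnormalized Boltzmann weight exp(-E/RT)."""
    return {s: math.exp(-e / rt) for s, e in energies.items()}


def sample_non_redundant(energies, k, rng):
    """Draw up to k distinct structures, each time from the weights that remain."""
    remaining = boltzmann_weights(energies)
    drawn = []
    for _ in range(min(k, len(remaining))):
        structures = list(remaining)
        weights = [remaining[s] for s in structures]
        pick = rng.choices(structures, weights=weights)[0]
        drawn.append(pick)
        # Removing the structure guarantees it cannot be sampled again.
        del remaining[pick]
    return drawn


# Hypothetical toy ensemble: dot-bracket strings with made-up free energies.
toy_energies = {
    "((..))": -3.0,
    "(....)": -1.0,
    ".(..).": -0.5,
    "......": 0.0,
}

sample = sample_non_redundant(toy_energies, k=3, rng=random.Random(0))
print(sample)
```

With ordinary sampling, the lowest-energy structure would dominate the draws; here every draw contributes a new structure, which is the property the abstract describes as avoiding uninformative redundancy. Averaging over such a sample without correction is biased, which is why the thesis derives a dedicated estimator.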


6

Zhou, Yu. "Application of RNA Bioinformatics in decoding RNA structure and regulation." Paris 11, 2008. http://www.theses.fr/2008PA112234.


Abstract:

My thesis focuses on the application of RNA bioinformatics analysis to problems arising from biological questions, including structure prediction, common structure identification, microRNA target discovery, splicing regulation prediction, and RNA design (inverse folding). The first chapter concerns the establishment of an iterative method for the secondary structure prediction of group I introns, including pseudoknots, and the development of a comprehensive group I intron sequence and structure database. In the second chapter, I describe my work on the bioinformatics analysis of the pyrrolysine (Pyl, the 22nd amino acid) insertion structure in Pyl-associated genes in archaea. The third and fourth chapters are devoted to developing two methods of experimental data analysis: one for identifying microRNA target sites, and one for determining the binding sites of an RNA-binding protein implicated in pre-mRNA splicing. Finally, the fifth chapter presents an algorithm for RNA design under motif constraints, involving the manipulation of automata and context-free grammars.


7

Baez, William David. "RNA Secondary Structures: from Biophysics to Bioinformatics." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1525714439675315.


8

Freyhult, Eva. "A Study in RNA Bioinformatics : Identification, Prediction and Analysis." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8305.


9

Rahrig, Ryan Robert. "Automated Alignment of RNA 3D Structures." Bowling Green State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1276873588.


10

Starmer, Joshua Mr. "What can RNA hybrids tell us about translation?" NCSU, 2006. http://www.lib.ncsu.edu/theses/available/etd-10202006-155443/.


Abstract:

Molecular biologists have been observing interactions between messenger RNA (mRNA) molecules and other non-coding RNA molecules for quite some time. Here I revisit some of the classical hybridizations between the 16S ribosomal RNA (rRNA) and mRNA during initiation, and investigate the interactions between small interfering RNA (siRNA) molecules and mRNA. In reviewing rRNA-mRNA interactions, I observed that the majority of both bacterial and eukaryotic genes can bind at the start codon. This novel result led to a method for improving genome annotation as well as a new theory of translation initiation. The examination of siRNA-mRNA interactions led to new criteria for predicting an siRNA's efficacy.


11

Anderson, James William Justin. "Probabilistic models of RNA secondary structure." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:3e58e9d9-c58d-4616-8e88-4082d1ca0e2a.


Abstract:

This thesis develops probabilistic models of RNA secondary structure. The first chapter introduces RNA secondary structure prediction, in particular stochastic context-free grammars (SCFGs), and considers a novel method for the automated design of SCFGs. Many SCFGs are found with a predictive quality similar to that of the grammars commonly used for RNA secondary structure prediction. The second chapter discusses the effect that alignment quality, evolutionary distance between sequences, and the number of sequences in an alignment have on RNA secondary structure prediction. By combining statistical alignment and SCFG models we can, in a statistically sound setting, average structure predictions over the space of alignments to decrease the loss created by poor alignments. The third chapter incorporates additional biological information about RNA secondary structure formation into the decoding of the SCFG posterior distribution. Combining iterative helix formation, phylogenetic modelling, and a distance function between alignment columns leads to an improvement in the accuracy of comparative RNA secondary structure prediction. Finally, appendices briefly discuss further work concerning probabilistic models of RNA secondary structure which may be of interest to the reader.


12

Childs, Liam. "Bioinformatics approaches to analysing RNA mediated regulation of gene expression." Phd thesis, Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4128/.


Abstract:

The genome can be considered the blueprint for an organism. Composed of DNA, it harbours all organism-specific instructions for the synthesis of all structural components and their associated functions. The role of carriers of actual molecular structure and functions was believed to be exclusively assumed by proteins encoded in particular segments of the genome, the genes. In the process of converting the information stored in genes into functional proteins, RNA, a third major molecule class, was discovered early on to act as a messenger by copying the genomic information and relaying it to the protein-synthesizing machinery. Furthermore, RNA molecules were identified to assist in the assembly of amino acids into native proteins. For a long time, these rather passive roles were thought to be the sole purpose of RNA. However, in recent years, new discoveries have led to a radical revision of this view. First, RNA molecules with catalytic functions, thought to be the exclusive domain of proteins, were discovered. Then, scientists realized that much more of the genomic sequence is transcribed into RNA molecules than there are proteins in cells, raising the question of what the functions of all these molecules are. Furthermore, very short and altogether new types of RNA molecules that seemingly play a critical role in orchestrating cellular processes were discovered. Thus, RNA has become a central research topic in molecular biology, even to the extent that some researchers dub cells "RNA machines". This thesis aims to contribute towards our understanding of RNA-related phenomena by bioinformatics means. First, we performed a genome-wide screen to identify sites at which the chemical composition of DNA (the genotype) critically influences phenotypic traits (the phenotype) of the model plant Arabidopsis thaliana. Whole-genome hybridisation arrays were used, and an informatics strategy was developed to identify polymorphic sites from hybridisation to genomic DNA.
Following this approach, genotype-phenotype associations were discovered across the entire Arabidopsis genome, and regions not currently known to encode proteins were also identified, representing candidate sites for novel functional RNA molecules. By statistically associating them with phenotypic traits, clues as to their particular functions were obtained. Furthermore, these candidate regions were subjected to a novel RNA function classification method developed as part of this thesis. While determining the chemical structure (the sequence) of candidate RNA molecules is relatively straightforward, the elucidation of their structure-function relationship is much more challenging. Towards this end, we devised and implemented a novel algorithmic approach to predict the structural and, thereby, functional class of RNA molecules. In this algorithm, the concept of treating RNA molecule structures as graphs was introduced. We demonstrate that this abstraction of the actual structure leads to meaningful results that may greatly assist in the characterization of novel RNA molecules. Furthermore, by using graph-theoretic properties as descriptors of structure, we identified particular structural features of RNA molecules that may determine their function, thus providing new insights into the structure-function relationships of RNA. The method (termed Grapple) has been made available to the scientific community as a web-based service. RNA has taken centre stage in molecular biology research, and novel discoveries can be expected to further solidify the central role of RNA in the origin and support of life on earth. As illustrated by this thesis, bioinformatics methods will continue to play an essential role in these discoveries.
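
The graph abstraction mentioned in this abstract can be illustrated with a minimal sketch: nucleotides become vertices, and edges connect backbone neighbours and base-paired positions. The code below builds such a graph from a dot-bracket string and computes a degree histogram as one simple graph-theoretic descriptor. The representation, the descriptor, and the function names are assumptions chosen for illustration; the abstract does not specify how Grapple encodes structures or which graph properties it uses.

```python
from collections import Counter


def dot_bracket_to_graph(structure):
    """Build an undirected graph (adjacency sets) from a dot-bracket string."""
    n = len(structure)
    adjacency = {i: set() for i in range(n)}
    # Backbone edges between consecutive nucleotides.
    for i in range(n - 1):
        adjacency[i].add(i + 1)
        adjacency[i + 1].add(i)
    # Base-pair edges, matched with a stack of open brackets.
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            adjacency[i].add(j)
            adjacency[j].add(i)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return adjacency


def degree_histogram(adjacency):
    """Count how many vertices have each degree."""
    return dict(Counter(len(neighbours) for neighbours in adjacency.values()))


graph = dot_bracket_to_graph("((...))")
print(degree_histogram(graph))  # {2: 5, 3: 2}
```

In this encoding, paired nucleotides inside a helix have degree 3 and unpaired ones have degree 2, so even this very coarse descriptor separates stems from loops. A classifier would need richer features than this.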


13

Sigurgeirsson, Benjamín. "Analysis of RNA and DNA sequencing data : Improved bioinformatics applications." Doctoral thesis, KTH, Genteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-184158.


Abstract:

Massively parallel sequencing has rapidly revolutionized DNA and RNA research. Sample preparation is steadily advancing, sequencing costs have plummeted, and throughput is ever growing. This progress has resulted in exponential growth in data generation, with a corresponding demand for bioinformatic solutions. This thesis addresses methodological aspects of this sequencing revolution and applies them to selected biological topics. Papers I and II are technical in nature and concern sample preparation and data analysis of RNA sequencing data. Paper I focuses on RNA degradation and paper II on generating strand-specific RNA-seq libraries. Papers III and IV deal with current biological issues. In paper III, whole exomes of cancer patients undergoing chemotherapy are sequenced and their genetic variants are associated with their toxicity-induced adverse drug reactions. In paper IV, a comprehensive view of gene expression in the endometrium is assessed at two time points of the menstrual cycle. Together, these papers show relevant aspects of contemporary sequencing technologies and how they can be applied to diverse biological topics.


14

Frederick, Madeline Rose. "The role of RNA-editing in viral mediated pathogenesis." Kent State University Honors College / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ksuhonors152545654349718.


15

Canzler, Sebastian. "Insights into the Evolution of small nucleolar RNAs." Doctoral thesis, Universitätsbibliothek Leipzig, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-217924.


Abstract:

Over the last decades, the formerly irrevocable believe that proteins are the only key-factors in the complex regulatory machinery of a cell was crushed by a plethora of findings in all major eukaryotic lineages. These suggested a rugged landscape in the eukaryotic genome consist- ing of sequential, overlapping, or even bi-directional transcripts and myriads of regulatory elements. The vast part of the genome is indeed transcribed into an RNA intermediate, but solely a small fraction is finally translated into functional proteins. The sweeping majority, however, is either degraded or functions as a non-protein coding RNA (ncRNA). Due to continuous developments in experimental and computational research, the variety of ncRNA classes grew larger and larger, ranging from key-processes in the cellular lifespan to regulatory processes that are driven and guided by ncRNAs. The bioinformatical part pri- marily concentrates on the prediction, annotation, and extraction of characteristic properties of novel ncRNAs. Due to conservation of sequence and/or structure, this task is often deter- mined by an homology-search that utilizes information about functional, and hence conserved regions, as an indicator. This thesis focuses mainly on a special class of ncRNAs, small nucleolar RNAs (snoRNAs). These abundant molecules are mainly responsible for the guidance of 2’-O-ribose-methylations and pseudouridylations in different types of RNAs, such as ribosomal and spliceosomal RNAs. Although the relevance of single modifications is still rather unclear, the elimination of a bunch of modifications is shown to cause severe effects, including lethality. Several de novo prediction programs have been published over the last years and a substantial amount of publicly available snoRNA databases has originated. Normally, these are restricted to a small amount of species and a collection of experimentally extracted snoRNA. 
The detection of snoRNAs by means of wet lab experiments and/or de novo prediction tools is generally time consuming (wet lab) and a quite tedious task (identification of snoRNA-specific characteristics). The snoRNA annotation pipeline snoStrip was developed with the intention to circumvent these obstacles. It therefore utilizes a homology-based search procedure to reliably predict snoRNA genes in genomic sequences. In a subsequent step, all candidates are filtered with respect to specific sequence motifs and secondary structures. In a functional analysis, poten- tial target sites are predicted in ribosomal and spliceosomal RNA sequences. In contrast to de novo prediction tools, snoStrip focuses on the extension of the known snoRNA world to uncharted organisms and the mapping and unification of the existing diversity of snoRNAs into functional, homologous families. The pipeline is properly suited to analyze a manifold set of organisms in search for their snoRNAome in short timescales. This offers the opportunity to generate large scale analyses over whole eukaryotic kingdoms to gain insights into the evolutionary history of these spe- cial ncRNA molecules. A set of experimentally validated snoRNA genes in Deuterostomia and Fungi were starting points for highly comprehensive surveys searching and analyzing the snoRNA repertoire in these two major eukaryotic clades. In both cases, the snoStrip pipeline proved itself as a fast and reliable tool and collected thousands of snoRNA genes in nearly 200 organisms. Additionally, the Interaction Conservation Index (ICI), which is am- plified to additionally work on single lineages, provides a convenient measure to analyze and evaluate the conservation of snoRNA-targetRNA interactions across different species. 
The massive amount of data and the possibility to score the conservation of predicted interactions constitute the main pillars for gaining an extraordinary insight into the evolutionary history of snoRNAs on both the sequence and the functional level. A substantial part of the snoRNAome is traceable down to the root of both eukaryotic lineages and might indicate an even more ancient origin of these snoRNAs. However, a plenitude of lineage-specific innovation and deletion events are also discernible. Due to its automated detection of homologous and functionally related snoRNA sequences, snoStrip identified extraordinary target switches in fungi. These unveiled a coupled evolutionary history of several snoRNA families that were previously thought to be independent. Although these findings are exceedingly interesting, the broad majority of snoRNA families is found to show remarkable conservation of the sequence and the predicted target interactions. On two occasions, this thesis shifts its focus from a genuine snoRNA inspection to an analysis of introns. Both investigations, however, are still conducted from an evolutionary viewpoint. In the case of the ubiquitously present U3 snoRNA, functional genes in a notable number of fungi are found to be disrupted by U2-dependent introns. The set of previously known U3 genes is considerably enlarged by an adapted snoStrip search procedure. Intron-disrupted genes are found in several fungal lineages, while their precise insertion points within the snoRNA precursor are located in a small and homologous region. A potential targetRNA of snoRNA genes, U6 snRNA, is also found to contain intronic sequences. Within this work, U6 genes are detected and annotated in nearly all fungal organisms. Although a few U6 intron-carrying genes have been known before, how widespread these findings are and the diversity of the particular insertion points are surprising. 
Those U6 genes are commonly found to contain more than just one intron. In both cases of intron-disrupted non-coding RNA genes, the detected RNA molecules seem to be functional and the intronic sequences show remarkable sequence conservation for both their splice sites and the branch site. In summary, the snoStrip pipeline is shown to be a reliable and fast prediction tool that works on homology-based search principles. Large scale analyses on whole eukaryotic lineages become feasible on short notice. Furthermore, the automated detection of functionally related but not yet mapped snoRNA families adds a new layer of information. Based on surveys covering the evolutionary history of Fungi and Deuterostomia, profound insights into the evolutionary history of this ncRNA class are revealed suggesting ancient origin for a main part of the snoRNAome. Lineage specific innovation and deletion events are also found to occur at a large number of distinct timepoints.

16

Wang, Leying. "Noncoding RNA-Involved Interactions for Cancer Prognosis: A Prostate Cancer Study." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1586651927830285.

17

Sigurðarson, Sandholt Arnar Kári. "Dual RNA-seq analysis of host-pathogen interaction in Eimeria infection of chickens." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413432.

Abstract:

Eimeria tenella is a eukaryotic, intracellular parasite that, along with six other Eimeria species, causes coccidiosis in chickens. This disease can result in weight loss or even death and is estimated to cause 2 billion euros of damages to the chicken industry each year. While much is known of the life cycle of E. tenella in the chicken, less is known about molecular mechanisms of infection and the chicken immune response. In this study, we produced a pipeline for dual RNA-sequencing analysis of a mixed chicken and E. tenella dataset. We then carried out an analysis on an in vitro infection of the chicken macrophage HD-11 cell line. This was followed by a differential expression analysis across six time points, 2, 4, 12, 24, 48, and 72 hours post-infection, in order to elucidate these mechanisms. The results showed clear patterns of expression for the chicken immune genes, with strong down-regulation of genes across the immune system at 24 hours and a repetition of early patterns at 72 hours, indicating that reinfection by a second generation of parasite cells was occurring. Several genes that may have important roles in the immune reaction of the chicken were identified, such as MRC2, ITGB3 and ITGA9, along with genes with known roles, such as TLR15. The expression of surface antigen genes in E. tenella was also examined, showing a clear upregulation in the late stages of merogony, suggesting important roles for merozoites. Finally, a co-expression analysis was carried out, showing considerable co-expression among the two organisms. One of the gene co-expression networks identified appeared to be enriched with both infection specific genes from E. tenella and chicken immune genes. These results, along with the pipeline, will be used in further studies on E. tenella infections and bring us closer to the eventual goal of a vaccine for coccidiosis.

18

Furió, Tarí Pedro. "Development of bioinformatic tools for massive sequencing analysis." Doctoral thesis, Universitat Politècnica de València, 2020. http://hdl.handle.net/10251/152485.

Abstract:

[EN] Transcriptomics is one of the most important and relevant areas of bioinformatics. It detects the genes that are expressed at a particular moment in time, making it possible to explore the relation between genotype and phenotype. Transcriptomic analysis was historically performed using microarrays until 2008, when high-throughput RNA sequencing (RNA-Seq) was launched on the market and began to replace the older technique. However, despite its clear advantages over microarrays, it was necessary to understand factors such as the quality of the data, the reproducibility and replicability of the analyses, and potential biases. The first section of the thesis covers these studies. First, an R package called NOISeq was developed and published in the public repository "Bioconductor". It includes a set of tools to better understand the quality of RNA-Seq data and to minimise the impact of noise in subsequent analyses, and it implements two new methodologies (NOISeq and NOISeqBio) to overcome the difficulties of comparing two different groups of samples (differential expression). Second, I show our contribution to the Sequencing Quality Control (SEQC) project, a continuation of the Microarray Quality Control (MAQC) project led by the US Food and Drug Administration (FDA), which aims to assess the reproducibility and replicability of RNA-Seq analyses. One of the most effective approaches to understanding the different factors that influence the regulation of gene expression, such as the synergistic effect of transcription factors, methylation events and chromatin accessibility, is the integration of transcriptomic data with other omics data. To this end, a file that contains the chromosomal positions where the events take place is required. For this reason, in the second chapter, we present a new and easily customisable tool (RGmatch) to associate chromosomal positions with the exons, transcripts or genes that the events could regulate. 
Another aspect of great interest is the study of non-coding genes, especially long non-coding RNAs (lncRNAs). Not long ago, these regions were thought not to play a relevant role and were considered mere transcriptional noise. However, they represent a high percentage of human genes, and it was recently shown that they play an important role in gene regulation. For these reasons, the last chapter first seeks a methodology for inferring the general functions of every lncRNA from publicly available data and, second, develops a new tool (spongeScan) to predict the lncRNAs that could sequester micro-RNAs (miRNAs) and thereby alter their regulatory activity.<br>Furió Tarí, P. (2020). Development of bioinformatic tools for massive sequencing analysis [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/152485

19

Oguchi, Chizoba. "A Comparison of Sensitive Splice Aware Aligners in RNA Sequence Data Analysis in Leaping towards Benchmarking." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18513.

Abstract:

Bioinformatics, as a field, develops rapidly, and such development requires the design of algorithms and software. RNA-seq provides robust information on RNAs, both already known and new, hence the increased study of RNA. Alignment is an important step in downstream analyses, and the ability to map reads across splice junctions is a requirement for an aligner to be suitable for mapping RNA-seq reads. Hence the need for a standard splice-aware aligner. STAR, Rsubread and HISAT2 have not been singly studied for the purpose of benchmarking one of them as a standard aligner for spliced RNA-seq reads. This study compared these aligners, found to be sensitive to splice sites, with regard to their sensitivity to splice sites, performance with default parameter settings and resource usage during the alignment process. The aligners were matched with featureCounts. The results show that STAR and Rsubread outperform HISAT2 in sensitivity and with default parameter settings. Rsubread was more sensitive to splice junctions than STAR but underperformed with featureCounts. STAR performed consistently, with greater demands on memory and time, but showed it could be more sensitive with real data.

20

DeBlasio, Daniel. "NEW COMPUTATIONAL APPROACHES FOR MULTIPLE RNA ALIGNMENT AND RNA SEARCH." Master's thesis, University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4070.

Abstract:

In this thesis we explore the theory and history behind RNA alignment. Normal sequence alignments as studied by computer scientists can be completed in $O(n^2)$ time in the naive case. The process involves taking two input sequences and finding the list of edits that can transform one sequence into the other. This process is applied to biology in many forms, such as the creation of multiple alignments and the search of genomic sequences. When you take into account the RNA sequence structure, the problem becomes even harder. Multiple RNA structure alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Existing tools for multiple RNA alignments first generate pair-wise RNA structure alignments and then build the multiple alignment using only the sequence information. Here we present PMFastR, an algorithm which iteratively uses a sequence-structure alignment procedure to build a multiple RNA structure alignment. PMFastR also has low memory consumption, allowing for the alignment of large sequences such as 16S and 23S rRNA. Specifically, we reduce the memory consumption to $\sim O(band^2*m)$ where $band$ is the banding size. Other solutions are $\sim O(n^2*m)$ where $n$ and $m$ are the lengths of the target and query respectively. The algorithm also provides a method to utilize a multi-core environment. We present results on benchmark data sets from BRAliBase, which show that PMFastR outperforms other state-of-the-art programs. Furthermore, we regenerate 607 Rfam seed alignments and show that our automated process creates similar multiple alignments to the manually-curated Rfam seed alignments. While these methods can also be applied directly to genome sequence search, the abundance of new multiple species genome alignments presents a new area for exploration. Many multiple alignments of whole genomes are available and these alignments keep growing in size. 
These alignments can provide more information to the searcher than just a single sequence. Using the methodology from sequence-structure alignment we developed AlnAlign, which searches an entire genome alignment using RNA sequence structure. While programs have been readily available to align alignments, this is the first to our knowledge that is specifically designed for RNA sequences. This algorithm is presented only in theory and is yet to be tested.<br>M.S.<br>School of Electrical Engineering and Computer Science<br>Engineering and Computer Science<br>Computer Science MS
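
As a rough illustration of the quadratic-time pairwise alignment and the banding idea mentioned in the abstract above, the following Python sketch computes an edit distance by dynamic programming, optionally restricted to a diagonal band. It is not PMFastR, which aligns sequence and structure together; the function name `edit_distance` and the `band` parameter are assumptions made only for this example.

```python
def edit_distance(a, b, band=None):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    With band=None the full table of len(a) * len(b) cells is evaluated.
    With an integer band, only cells with |i - j| <= band are evaluated, so the
    result is exact only when an optimal path stays inside the band.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    if band is not None and abs(n - m) > band:
        return inf  # the final cell lies outside the band
    # Row 0: turning the empty prefix of a into b[:j] costs j insertions.
    prev = [j if (band is None or j <= band) else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo = 0 if band is None else max(0, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i  # i deletions
                continue
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            cur[j] = min(substitute, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return prev[m]
```

Only two rows are kept in memory at a time, and a narrow band reduces the number of cells visited per row from the full sequence length to roughly twice the band width.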

21

Fasold, Mario. "Hybridization biases of microarray expression data - A model-based analysis of RNA quality and sequence effects." Doctoral thesis, Universitätsbibliothek Leipzig, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-116957.

Abstract:

Modern high-throughput technologies like DNA microarrays are powerful tools that are widely used in biomedical research. They target a variety of genomics applications ranging from gene expression profiling and DNA genotyping to gene regulation studies. However, the recent discovery of false positives among prominent research findings indicates a lack of awareness or understanding of the non-biological factors negatively affecting the accuracy of data produced using these technologies. The aim of this thesis is to study the origins, effects and potential correction methods for selected methodical biases in microarray data. The two-species Langmuir model serves as the basal physicochemical model of microarray hybridization describing the fluorescence signal response of oligonucleotide probes. The so-called hook method makes it possible to estimate essential model parameters and to compute summary parameters characterizing a particular microarray sample. We show that this method can be applied successfully to various types of microarrays which share the same basic mechanism of multiplexed nucleic acid hybridization. Using appropriate modifications of the model we study RNA quality and sequence effects using publicly available data from Affymetrix GeneChip expression arrays. Varying amounts of hybridized RNA result in systematic changes of raw intensity signals and appropriate indicator variables computed from these. Varying RNA quality strongly affects intensity signals of probes which are located at the 3' end of transcripts. We develop new methods that help in assessing the RNA quality of a particular microarray sample. A new metric for determining RNA quality, the degradation index, is proposed which improves on previous RNA quality metrics. Furthermore, we present a method for the correction of the 3' intensity bias. These functionalities have been implemented in the freely available program package AffyRNADegradation. 
We show that microarray probe signals are affected by sequence effects which are studied systematically using positional-dependent nearest-neighbor models. Analysis of the resulting sensitivity profiles reveals that specific sequence patterns such as runs of guanines at the solution end of the probes have a strong impact on the probe signals. The sequence effects differ for different chip- and target-types, probe types and hybridization modes. Theoretical and practical solutions for the correction of the introduced sequence bias are provided. Assessment of RNA quality and sequence biases in a representative ensemble of over 8000 available microarray samples reveals that RNA quality issues are prevalent: about 10% of the samples have critically low RNA quality. Sequence effects exhibit considerable variation within the investigated samples but have limited impact on the most common patterns in the expression space. Variations in RNA quality and quantity in contrast have a significant impact on the obtained expression measurements. These hybridization biases should be considered and controlled in every microarray experiment to ensure reliable results. Application of rigorous quality control and signal correction methods is strongly advised to avoid erroneous findings. Also, incremental refinement of physicochemical models is a promising way to improve signal calibration paralleled with the opportunity to better understand the fundamental processes in microarray hybridization.

22

Kimura, Takayuki. "RNA-protein structure classifiers incorporated into second-generation statistical potentials." Thesis, San Jose State University, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10241445.

Abstract:

<p> Computational modeling of RNA-protein interactions remains an important endeavor. However, exclusively all-atom approaches that model RNA-protein interactions via molecular dynamics are often problematic in their application. One possible alternative is the implementation of hierarchical approaches, first efficiently exploring configurational space with a coarse-grained representation of the RNA and protein. Subsequently, the lowest energy set of such coarse-grained models can be used as scaffolds for all-atom placements, a standard method in modeling protein 3D-structure. However, the coarse-grained modeling likely will require improved ribonucleotide-amino acid potentials as applied to coarse-grained structures. As a first step we downloaded 1,345 PDB files and clustered them with PISCES to obtain a non-redundant complex data set. The contacts were divided into nine types with DSSR according to the 3D structure of RNA and then 9 sets of potentials were calculated. The potentials were applied to score fifty thousand poses generated by FTDock for twenty-one standard RNA-protein complexes. The results compare favorably to existing RNA-protein potentials. Future research will optimize and test such combined potentials. </p>
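
The abstract above does not state how its potentials were calculated. A common way to derive such statistical (knowledge-based) potentials from contact counts is an inverse-Boltzmann relation, energy = -ln(observed / expected). The Python sketch below assumes that standard form; the function names, the pseudocount and the input layout are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def contact_potentials(contacts, pseudocount=1.0):
    """Return {(nucleotide, amino_acid): energy} from a list of observed contact pairs.

    energy = -ln(P_obs(pair) / (P(nucleotide) * P(amino_acid))), in units of kT.
    Negative values mean the pair is observed more often than expected by chance.
    """
    pair_counts = Counter(contacts)
    nt_counts = Counter(nt for nt, _ in contacts)
    aa_counts = Counter(aa for _, aa in contacts)
    total = len(contacts)
    n_pairs = len(nt_counts) * len(aa_counts)
    potentials = {}
    for nt in nt_counts:
        for aa in aa_counts:
            # The pseudocount keeps unseen pairs from producing log(0).
            observed = (pair_counts[(nt, aa)] + pseudocount) / (total + pseudocount * n_pairs)
            expected = (nt_counts[nt] / total) * (aa_counts[aa] / total)
            potentials[(nt, aa)] = -math.log(observed / expected)
    return potentials

def score_pose(pose_contacts, potentials):
    """Sum the pair energies over the contacts of one docked pose (lower is better)."""
    return sum(potentials.get(pair, 0.0) for pair in pose_contacts)
```

Under this assumption, one table of potentials would be built per contact type (nine in the abstract), and candidate docking poses would be ranked by their summed score.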

23

Reddy, Veena K. "Analysis of single cell RNA seq data to identify markers for subtyping of non-small cell lung cancer." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18514.

Abstract:

Single cell RNA technology is a recent technical advancement used to understand cancer tumorigenicity at single cell resolution. In this study we have analyzed the scRNA data from the non-small cell lung cancer (NSCLC) dataset to facilitate the early identification of NSCLC subtypes, namely squamous cell carcinoma (SCC) and adenocarcinoma (AC). Non-immune cells have a major role in the tumorigenesis of malignant tumors in early stages. Therefore, we have analyzed the major non-immune cells, namely endothelial cells and fibroblast cells, from the GSE127465 dataset using the SEURAT pipeline. Dimensionality reduction analysis and cluster analysis indicate that the AC and SCC subtypes of NSCLC have different fibroblast compositions. Differential gene expression analysis indicates that AC tumours show an elevated content of MGP/PTGDS and INMT/MFAP4 fibroblast cells, whereas squamous cell carcinoma shows an elevated content of COL6A1/COL6A2 and FNDC1/COL12A1 fibroblast cells. The statistical analysis shows that the clustering is statistically significant and not an artefact. Given that the tumour microenvironment is highly dynamic, in this study we have attempted to understand the tumour microenvironment by scRNA analysis of non-immune cells at single cell resolution.

24

Le, Faucheur Xavier Jean Maurice. "Statistical methods for feature extraction in shape analysis and bioinformatics." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33911.

Abstract:

The presented research explores two different problems of statistical data analysis. In the first part of this thesis, a method for 3D shape representation, compression and smoothing is presented. First, a technique for encoding non-spherical surfaces using second generation wavelet decomposition is described. Second, a novel model is proposed for wavelet-based surface enhancement. This part of the work aims to develop an efficient algorithm for removing irrelevant and noise-like variations from 3D shapes. Surfaces are encoded using second generation wavelets, and the proposed methodology consists of separating noise-like wavelet coefficients from those contributing to the relevant part of the signal. The empirical-based Bayesian models developed in this thesis threshold wavelet coefficients in an adaptive and robust manner. Once thresholding is performed, irrelevant coefficients are removed and the inverse wavelet transform is applied to the clean set of wavelet coefficients. Experimental results show the efficiency of the proposed technique for surface smoothing and compression. The second part of this thesis proposes using a non-parametric clustering method for studying RNA (RiboNucleic Acid) conformations. The local conformation of RNA molecules is an important factor in determining their catalytic and binding properties. RNA conformations can be characterized by a finite set of parameters that define the local arrangement of the molecule in space. Their analysis is particularly difficult due to the large number of degrees of freedom, such as torsion angles and inter-atomic distances among interacting residues. In order to understand and analyze the structural variability of RNA molecules, this work proposes a methodology for detecting repetitive conformational sub-structures along RNA strands. Clusters of similar structures in the conformational space are obtained using a nearest-neighbor search method based on the statistical mechanical Potts model. 
The proposed technique is a mostly automatic clustering algorithm and, in contrast to many other clustering techniques, may be applied to problems where there is no prior knowledge of the structure of the data space. First, results are reported for both single-residue conformations (where the parameter set of the data space includes four to seven torsional angles) and base-pair geometries. For both types of data sets, a very good match is observed between the results of the proposed clustering method and other known classifications, with only a few exceptions. Second, new results are reported for base stacking geometries. In this case, the proposed classification is validated with respect to specific geometrical constraints, while the content and geometry of the new clusters are fully analyzed.

25

Gibbons, Theodore Robert. "Inferring dinoflagellate genome structure, function, and evolution from short-read high-throughput RNA-Seq." Thesis, University of Maryland, College Park, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10011582.

Abstract:

<p> Dinoflagellates are a diverse and ancient lineage of globally abundant algae that have adapted to fill a wide array of important ecological roles. Despite their importance, dinoflagellate genomes remain relatively poorly understood because of their enormous size. It is suspected that dinoflagellate genomes have expanded through rampant gene duplication, possibly using a lineage-specific mechanism that involves reinsertion of mature transcripts back into the genome, and that may rely on spliced leader trans-splicing for reactivation and processing of recycled transcripts. Draft genomes have recently been published for two extremely small endosymbiotic species. These genomes confirm expansion of nearly 10k gene families, relative to other eukaryotes. In the more complete genome, evidence for transcript recycling based on relict spliced leader sequences was found in over 5,500 genes. Genomic efforts in larger dinoflagellates have focused instead on transcriptome sequencing, but transcriptomes assembled from short-read HTS data contain very little evidence for rampant gene duplication, or for trans-splicing. I have shown that the apparent disagreement with hypotheses related to ubiquitous trans-splicing and widespread gene duplication is the result of technological limitations. By leveraging the statistical power of high-throughput sequencing, I found that spliced leader suffixes as short as six nucleotides are sufficient for positive identification. I also found that isoform sequences from families of conserved paralogs are systematically collapsed during assembly, but that many of these consensus sequences can be identified using a custom SNP-calling procedure that can be combined with traditional clustering based on pairwise sequence alignment to obtain a more complete picture of gene duplication in dinoflagellates. 
Efficient, automated homology detection based on pairwise sequence alignment is an equally challenging problem for which there is much room for improvement. I explored alternative metrics for scoring alignments between sequences using a popular procedure based on BLAST and Markov clustering, and showed that simplified metrics perform as well or better than more popular alternatives. I also found that Markov clustering of protein sequences suffers from a serious false positive problem when compared against manual curation, suggesting that it is more appropriate for pre-clustering of very large data sets than as a complete clustering solution. </p>
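
The abstract above reports that spliced leader suffixes as short as six nucleotides can identify trans-spliced transcripts. A minimal Python sketch of that kind of check follows: it finds the longest suffix of a spliced leader that a read begins with. The function name, the `min_len` default and the toy sequences in the usage note are assumptions for illustration, not the thesis's procedure or a real spliced leader sequence.

```python
def sl_suffix_length(read, spliced_leader, min_len=6):
    """Length of the longest suffix of spliced_leader that the read starts with.

    Returns 0 when no suffix of at least min_len nucleotides matches.
    """
    longest = min(len(read), len(spliced_leader))
    # Try the longest candidate suffix first and shrink until min_len.
    for k in range(longest, min_len - 1, -1):
        if read.startswith(spliced_leader[-k:]):
            return k
    return 0
```

For example, with the made-up leader `TTGGCTCAAG`, a read beginning `CTCAAGATG...` would be reported as carrying a six-nucleotide suffix. A single six-nucleotide match can arise by chance (about 1 in 4,096 under uniform base composition), so a real analysis would rely on enrichment across many reads rather than on any one hit.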

26

Sood, Sanjana. "Developing RNA diagnostics for studying healthy human ageing." Thesis, Loughborough University, 2017. https://dspace.lboro.ac.uk/2134/24708.

Abstract:

Developing strategies to cope with the increase in the ageing population and in age-related chronic diseases is one of society's biggest challenges. The characteristics of the ageing process show significant inter-individual variation. Building genomic signatures that could account for variation in health outcomes with age may facilitate early prognosis of individual age-correlated diseases (e.g. cancer, coronary artery diseases and dementia) and help in developing better targeted treatments provided years in advance of acquiring disabling symptoms for these diseases. The aim of this thesis was to explore methods for diagnosing molecular features of human ageing. In particular, we utilise multi-platform transcriptomics, independent clinical data and classification methods to evaluate which human tissues demonstrate a reproducible molecular signature for age and which clinical phenotypes correlate with these new RNA biomarkers.

27

Lee, Semin. "Molecular characterization of protein-nucleic acid interfaces : applications in bioinformatics." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609284.

28

Fedewa, Gregory. "Quantifying Nucleotide Variation in RNA Virus Populations by Next-generation Sequencing." Thesis, University of California, San Francisco, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10936274.

Abstract:

<p> RNA viruses include several notable human pathogens, among them HIV, hepatitis C virus, West Nile virus, influenza, and Ebola virus. This group includes viruses with incredibly diverse genome structures, such as single-stranded genomes, double-stranded genomes, multipart genomes, negative-stranded genomes, and positive-stranded genomes. They also exist as heterogeneous populations that can mutate and rapidly evolve due to their error-prone polymerases. These errors then accumulate as they are passed down through generations. They can, therefore, be used as a historical marker for genetic relationships. If these errors result in a change of fitness for the virus, they can then be used to locate areas in the genome that are undergoing selection pressures.</p><p> In this work, I use these principles to examine what changes are necessary for Ebola virus to infect boa constrictor cells and how high-priority RNA viruses mutate as a function of routine viral passaging and propagation. In <i>Chapter 2</i>, I show that Ebola virus requires no additional mutations in order to replicate efficiently in boa constrictor cells. In <i>Chapter 3</i>, I show that SNV analysis can be used to track the identity and passage history of different RNA viruses.</p>
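
As a rough illustration of the kind of single-nucleotide variant (SNV) frequency estimate the abstract above refers to, the sketch below counts bases per reference position and reports minor alleles above a frequency cutoff. The function name `call_snvs`, the input layout (one string of observed bases per position), and both thresholds are assumptions made for this example; the thesis's actual pipeline is not described in the abstract.

```python
from collections import Counter


def call_snvs(reference, columns, min_freq=0.05, min_depth=10):
    """Return (position, ref_base, alt_base, frequency) for each non-reference
    base whose frequency reaches min_freq.

    columns[i] is a string of the bases observed in reads covering reference
    position i (a simplified pileup; an assumption for this sketch).
    """
    calls = []
    for pos, (ref, bases) in enumerate(zip(reference, columns)):
        depth = len(bases)
        if depth < min_depth:
            continue  # too little coverage to estimate a frequency
        for base, count in sorted(Counter(bases).items()):
            freq = count / depth
            if base != ref and freq >= min_freq:
                calls.append((pos, ref, base, freq))
    return calls
```

Comparing the resulting frequency profiles between two stocks of the same virus is one simple way such calls could be used to reason about passage history.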

29

Shi, Jieming. "Novel bioinformatics tools for miRNA-Seq analysis, RNA structure visualization, and genome-wide repeat detection." Miami University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=miami15003113547315.

30

Ralston, Matthew T. "Assembling improved gene annotations in Clostridium acetobutylicum with RNA sequencing." Thesis, University of Delaware, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1585177.

Abstract:

<p> The <i>C. acetobutylicum</i> genome annotation has been markedly improved by integrating bioinformatic predictions with RNA sequencing (RNA-seq) data. Samples were acquired under butanol, butyrate, and unstressed treatments across various growth stages to sample the transcriptome from a range of physiologically relevant conditions. Analysis of an initial assembly revealed errors due to technical and biological background signals, challenges with few solutions. Hurdles for RNA-seq transcriptome mapping research include optimizing library complexity and sequencing depth, yet most studies in bacteria report low depth and ignore the effect of ribosomal RNA abundance and other sources on the effective sequencing depth. </p><p> In this work, workflows were established to address type I and II errors associated with these challenges. An integrative analysis method was developed to combine motif predictions, single-nucleotide resolution sequencing depth, and library complexity to resolve these errors during assembly curation. This contextualization minimized false-positive errors and determined gene boundaries, in some cases, to the exact base pair reported in prior studies. Curation of the pSOL1 megaplasmid reconciled transcriptome assembly statistics with findings from <i>E. coli</i>. </p><p> The resulting annotation can be readily explored and downloaded through a customized genome browser, enabling future genomic and transcriptomic research in this organism. This work demonstrates the first strand-specific transcriptome assembly in a <i>Clostridium</i> organism. This method can improve the precision of transcript boundary estimates in bacterial transcriptome mapping studies.</p>

31

Zhang, Yi. "NOVEL APPLICATIONS OF MACHINE LEARNING IN BIOINFORMATICS." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/83.

Abstract:

Technological advances in next-generation sequencing and biomedical imaging have led to a rapid increase in biomedical data dimension and acquisition rate, which is challenging the conventional data analysis strategies. Modern machine learning techniques promise to leverage large data sets for finding hidden patterns within them, and for making accurate predictions. This dissertation aims to design novel machine learning-based models to transform biomedical big data into valuable biological insights. The research presented in this dissertation focuses on three bioinformatics domains: splice junction classification, gene regulatory network reconstruction, and lesion detection in mammograms. A critical step in defining gene structures and mRNA transcript variants is to accurately identify splice junctions. In the first work, we built the first deep learning-based splice junction classifier, DeepSplice. It outperforms the state-of-the-art classification tools in terms of both classification accuracy and computational efficiency. To uncover transcription factors governing metabolic reprogramming in non-small-cell lung cancer patients, we developed TFmeta, a machine learning approach to reconstruct relationships between transcription factors and their target genes in the second work. Our approach achieves the best performance on benchmark data sets. In the third work, we designed deep learning-based architectures to perform lesion detection in both 2D and 3D whole mammogram images.

32

Khanal, Reecha. "Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques." ScholarWorks@UNO, 2019. https://scholarworks.uno.edu/honors_theses/128.

Abstract:

Identification and annotation of RNA-binding proteins (RBPs) and RNA-binding residues from sequence information alone is one of the most challenging problems in computational biology. RBPs play crucial roles in several fundamental biological functions, including transcriptional regulation of RNAs, RNA metabolism, and splicing. Existing experimental techniques are time-consuming and costly. Thus, efficient computational identification of RBPs directly from the sequence can be useful to annotate RBPs and assist experimental design. Here, we introduce AIRBP, a computational sequence-based method that uses features extracted from evolutionary information, physicochemical properties, and disordered properties to train a model built with stacking, an advanced machine learning technique, for effective prediction of RBPs. Furthermore, it makes use of efficient machine learning algorithms such as Support Vector Machine, Logistic Regression, K-Nearest Neighbor and XGBoost (Extreme Gradient Boosting). In this research work, we also propose another predictor for efficient annotation of RNA-binding residues. This residue predictor also uses stacking and evolutionary algorithms, and likewise draws on various evolutionary, physicochemical and disordered properties to train a robust model. This thesis presents a possible solution to the RBP and RNA-binding residue prediction problem through two independent predictors, both of which outperform existing state-of-the-art approaches.

33

Lopes, Pinto Fernando. "Development of Molecular Biology and Bioinformatics Tools : From Hydrogen Evolution to Cell Division in Cyanobacteria." Doctoral thesis, Uppsala universitet, Institutionen för fotokemi och molekylärvetenskap, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-110842.

Abstract:

The use of fossil fuels presents a particularly interesting challenge - our society strongly depends on coal and oil, but we are aware that their use is damaging the environment. Currently, this awareness is gaining momentum, and pressure to evolve towards an energetically cleaner planet is very strong. Molecular hydrogen (H2) is an environmentally suitable energy carrier that could initially supplement or even substitute fossil fuels. Ideally, the primary energy source to produce hydrogen gas should be renewable, and the process of conversion back to energy should be free of polluting emissions, making this cycle environmentally clean. Photoconversion of water to hydrogen can be achieved using the following strategies: 1) the use of photochemical fuel cells, 2) by applying photovoltaics, or 3) by promoting production of hydrogen by photosynthetic microorganisms, either phototrophic anoxygenic bacteria and cyanobacteria or eukaryotic green algae. For photobiological H2 production, cyanobacteria are among the ideal candidates since they: a) are capable of H2 evolution, and b) have simple nutritional requirements - they can grow in air (N2 and CO2), water and mineral salts, with light as the only energy source. As this project started, a vision and a set of overall goals were established. These postulated that improved H2 production over a long period demanded: 1) selection of strains taking into consideration their specific hydrogen metabolism, 2) genetic modification in order to improve the H2 evolution, and 3) examination and improvement of cultivation conditions in bioreactors. Within these goals, three main research objectives were set: 1) update and document the use of cyanobacteria for hydrogen production, 2) create tools to improve molecular biology work at the transcription analysis level, and 3) study cell division in cyanobacteria.
This work resulted in: 1) the publication of a review on hydrogen evolution by cyanobacteria, 2) the development of tools to assist understanding of transcription, and 3) the start of a new fundamental research approach to ultimately improve the yield of H2 evolution by cyanobacteria.

34

Hsiao, Chiaolong. "Computational bioinformatics on three-dimensional structures of ribosomes using multiresolutional analysis." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26634.

Abstract:

Thesis (Ph.D)--Chemistry and Biochemistry, Georgia Institute of Technology, 2009.<br>Committee Chair: Williams, Loren; Committee Member: Doyle, Donald; Committee Member: Harvey, Stephen; Committee Member: Hud, Nicholas; Committee Member: Wartell, Roger. Part of the SMARTech Electronic Thesis and Dissertation Collection.

35

Quan, Jie. "The Roles of RNA-binding Proteins in the Developing Nervous System." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11249.

Abstract:

RNA-binding proteins are key players in post-transcriptional regulation of gene expression by orchestrating RNA fate from synthesis to decay. Hundreds of proteins with RNA-binding capacity have been identified so far, yet only a small fraction has been functionally characterized and presumably many more RNA-binding proteins await discovery. The roles of RNA-binding proteins in the nervous system are of particular interest because accumulating evidence has linked RNA-based mechanisms to neural development, maintenance and repair. Here, the three RNA-binding proteins under study are IGF-II mRNA binding proteins IMP-1 and IMP-2, known to be involved in mRNA localization, translational control and stability, and adenomatous polyposis coli (APC), identified as a novel RNA-binding protein. To systematically identify their RNA binding profiles, a high-throughput approach combining protein-RNA crosslinking and immunoprecipitation with next-generation sequencing (HITS-CLIP) was applied in embryonic mouse brain. A nonparametric method was developed to computationally analyze the CLIP sequencing data, mapping transcriptome-wide protein-RNA interactions. The identified target mRNAs of IMP-1 and IMP-2 were highly enriched for functions related to neural development, especially neuron projection morphogenesis and axon guidance signaling. Moreover, these target mRNAs were associated with a variety of neurological diseases, including neurodevelopmental and neurodegenerative disorders. Supporting roles in axon development, knockdown of IMP-1 or IMP-2 caused aberrant trajectories of commissural axons in chicken spinal cord. APC mRNA targets were highly enriched for APC-related functions, including microtubule organization, cell and axon motility, Wnt signaling, cancer and neurological disease. Among the APC targets was Tubulin β-2B (Tubb2b), previously known to be required for neuronal migration.
It was found that Tubb2b was synthesized in axons, and localized preferentially to dynamic microtubules in the peripheral domain of the growth cone. Blocking the APC binding site in the Tubb2b mRNA 3'UTR caused reduction in its expression in axons and loss of the growth cone peripheral area, and impaired cortical neuron migration in vivo. These findings offer an informative snapshot of the protein-RNA interactome, which can provide a basis to better understand the roles of RNA-binding proteins in the nervous system.

36

Gawronski, Alexander. "RiboFSM: Frequent Subgraph Mining for the Discovery of RNA Structures and Interactions." Thèse, Université d'Ottawa / University of Ottawa, 2013. http://hdl.handle.net/10393/26296.

Abstract:

Frequent subgraph mining is a useful method for extracting biologically relevant patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph than in a random graph are extracted. We hypothesize that these patterns are most likely to represent biological mechanisms. The graph representation used is a directed dual graph, extended to handle intermolecular interactions. The graph is sampled for subgraphs, which are labeled using a canonical labeling method and counted. The resulting patterns are compared to those created from a randomized dataset and scored. The algorithm was applied to the mitochondrial genome of the kinetoplastid species Trypanosoma brucei. This species has a unique RNA editing mechanism that has been well studied, making it a good model organism to test RiboFSM. The most significant patterns contain two stem-loops, indicative of gRNA, and represent interactions of these structures with target mRNA.

37

Leung, Wing-sze. "Filtering of false positive microRNA candidates by a clustering-based approach." Click to view the E-thesis via HKUTO, 2009. http://sunzi.lib.hku.hk/hkuto/record/B41633908.

38

Sweeney, Blake Alexander. "Development of a System for Studying Temperature Adaptation of Structural RNAS." Bowling Green State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1321542150.

39

Darbha, Sriram. "RNA Homology Searches Using Pair Seeding." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/1172.

Abstract:

Due to increasing numbers of non-coding RNA (ncRNA) being discovered recently, there is interest in identifying homologs of a given structured RNA sequence. Exhaustive homology searching for structured RNA molecules using covariance models is infeasible on genome-length sequences. Hence, heuristic methods are employed, but they largely ignore structural information in the query. We present a novel method, which uses secondary structure information, to perform homology searches for a structured RNA molecule. We define the concept of a <em>pair seed</em> and theoretically model alignments of random and related paired regions to compute expected sensitivity and specificity. We show that our method gives theoretical gains in sensitivity and specificity compared to a BLAST-based heuristic approach. We provide experimental verification of this gain. <br /><br /> We also show that pair seeds can be effectively combined with the spaced seeds approach to nucleotide homology search. The hybrid search method has theoretical specificity superior to that of the BLAST seed. We provide experimental evaluation of our hypotheses. Finally, we note that our method is easily modified to process pseudo-knotted regions in the query, something outside the scope of covariance model based methods.

40

Crabtree, Nathaniel Mark. "Multi-Class Computational Evolution| Development, Benchmark Comparison, and Application to RNA-Seq Biomarker Discovery." Thesis, University of Arkansas at Little Rock, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10620232.

Abstract:

<p> A computational evolution system (CES) is a knowledge-discovery engine that constructs and evolves classifiers with a small number of features to identify subtle, synergistic relationships among features and to discriminate groups in high-dimensional data analysis. CESs have previously been designed to only analyze binary datasets. In this work, the CES method has been expanded to accommodate multi-class data.</p><p> The multi-class CES was compared to three common classification and feature selection methods: random forest, random k-nearest neighbor, and support vector machines. The four classifiers were evaluated on three real RNA sequencing datasets. Performance was evaluated via cross validation to assess classification accuracy, number of features selected, stability of the selected feature sets, and run-time.</p><p> The three common classification and feature selection methods were originally designed for microarray data, which is fundamentally different from RNA-Seq data. In order to preprocess RNA-Seq count data for classification, the data was normalized and transformed via a variance stabilizing transformation to remove the variance-mean relationship that is commonly observed in RNA-Seq count data.</p><p> Compared to the three competing methods, the multi-class CES selected far fewer features. The identified features are potential biomarkers that may be more relevant than the longer lists of features identified by the competing methods. The CES performed best on the dataset with the smallest sample size, indicating that it has a unique advantage in these situations since most classification algorithms suffer in terms of accuracy when the sample size is small.</p><p> The CES identified numerous potentially-important biomarkers in each of the three real datasets that are validated by previous research and worthy of additional investigation. CES was especially helpful at identifying important features in the rat blood RNA-Seq data set. 
Subsequent ontological analysis of these selected features revealed protein folding as an important process in that dataset. The other contribution of this research to science was to extend the applicability of CES to biomarker discovery in multi-class settings. New software algorithms based on CES have already been developed, and the multi-class modifications presented here are directly applicable and would also benefit the newer software.</p>
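
The abstract above does not say which variance-stabilizing transformation was used, so the following is only a textbook sketch of the idea. It assumes counts whose variance follows mu + dispersion * mu**2 (a negative binomial model) and applies the closed-form transform obtained by integrating 1 / sqrt(variance); the function name `vst_negative_binomial` and the fixed `dispersion` argument are hypothetical.

```python
import math


def vst_negative_binomial(count, dispersion):
    """Variance-stabilizing transform for variance = mu + dispersion * mu**2.

    Derived by integrating 1 / sqrt(mu + dispersion * mu**2) with respect to mu.
    With dispersion <= 0 it falls back to the Poisson case, 2 * sqrt(count).
    """
    if dispersion <= 0:
        return 2.0 * math.sqrt(count)
    return 2.0 / math.sqrt(dispersion) * math.asinh(math.sqrt(dispersion * count))
```

After this transform the spread of replicate values is roughly constant across expression levels, which is what lets classifiers designed for microarray-like data be applied to normalized RNA-Seq counts.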

41

Herdy, Joseph R. III. "SMALL RNA EXPRESSION DURING PROGRAMMED REARRAGEMENT OF A VERTEBRATE GENOME." UKnowledge, 2014. http://uknowledge.uky.edu/biology_etds/25.

Abstract:

The sea lamprey (Petromyzon marinus) undergoes programmed genome rearrangements (PGRs) during embryogenesis that result in the deletion of ~0.5 Gb of germline DNA from the somatic lineage. The underlying mechanism of these rearrangements remains largely unknown. miRNAs (microRNAs) and piRNAs (PIWI-interacting RNAs) are two classes of small noncoding RNAs that play important roles in early vertebrate development, including differentiation of cell lineages, modulation of signaling pathways, and clearing of maternal transcripts. Here, I utilized next-generation sequencing to determine the temporal expression of miRNAs, piRNAs, and other small noncoding RNAs during the first five days of lamprey embryogenesis, a time series that spans the 24-32 cell stage to the formation of the neural crest. I obtained expression patterns for thousands of miRNA and piRNA species. These studies identified several thousand small RNAs that are expressed immediately before, during, and immediately after PGR. Significant sequence variation was observed at the 3’ end of miRNAs, representing template-independent covalent modifications. Patterns observed in lamprey are consistent with expectations that the addition of adenosine and uracil residues plays a role in regulation of miRNA stability during the maternal-zygotic transition. We also identified a conserved motif present in sequences without any known annotation that is expressed exclusively during PGR. This motif is similar to binding motifs of known DNA binding and nuclear export factors, and our data could represent a novel class of small noncoding RNAs operating in lamprey.

42

Larsson, Pontus. "Computational Approaches to the Identification and Characterization of Non-Coding RNA Genes." Doctoral thesis, Uppsala universitet, Institutionen för cell- och molekylärbiologi, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-9518.

Abstract:

Non-coding RNAs (ncRNAs) have emerged as highly diverse and powerful key players in the cell, the range of capabilities spanning from catalyzing essential processes in all living organisms, e.g. protein synthesis, to being highly specific regulators of gene expression. To fully understand the functional significance of ncRNAs, it is of critical importance to identify and characterize the repertoire of ncRNAs in the cell. Practically every genome-wide screen to identify ncRNAs has revealed large numbers of expressed ncRNAs and often identified species-specific ncRNA families of unknown function. Recent years' advancement in high-throughput sequencing techniques necessitates efficient and reliable methods for computational identification and annotation of genes. A major aim in the work underlying this thesis has been to develop and use computational tools for the identification and characterization of ncRNA genes. We used computational approaches in combination with experimental methods to study the ncRNA repertoire of the model organism Dictyostelium discoideum. We report ncRNA genes belonging to well-characterized gene families as well as previously unknown and potentially species-specific ncRNA families. The complicated task of de novo ncRNA gene prediction was successfully addressed by developing a method for nucleotide composition-based gene prediction using maximal-scoring partial sums and considering overlapping dinucleotides. We also report a substantial heterogeneity among human spliceosomal snRNAs. Northern blot analysis and cDNA cloning, as well as bioinformatical analysis of publicly available microarray data, revealed a large number of expressed snRNAs. In particular, U1 snRNA variants with several nucleotide substitutions that could potentially have dramatic effects on splice site recognition were identified. 
In conclusion, by combining computational approaches with experimental analysis, we have identified a rich and diverse ncRNA repertoire in the eukaryotes D. discoideum and Homo sapiens. The surprising diversity among the snRNAs in H. sapiens suggests a functional involvement in recognition of non-canonical introns and regulation of messenger RNA splicing.
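
The abstract above mentions composition-based gene prediction using maximal-scoring partial sums over overlapping dinucleotides. A minimal sketch of that general idea follows: each overlapping dinucleotide receives a score, and the highest-scoring contiguous segment is found with Kadane's algorithm. The score table, the default score of 0 for unlisted dinucleotides, and both function names are assumptions for illustration; the thesis's actual scoring scheme is not given in the abstract.

```python
def dinucleotide_scores(seq, score_table):
    """Score every overlapping dinucleotide of seq using score_table
    (dinucleotide -> score); dinucleotides missing from the table score 0.0."""
    return [score_table.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]


def max_scoring_segment(scores):
    """Kadane's algorithm: return (best_sum, start, end), end exclusive.
    Returns (0.0, 0, 0) if no segment has a positive sum."""
    best, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        if running <= 0:
            running, start = 0.0, i  # restart the partial sum here
        running += s
        if running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end
```

Because dinucleotide `i` covers nucleotides `i` and `i + 1`, a segment `(start, end)` over the score list corresponds to the nucleotide slice `seq[start:end + 1]`.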

43

Otto, Christina, Mathias Möhl, Steffen Heyne, et al. "ExpaRNA-P : simultaneous exact pattern matching and folding of RNAs." Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-159847.

Abstract:

Background: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existing approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no remedy, since single-sequence structure prediction is highly unreliable. Results: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. We use the best chain as anchor constraints for the sequence-structure alignment tool LocARNA, resulting in the very fast RNA alignment approach ExpLoc-P. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up of over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P.
Conclusions: ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications.
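
To give a feel for the chaining step described in the abstract above, here is a deliberately simplified sketch: it selects a maximum-score set of exact matches that are collinear and non-overlapping in both sequences, using a quadratic dynamic program. This is plain sequence-level chaining, not the generalized, structure-aware algorithm of ExpaRNA-P; the match tuple layout `(pos_a, pos_b, length, score)` and the name `best_chain` are assumptions for this example.

```python
def best_chain(matches):
    """Pick the highest-scoring chain of compatible matches.

    matches: list of (pos_a, pos_b, length, score) tuples, each an exact match
    of `length` positions starting at pos_a in sequence A and pos_b in B.
    Two matches are compatible if one ends before the other starts in BOTH
    sequences. Returns (total_score, chain) with the chain in left-to-right order.
    """
    order = sorted(matches)  # sorted by pos_a, so predecessors come first
    if not order:
        return 0, []
    best = [m[3] for m in order]   # best chain score ending at each match
    prev = [None] * len(order)     # back-pointers for traceback
    for k, (a, b, _length, score) in enumerate(order):
        for p in range(k):
            pa, pb, plen, _ = order[p]
            if pa + plen <= a and pb + plen <= b and best[p] + score > best[k]:
                best[k] = best[p] + score
                prev[k] = p
    k = max(range(len(order)), key=best.__getitem__)
    total, chain = best[k], []
    while k is not None:
        chain.append(order[k])
        k = prev[k]
    return total, chain[::-1]
```

In a pipeline of the kind the abstract outlines, the matches in the returned chain would then serve as fixed anchors that restrict where the subsequent alignment is allowed to place positions.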

44

Hosseini, Asanjan Maryam. "Analysis of RNA 3D Folding: PART I. How to Fold a Complex RNA with Few Guanines: The Case of the Mammalian Mitochondrial Ribosomal RNA. PART II. Resolving Ambiguities in RNA Multi-Helix Junction (MHJ) Loops and Automatic Extraction of Them." Bowling Green State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1542336056006577.

45

Parida, Mrutyunjaya. "Exploring and analyzing omics using bioinformatics tools and techniques." Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6244.

Abstract:

During the Human Genome Project the first hundred billion bases were sequenced in four years; however, the second hundred billion bases were sequenced in four months (NHGRI, 2013). As efforts were made to improve every aspect of sequencing in this project, cost became inversely proportional to the speed (NHGRI, 2013). The Human Genome Project ended in April 2003, but research into faster and cheaper ways to sequence DNA is active to date (NHGRI, 2013). On the one hand, these advancements have allowed the convenient and unbiased generation and interrogation of a variety of omics datasets; on the other hand, they have substantially contributed towards the ever-increasing size of biological data. Therefore, informatics techniques are indispensable tools in the field of biology and medicine due to their ability to efficiently store and probe large datasets. Bioinformatics is a specialized domain under informatics that focusses on biological data storage, organization and analysis (NHGRI, 2013). Here, I have applied informatics approaches such as database design and web development to biological datasets to create a novel web-based resource that allows users to conveniently explore the comprehensive transcriptome of a common aquatic tunicate, Oikopleura dioica (O. dioica), and access the associated annotations across key developmental time points. This unique resource will substantially contribute towards studies on development, evolution and genetics of chordates using O. dioica as a model. Mendelian or single-gene disorders such as cystic fibrosis, sickle-cell anemia, Huntington’s disease, and Rett’s syndrome run across generations in families (Chial, 2008). Allelic variations associated with Mendelian disorders primarily reside in the protein-coding regions of the genome, collectively called an exome (Stenson et al., 2009).
Therefore, sequencing of the exome rather than the whole genome is an efficient and practical approach to discover etiologic variants in our genome (Bamshad et al., 2011). Renal agenesis (RA) is a severe form of congenital anomalies of the kidney and urinary tract (CAKUT) where children are born with one (unilateral renal agenesis) or no kidneys (bilateral renal agenesis) (Brophy et al., 2017; Yalavarthy & Parikh, 2003). In this study, we have applied the exome-sequencing technique to selected human patients in a renal agenesis (RA) pedigree that followed a Mendelian mode of disease transmission. Exome sequencing and molecular techniques combined with my bioinformatics analysis have led to the discovery of a novel RA gene called GREB1L (Brophy et al., 2017). In this study, we have successfully demonstrated the validity of exome sequencing and bioinformatics techniques for narrowing down disease-associated mutations in the human genome. Additionally, the results from this study have substantially contributed towards understanding the molecular basis of CAKUT. Discovery of novel etiologic variants will enhance our understanding of human diseases and development. The high-throughput sequencing technique called RNA-Seq has revolutionized the field of transcriptome analysis (Z. Wang, Gerstein, & Snyder, 2009). Concisely, a library of cDNA is prepared from an RNA sample using an enzyme called reverse transcriptase (Nottingham et al., 2016). Next, the cDNA is fragmented, sequenced using a sequencing platform of choice and mapped to a reference genome, assembled transcriptome, or assembled de novo to generate a transcriptome (Grabherr et al., 2011; Nottingham et al., 2016). Mapping allows detection of high-resolution transcript boundaries, quantification of transcript expression and identification of novel transcripts in the genome. We have applied RNA-Seq to analyze the gene expression patterns in water flea otherwise known as D.
pulex to work out the genetic details underlying heavy-metal-induced stress (unpublished) and predator-induced phenotypic plasticity (PIPP) (Rozenberg et al., 2015), independently. My bioinformatics analysis of the RNA-Seq data has facilitated the discovery of key biological processes participating in metal-induced stress response and predator-induced defense mechanisms in D. pulex. These studies are valuable additions to the fields of ecotoxicogenomics and phenotypic plasticity, and have aided us in gaining mechanistic insight into the impact of toxicant and predator exposure on D. pulex at a biomolecular level.

46

Roll, James Elwood. "Inferring RNA 3D Motifs from Sequence." Bowling Green State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1557482505513958.

47

Raplee, Isaac D. "Contribution of Retrotransposons to Breast Cancer Malignancy." Scholar Commons, 2019. https://scholarcommons.usf.edu/etd/7900.

Abstract:

The components contributing to cancer progression, especially the transition from early to invasive disease, are unknown. Consequently, it is unclear why some patients diagnosed with atypia and ductal carcinoma in situ (DCIS) never progress to invasive breast cancer. The “one gene at a time” approach does not sufficiently predict progression. To elucidate the early-stage progression to invasive ductal cancer, the expression signature of transcripts and transposable elements was analyzed in micropunched samples of formalin-fixed, paraffin-embedded (FFPE) tissue. A bioinformatics pipeline to analyze poor-quality, short reads (>36 nts) from RNA-Seq data was created to compare the most common tools for alignment and differential expression. Most patient samples prepared for RNA-seq analysis are acquired from archived FFPE tissue collections, which have low RNA quality. The pipeline analytics revealed that the STAR alignment software outperformed the others. Furthermore, our comparison revealed that DESeq2 and edgeR, with the estimateDisp function applied, both perform well when analyzing more than 12 replicates. Transcriptome analysis revealed progressive diversification into known oncogenic pathways and a few novel biochemical pathways, in addition to antiviral and interferon activation. Furthermore, the transposable element (TE) signature during early-stage breast cancer progression indicated long terminal repeats (LTRs) as the most abundant differentially expressed TEs. LTRs belong to endogenous retroviruses (ERVs), a subclass of TEs. The retroviral and innate immune response activity in DCIS indirectly corroborates the increase in ERV expression at this pre-malignant stage. Finally, to demonstrate the potential role of TEs in the transition from pre-malignant to malignant breast cancer, we used pharmacological approaches to alter global TE expression and inhibit retrotransposition activity in control and breast cancer cell lines. It was expected that dysregulation of TEs would be associated with increased invasiveness and growth. However, our results indicated that the DNA methyltransferase inhibitor 5-azacytidine (AZA) consistently retarded cell migration and growth. While unexpected, these findings corroborate recent studies showing that AZA may induce an interferon response in cancer via increased ERV expression. This body of work illustrates the importance of understanding the bioinformatics methods used in RNA-seq analysis of common clinical samples. These studies suggest the potential of TEs as biomarkers for disease progression and as a novel therapeutic approach to investigate in additional model systems.

48

Johnson, Travis Steele. "Integrative approaches to single cell RNA sequencing analysis." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1586960661272666.

49

Tedeschi, Frank A. "Identification of Cellular RNA Binding Sites of DEAD-Box Helicases." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1531217057171378.

50

Stombaugh, Jesse. "Predicting the Structure of RNA 3D Motifs." Bowling Green State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1225391806.
