Dissertations / Theses on the topic 'Bioinformatics tools'

1

Sentausa, Erwin. "Time course simulation replicability of SBML-supporting biochemical network simulation tools." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-33.

Abstract:
Background: Modelling and simulation are important tools for understanding biological systems. Numerous modelling and simulation software tools have been developed for integrating knowledge regarding the behaviour of a dynamic biological system described in mathematical form. The Systems Biology Markup Language (SBML) was created as a standard format for exchanging biochemical network models among tools. However, it is not certain yet whether actual usage and exchange of SBML models among the tools of different purpose and interfaces is assessable. Particularly, it is not clear whether dynamic simulations of SBML models using different modelling and simulation packages are replicable.
Results: Time series simulations of published biological models in SBML format are performed using four modelling and simulation tools which support SBML to evaluate whether the tools correctly replicate the simulation results. Some of the tools do not successfully integrate some models. In the time series output of the successful simulations, there are differences between the tools.
Conclusions: Although SBML is widely supported among biochemical modelling and simulation tools, not all simulators can replicate time-course simulations of SBML models exactly. This incapability of replicating simulation results may harm the peer-review process of biological modelling and simulation activities and should be addressed accordingly, for example by specifying in the SBML model the exact algorithm or simulator used for replicating the simulation result.
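The replicability check described in this abstract can be illustrated with a minimal sketch. It assumes each simulator's time-course output has already been exported into a plain dictionary; the function name `compare_time_courses`, the tolerances, and the sample values are hypothetical and are not taken from the thesis.

```python
from math import isclose

def compare_time_courses(run_a, run_b, rel_tol=1e-6, abs_tol=1e-9):
    """List every (time, species, value_a, value_b) where two runs disagree.

    Each run maps a time point to a dict of species name -> concentration.
    Only time points and species present in both runs are compared.
    """
    mismatches = []
    for t in sorted(set(run_a) & set(run_b)):
        for species in sorted(set(run_a[t]) & set(run_b[t])):
            a, b = run_a[t][species], run_b[t][species]
            if not isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append((t, species, a, b))
    return mismatches

# Hypothetical outputs from two simulators for the same model (invented values).
tool_a = {0.0: {"S1": 1.0, "S2": 0.0}, 1.0: {"S1": 0.6065, "S2": 0.3935}}
tool_b = {0.0: {"S1": 1.0, "S2": 0.0}, 1.0: {"S1": 0.6070, "S2": 0.3930}}

for t, species, a, b in compare_time_courses(tool_a, tool_b, rel_tol=1e-4):
    print(f"t={t} {species}: {a} vs {b}")
```

Whether two outputs count as the same depends entirely on the tolerance chosen, so a comparison of this kind should state it explicitly.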
2

Berry, Eric Zachary 1980. "Bioinformatics and database tools for glycans." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/27085.

Abstract:
Thesis (M.Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
Includes bibliographical references (leaves 75-76).
Recent advances in biology have shown that polysaccharides play an active role in modulating cellular activities. Glycosaminoglycans (GAGs) are one such family of polysaccharides; they play a very important role in regulating the functions of numerous important signaling molecules and enzymes in the cell. Developing bioinformatics tools has been integral to advancing genomics and proteomics. While these tools are well developed for storing and processing sequence and structure information for proteins and DNA, they are very poorly developed for polysaccharides. Glycan structures pose special problems because of their tremendous information density per fundamental unit, their often-branched structures, and the complicated nature of their building blocks. GlycoBank, an online database of known GAG structures and functions, has been developed to overcome many of these difficulties by providing a common notation for researchers to describe GAG sequences, a common repository for viewing known structure-function relationships, and the complex tools and searches needed to facilitate their work. This thesis focuses on the development of GlycoBank. In addition, the Consortium for Functional Glycomics, a large NIGMS-funded consortium, maintains a larger database that also aims to store structure-function information for a broader collection of polysaccharides. The ideas and concepts implemented in developing GlycoBank were instrumental in developing databases and bioinformatics tools for the Consortium for Functional Glycomics.
by Eric Zachary Berry.
M.Eng. and S.B.
3

Meng, Da. "Bioinformatics tools for evaluating microbial relationships." Pullman, Wash. : Washington State University, 2009. http://www.dissertations.wsu.edu/Dissertations/Spring2009/d_meng_042209.pdf.

Abstract:
Thesis (Ph.D.)--Washington State University, May 2009. Title from PDF title page (viewed on June 8, 2009). "School of Electrical Engineering and Computer Science." Includes bibliographical references.
4

Martini, Paolo. "Dissecting the transcriptome complexity with bioinformatics tools." Doctoral thesis, Università degli studi di Padova, 2012. http://hdl.handle.net/11577/3422923.

Abstract:
Bioinformatics has gained considerable importance, especially with the advent of genomic approaches. The large amount of data produced by "omics" experiments requires appropriate frameworks to handle, store and mine the information and to derive appropriate working hypotheses. The transcriptome is defined as the full set of RNA molecules produced by a cell, which provides the bridge between the genome and proteins. RNA molecules can be divided into two major classes: protein-coding RNAs, or messenger RNAs (mRNAs), and non-coding RNAs (ncRNAs). While the first class has been the most studied in recent decades, ncRNAs were discovered more recently and have proved important in cell regulatory processes. The most important class of ncRNAs is composed of the microRNAs (miRNAs), which have been linked to several pathologies, including cancer, because of their ability to regulate oncogenes, oncosuppressors and mRNAs involved in the cell cycle. Here, I present work that aims to provide the appropriate structure for the interpretation and storage of transcriptomics data. To this end, I devised a tool to integrate expression levels from microarray experiments with gene annotation data such as genome localization and organization in biological pathways. The tool was devised and tuned using two datasets: the first concerning expression profiles of patients with acute myeloid leukemia (AML), the second regarding muscular dystrophies. The application of this new tool to these datasets was very promising, especially for meta-analysis studies (muscular dystrophies). For this reason I applied the new tool to analyze public and in-house datasets of expression profiles of patients with inflammatory myopathies. This analysis generated the hypothesis that the JAK-STAT and type I interferon signaling pathways are involved in myopathies. The inferred results were validated using qRT-PCR, and the presence of specific proteins produced by the validated mRNAs was tested by ELISA and proteomic analysis. To complete and extend knowledge of muscle physiology, I used the pig as a new model organism to develop a framework for integrating miRNA expression with the regulation of their mRNA targets. It was important to develop the appropriate experimental instruments to perform the expression analyses. I developed two microarray platforms to obtain expression profiles of both miRNA and mRNA purified from the same sample. Then, with the expression data, I computationally analyzed aspects of miRNA biogenesis and performed the data integration, leading to regulatory networks specific to the studied tissues, including skeletal muscle. Our miRNA sequences (mature and hairpin) were compared with public data from RNA-seq experiments, showing an important overlap between our results and the sequences identified by RNA-seq and confirming the soundness of our approach.
With the advent of genomic approaches, bioinformatics has become increasingly important in the study of biology. "Omics" approaches produce an enormous quantity of data that must be stored in suitable structures (databases). Storing the data brings with it the need to allow access to and manipulation of the data in order to carry out the appropriate studies. Appropriate tools are therefore required for inspecting and manipulating the databases in order to formulate hypotheses consistent with the biological problem under study. The transcriptome is defined as the set of RNA molecules produced by a cell, which represent a necessary step in the process leading from the gene to the production of the protein. RNA molecules can be divided into two large groups: coding or messenger RNAs and non-coding RNAs. While the first class has been studied extensively in recent decades, non-coding RNAs were discovered only recently and have been associated with purely regulatory functions. The most important class involved in the regulation of messenger RNAs is that of the microRNAs (miRNAs), which have been studied intensively and linked to the development of diseases such as cancer, since they are involved in the fine regulation of the expression of oncogenes, oncosuppressors and cell-cycle genes. In this thesis I present a series of bioinformatics solutions aimed at providing the appropriate structures for conducting experiments and analyses of transcriptomics data. During my doctorate, I developed a method that integrates gene expression levels obtained from microarray experiments with information on the location of the genes on the chromosomes or their organization in biological processes. This method was developed and refined using two groups of data available in public databases: the first consists of gene expression data obtained from microarray experiments on acute myeloid leukemia; the second concerns gene expression in muscular dystrophies, also derived from microarray data. The results of this new method proved very promising, particularly when applied to meta-analysis, which consists of integrating data from different laboratories. Building on this first result, I also applied the method to the inspection of the dysregulated processes in inflammatory myopathies, combining the data produced in the Functional Genomics laboratory directed by Prof. G. Lanfranchi with those deposited in public databases. The meta-analysis I implemented made it possible to study this series of data by exploiting, for the first time, the localization of the genes and grouping them by function, allowing hypotheses to be generated about the pathological mechanisms. Thanks to this type of analysis I hypothesized the involvement of the JAK/STAT and interferon signaling pathways in inflammatory myopathies. The hypotheses generated by analyzing the data were confirmed by validating the genes involved in these signaling pathways using qRT-PCR. In addition, using proteomics approaches, in collaboration with Prof. C. Gelfi (Università di Milano), and the ELISA technique, the presence of the proteins involved in these signaling pathways was also validated in patients with inflammatory myopathies. In the final part of my doctorate, I worked on completing and extending knowledge of muscle physiology. To do this I turned to the pig, a very important model organism for the study of human diseases and for the production of biological components that can be used to replace degraded ones in humans (aortic valves, for example). Using the pig, I developed a system to integrate miRNA expression with the regulation that miRNAs exert on their target messengers. First, I developed the microarray platforms for analyzing gene expression in 14 pig tissues. In particular, I developed two types of platform to analyze the expression of transcripts and of miRNAs purified from the same sample. With these expression data I carried out analyses to elucidate some aspects of miRNA biogenesis. Finally, the completeness of the data produced allowed me to build regulatory networks specific to each tissue analyzed. To confirm the validity of our approach I analyzed the degree of overlap between the sequences derived from our study and the sequences produced by various RNA-seq experiments. This analysis confirmed the validity of my approach, as it revealed an important overlap between our sequences and those derived from RNA-seq.
5

Strafford, J. "Docking and bioinformatics tools to guide enzyme engineering." Thesis, University College London (University of London), 2012. http://discovery.ucl.ac.uk/1339145/.

Abstract:
The carbon-carbon bond forming ability of transketolase (TK), along with its broad substrate specificity, makes it very attractive as a biocatalyst in industrial organic synthesis. Through the production of saturation mutagenesis libraries focused on individual active site residues, several variants of TK have been discovered with enhanced activities on non-natural substrates. We have used computational and bioinformatics tools to increase our understanding of TK and to guide engineering of the enzyme for further improvements in activity. Computational automated docking is a powerful technique with the potential to identify transient structures along an enzyme reaction pathway that are difficult to obtain by experimental structure determination. We have used the AutoDock algorithm to dock a series of known ketol donor and aldehyde acceptor substrates into the active site of E. coli TK, both in the presence and the absence of reactive intermediates. Comparison of docked conformations with available crystal structure complexes allows us to propose a more complete mechanism at a level of detail not currently possible by experimental structure determination alone. Statistical coupling analysis (SCA) utilises evolutionary sequence data present within multiple sequence alignments to identify energetically coupled networks of residues within protein structures. Using this technique we have identified several coupled networks within the TK enzyme, which we have targeted for mutagenesis in multiple-mutant variant libraries. Screening of these libraries for increased activity on the non-natural substrate propionaldehyde (PA) has identified combinations of mutations that act synergistically on enzyme activity. Notably, a double variant has been discovered with a 20-fold improvement in kcat relative to wild type on the PA reaction; this is higher than any other TK variant discovered to date.
6

Petri, Eric D. C. "Bioinformatics Tools for Finding the Vocabularies of Genomes." Ohio University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1213730223.

7

Mahram, Atabak. "FPGA acceleration of sequence analysis tools in bioinformatics." Thesis, Boston University, 2013. https://hdl.handle.net/2144/11126.

Abstract:
Thesis (Ph.D.)--Boston University
With advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high-impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching, and so its acceleration is of fundamental importance. It is a complex, highly optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on an FPGA. Using our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved a 12x speedup over a highly optimized multithreaded reference code. Multiple Sequence Alignment (MSA), the extension of pairwise sequence alignment to multiple sequences, is critical to solving many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of 80x to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately, many software heuristics do not fall into this category, and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models, we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study introduces the pros and cons of these acceleration models for biosequence analysis tools.
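The idea of shrinking the database before running the full alignment code can be sketched in software. The block below is a simplified k-mer seed filter written for illustration only; it is not the FPGA design from the thesis, and the function names, the seed length `k`, the `min_shared` threshold, and the toy sequences are all assumptions.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def prefilter(query, database, k=4, min_shared=2):
    """Keep only database entries sharing at least min_shared k-mers with the query.

    The surviving entries would then be passed to the full (slower) aligner.
    """
    query_seeds = kmers(query, k)
    return {
        name: seq
        for name, seq in database.items()
        if len(query_seeds & kmers(seq, k)) >= min_shared
    }

# Toy database with invented sequences.
database = {
    "seq1": "ACGTACGTGACC",
    "seq2": "TTTTTTTTTTTT",
    "seq3": "GGACGTACGAAT",
}
candidates = prefilter("ACGTACGA", database)
print(sorted(candidates))  # prints ['seq1', 'seq3']
```

A stricter threshold removes more of the database but risks discarding true matches, which is the accuracy/performance tradeoff the abstract refers to.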
8

Stenberg, Johan. "Software Tools for Design of Reagents for Multiplex Genetic Analyses." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6832.

9

Parida, Mrutyunjaya. "Exploring and analyzing omics using bioinformatics tools and techniques." Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6244.

Abstract:
During the Human Genome Project the first hundred billion bases were sequenced in four years; however, the second hundred billion bases were sequenced in four months (NHGRI, 2013). As efforts were made to improve every aspect of sequencing in this project, cost became inversely proportional to the speed (NHGRI, 2013). The Human Genome Project ended in April 2003, but research into faster and cheaper ways to sequence DNA remains active to date (NHGRI, 2013). On the one hand, these advancements have allowed the convenient and unbiased generation and interrogation of a variety of omics datasets; on the other hand, they have substantially contributed towards the ever-increasing size of biological data. Therefore, informatics techniques are indispensable tools in biology and medicine due to their ability to efficiently store and probe large datasets. Bioinformatics is a specialized domain under informatics that focuses on biological data storage, organization and analysis (NHGRI, 2013). Here, I have applied informatics approaches such as database design and web development to biological datasets to create a novel web-based resource that allows users to conveniently explore the comprehensive transcriptome of a common aquatic tunicate, Oikopleura dioica (O. dioica), and access its associated annotations across key developmental time points. This unique resource will substantially contribute towards studies on the development, evolution and genetics of chordates using O. dioica as a model. Mendelian or single-gene disorders such as cystic fibrosis, sickle-cell anemia, Huntington’s disease, and Rett’s syndrome run across generations in families (Chial, 2008). Allelic variations associated with Mendelian disorders primarily reside in the protein-coding regions of the genome, collectively called the exome (Stenson et al., 2009). Therefore, sequencing the exome rather than the whole genome is an efficient and practical approach to discovering etiologic variants in our genome (Bamshad et al., 2011). Renal agenesis (RA) is a severe form of congenital anomalies of the kidney and urinary tract (CAKUT) in which children are born with one kidney (unilateral renal agenesis) or no kidneys (bilateral renal agenesis) (Brophy et al., 2017; Yalavarthy & Parikh, 2003). In this study, we applied exome sequencing to selected human patients in an RA pedigree that followed a Mendelian mode of disease transmission. Exome sequencing and molecular techniques combined with my bioinformatics analysis have led to the discovery of a novel RA gene called GREB1L (Brophy et al., 2017). In this study, we have successfully demonstrated the validity of exome sequencing and bioinformatics techniques for narrowing down disease-associated mutations in the human genome. Additionally, the results from this study have substantially contributed towards understanding the molecular basis of CAKUT. Discovery of novel etiologic variants will enhance our understanding of human diseases and development. The high-throughput sequencing technique called RNA-Seq has revolutionized the field of transcriptome analysis (Z. Wang, Gerstein, & Snyder, 2009). Concisely, a library of cDNA is prepared from an RNA sample using an enzyme called reverse transcriptase (Nottingham et al., 2016). Next, the cDNA is fragmented, sequenced using a sequencing platform of choice and mapped to a reference genome or an assembled transcriptome, or assembled de novo to generate a transcriptome (Grabherr et al., 2011; Nottingham et al., 2016). Mapping allows detection of high-resolution transcript boundaries, quantification of transcript expression and identification of novel transcripts in the genome. We have applied RNA-Seq to analyze the gene expression patterns in the water flea, otherwise known as D. pulex, to work out the genetic details underlying heavy metal induced stress (unpublished) and predator induced phenotypic plasticity (PIPP) (Rozenberg et al., 2015), independently. My bioinformatics analysis of the RNA-Seq data has facilitated the discovery of key biological processes participating in metal induced stress response and predator induced defense mechanisms in D. pulex. These studies are valuable additions to the fields of ecotoxicogenomics and phenotypic plasticity and have aided us in gaining mechanistic insight into the impact of toxicant and predator exposure on D. pulex at a biomolecular level.
10

Malatras, Apostolos. "Bioinformatics tools for the systems biology of dysferlin deficiency." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066627/document.

Abstract:
The aim of my project is to create and apply tools for analyzing the systems biology of muscle using different omics data. The project focuses in particular on dysferlinopathy, caused by the deficiency of a protein called dysferlin, which is expressed mainly in skeletal and cardiac muscle. Loss of dysferlin due to an (autosomal recessive) mutation of the DYSF gene leads to a progressive muscular dystrophy (LGMD2B, MM, DMAT). We have already developed bioinformatics tools that can be used for the functional analysis of omics data relating to dysferlinopathy. These include the test known as "gene set enrichment analysis", which compares omics profiles of interest with previously published muscle omics data, and the analysis of networks linking the different proteins and transcripts to one another. To this end, we analyzed hundreds of published omics datasets from public archives. The software tools we developed are CellWhere and MyoMiner. CellWhere is an easy-to-use tool that displays both protein-protein interactions and the subcellular localization of proteins on an interactive graph. MyoMiner is a database specialized in muscle tissue and cells that provides co-expression analysis in both healthy and pathological tissues. These tools will be used in the analysis and interpretation of transcriptomic data for dysferlinopathies as well as other neuromuscular pathologies.
The aim of this project was to build and apply tools for the analysis of muscle omics data, with a focus on dysferlin deficiency. This protein is expressed mainly in skeletal and cardiac muscles, and its loss due to mutation (autosomal-recessive) of the DYSF gene results in a progressive muscular dystrophy (Limb Girdle Muscular Dystrophy type 2B (LGMD2B), Miyoshi myopathy and distal myopathy with tibialis anterior onset (DMAT)). We have developed various tools and pipelines that can be applied towards a bioinformatics functional analysis of omics data in muscular dystrophies and neuromuscular disorders. These include tests for enrichment of gene sets derived from previously published muscle microarray data and networking analysis of functional associations between altered transcripts/proteins. To accomplish this, we analyzed hundreds of published omics datasets from public repositories. The tools we developed are called CellWhere and MyoMiner. CellWhere is a user-friendly tool that combines protein-protein interactions and protein subcellular localizations on an interactive graphical display (https://cellwhere-myo.rhcloud.com). MyoMiner is a muscle cell- and tissue-specific database that provides co-expression analyses in both normal and pathological tissues. Many gene co-expression databases already exist and are used broadly by researchers, but MyoMiner is the first muscle-specific tool of its kind (https://myominer-myo.rhcloud.com). These tools will be used in the analysis and interpretation of transcriptomics data from dysferlinopathic muscle and other neuromuscular conditions and will be important for understanding the molecular mechanisms underlying these pathologies.
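To make the notion of testing a gene set for enrichment concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test. This is a common, simple form of enrichment test chosen for illustration; the abstract does not say which statistic the thesis tools use, and the function name and the numbers below are hypothetical.

```python
from math import comb

def enrichment_p_value(population, set_size, sample_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population  -- total number of genes measured
    set_size    -- genes in the gene set of interest
    sample_size -- genes in the list being tested (e.g. altered transcripts)
    overlap     -- genes present in both the list and the gene set
    """
    upper = min(set_size, sample_size)
    total = comb(population, sample_size)
    hits = sum(
        comb(set_size, k) * comb(population - set_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total

# Invented example: 20 genes measured, a 5-gene set, 5 altered genes, all 5 in the set.
p = enrichment_p_value(population=20, set_size=5, sample_size=5, overlap=5)
print(f"p = {p:.2e}")
```

A small p-value suggests the overlap is larger than expected by chance; when many gene sets are tested, the p-values would also need a multiple-testing correction.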
11

Malatras, Apostolos. "Bioinformatics tools for the systems biology of dysferlin deficiency." Electronic Thesis or Diss., Paris 6, 2017. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2017PA066627.pdf.

12

Chiara, M. "BIOINFORMATIC TOOLS FOR NEXT GENERATION GENOMICS." Doctoral thesis, Università degli Studi di Milano, 2012. http://hdl.handle.net/2434/173424.

Abstract:
New sequencing strategies have redefined the concept of “high-throughput sequencing”, and many companies, researchers, and recent reviews use the term “Next-Generation Sequencing” (NGS) instead of high-throughput sequencing. These advances have introduced a new era in genomics and bioinformatics. During my years as a PhD student I developed various software tools, algorithms and procedures for the analysis of Next-Generation Sequencing data required for distinct biological research projects and collaborations in which our research group was involved. The tools and algorithms are thus presented in their appropriate biological contexts. Initially I dedicated myself to the development of scripts and pipelines that were used to assemble and annotate the mitochondrial genome of the model plant Vitis vinifera. The sequence was subsequently used as a reference to study the RNA editing of mitochondrial transcripts, using data produced by the Illumina and SOLiD platforms. I subsequently developed a new approach and a new software package for the detection of relatively small indels between a donor and a reference genome, using NGS paired-end (PE) data and machine learning algorithms. I was able to show that suitable paired-end data, contrary to previous assertions, can be used to detect very small indels with high confidence in low-complexity genomic contexts. Finally, I participated in a project aimed at reconstructing the genomic sequences of two distinct strains of the biotechnologically relevant fungus Fusarium. In this context I performed the sequence assembly to obtain the initial contigs and devised and implemented a new scaffolding algorithm, which has proved to be particularly efficient.
13

Lopes, Pinto Fernando. "Development of Molecular Biology and Bioinformatics Tools : From Hydrogen Evolution to Cell Division in Cyanobacteria." Doctoral thesis, Uppsala universitet, Institutionen för fotokemi och molekylärvetenskap, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-110842.

Abstract:
The use of fossil fuels presents a particularly interesting challenge: our society strongly depends on coal and oil, but we are aware that their use is damaging the environment. Currently, this awareness is gaining momentum, and pressure to evolve towards an energetically cleaner planet is very strong. Molecular hydrogen (H2) is an environmentally suitable energy carrier that could initially supplement or even substitute fossil fuels. Ideally, the primary energy source used to produce hydrogen gas should be renewable, and the conversion back to energy should produce no polluting emissions, making the cycle environmentally clean. Photoconversion of water to hydrogen can be achieved using the following strategies: 1) using photochemical fuel cells, 2) applying photovoltaics, or 3) promoting production of hydrogen by photosynthetic microorganisms, either phototrophic anoxygenic bacteria and cyanobacteria or eukaryotic green algae. For photobiological H2 production, cyanobacteria are among the ideal candidates since they: a) are capable of H2 evolution, and b) have simple nutritional requirements: they can grow in air (N2 and CO2), water and mineral salts, with light as the only energy source. As this project started, a vision and a set of overall goals were established. These postulated that improved H2 production over a long period demanded: 1) selection of strains taking into consideration their specific hydrogen metabolism, 2) genetic modification in order to improve H2 evolution, and 3) examination and improvement of cultivation conditions in bioreactors. Within these goals, three main research objectives were set: 1) update and document the use of cyanobacteria for hydrogen production, 2) create tools to improve molecular biology work at the transcription analysis level, and 3) study cell division in cyanobacteria.
This work resulted in: 1) the publication of a review on hydrogen evolution by cyanobacteria, 2) the development of tools to assist understanding of transcription, and 3) the start of a new fundamental research approach to ultimately improve the yield of H2 evolution by cyanobacteria.
14

DePasquale, Erica. "Development of Computational Tools for Single-Cell Discovery." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1614021318421845.

15

Furió, Tarí Pedro. "Development of bioinformatic tools for massive sequencing analysis." Doctoral thesis, Universitat Politècnica de València, 2020. http://hdl.handle.net/10251/152485.

Abstract:
Transcriptomics is one of the most important and relevant areas of bioinformatics. It allows detecting the genes that are expressed at a particular moment in time in order to explore the relation between genotype and phenotype. Transcriptomic analysis was historically performed using microarrays until 2008, when high-throughput RNA sequencing (RNA-Seq) was launched on the market and gradually began to displace the older technique. However, despite the clear advantages over microarrays, it was necessary to understand factors such as the quality of the data, the reproducibility and replicability of the analyses, and potential biases. The first section of the thesis covers these studies. First, an R package called NOISeq was developed and published in the public repository "Bioconductor". It includes a set of tools to better understand the quality of RNA-Seq data, processing tools to minimise the impact of noise on any subsequent analyses, and two new methodologies (NOISeq and NOISeqBio) to overcome the difficulties of comparing two different groups of samples (differential expression). Second, I present our contribution to the Sequencing Quality Control (SEQC) project, a continuation of the Microarray Quality Control (MAQC) project led by the US Food and Drug Administration (FDA), which aims to assess the reproducibility and replicability of RNA-Seq analyses. One of the most effective approaches to understanding the different factors that influence the regulation of gene expression, such as the synergistic effect of transcription factors, methylation events and chromatin accessibility, is the integration of transcriptomic data with other omics data. To this end, a file containing the chromosomal positions where these events take place is required. For this reason, in the second chapter, we present a new and easily customisable tool (RGmatch) that associates these chromosomal positions with the exons, transcripts or genes that each event could be regulating.
Another aspect of great interest is the study of non-coding genes, especially long non-coding RNAs (lncRNAs). Not long ago, these regions were thought not to play a relevant role and were considered mere transcriptional noise. However, they represent a high percentage of human genes, and it was recently shown that they actually play an important role in gene regulation. For these reasons, in the last chapter we first seek a methodology to infer the general functions of each lncRNA using publicly available data and, second, develop a new tool (spongeScan) to predict the lncRNAs that could be involved in the sequestration of microRNAs (miRNAs), thereby altering their regulatory role.
Furió Tarí, P. (2020). Development of bioinformatic tools for massive sequencing analysis [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/152485
Thesis
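The association of chromosomal positions with genes described in this abstract can be illustrated with a minimal sketch. It is not RGmatch's actual algorithm, which the abstract does not specify: the nearest-transcription-start-site rule, the `Gene` record, the `nearest_gene` function, the 10 kb window and the coordinates are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene annotation record (not RGmatch's data model)."""
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # Transcription start site: the 5' end on the gene's own strand.
        return self.start if self.strand == "+" else self.end


def nearest_gene(chrom, start, end, genes, max_distance=10_000):
    """Return (gene, distance) for the gene whose TSS is closest to the
    midpoint of the region, or None if no TSS lies within max_distance."""
    midpoint = (start + end) // 2
    best = None
    for gene in genes:
        if gene.chrom != chrom:
            continue
        distance = abs(midpoint - gene.tss)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (gene, distance)
    return best


# Made-up annotation and a made-up region (e.g. a methylation event).
genes = [
    Gene("geneA", "chr1", 1_000, 5_000, "+"),
    Gene("geneB", "chr1", 20_000, 26_000, "-"),
]
hit = nearest_gene("chr1", 24_500, 25_500, genes)
print(hit[0].name, hit[1])  # geneB 1000
```

A real tool would typically also distinguish promoter, exon and intron overlaps and report every candidate feature rather than a single nearest gene.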
16

Garma, L. D. (Leonardo D.). "Structural bioinformatics tools for the comparison and classification of protein interactions." Doctoral thesis, Oulun yliopisto, 2017. http://urn.fi/urn:isbn:9789526216065.

Abstract:
Most proteins carry out their functions through interactions with other molecules. Thus, proteins taking part in similar interactions are likely to carry out related functions. One way to determine whether two proteins take part in similar interactions is to quantify the likeness of their structures. This work focuses on the development of methods for the comparison of protein-protein and protein-ligand interactions, as well as their application to structure-based classification schemes. A method based on the MultiMer-align (or MM-align) program was developed and used to compare all known dimeric protein complexes. The results of the comparison demonstrate that the method improves over MM-align in a significant number of cases. The data were employed to classify the complexes, resulting in 1,761 different protein-protein interaction types. Through a statistical model, the number of existing protein-protein interaction types in nature was estimated at around 4,000. The model allowed the establishment of a relationship between the number of quaternary families (sequence-based groups of protein-protein complexes) and quaternary folds (structure-based groups). The interactions between proteins and small organic ligands were studied using sequence-independent methodologies. A new method was introduced to test three similarity metrics. The best of these metrics was subsequently employed, together with five other existing methodologies, to conduct an all-to-all comparison of all the known protein-FAD (Flavin-Adenine Dinucleotide) complexes. The results demonstrate that the new methodology best captures the similarities between complexes in terms of protein-ligand contacts. Based on the all-to-all comparison, the protein-FAD complexes were subsequently separated into 237 groups. In the majority of cases, the classification divided the complexes according to their annotated function.
Using a graph-based description of the FAD-binding sites, each group could be further characterized and uniquely described. The study demonstrates that the newly developed methods are superior to the existing ones. The results indicate that both the known protein-protein and the protein-FAD interactions can be classified into a reduced number of types and that, in general terms, these classifications are consistent with the proteins' functions.
17

Malatras, Apostolos. "Bioinformatics tools for the systems biology of dysferlin deficiency." Berlin: Freie Universität Berlin, 2018. http://d-nb.info/1171431333/34.

18

Cabrera, Cárdenas Claudia Paola. "Bioinformatics tools for the genetic dissection of complex traits in chickens." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3864.

Abstract:
This thesis explores the genetic characterization of the mechanisms underlying complex traits in chickens through the use and development of bioinformatics tools. The characterization of quantitative trait loci (QTLs) controlling complex traits has proven to be very challenging. This thesis comprises the study of experimental designs, annotation procedures and functional analyses. These represent some of the main ‘bottlenecks’ involved in the integration of QTLs with the biological interpretation of high-throughput technologies. The thesis begins with an investigation of the bioinformatics tools and procedures available for genome research, briefly reviewing microarray technology and commonly applied experimental designs. A targeted experimental design based on the concept of genetical genomics is then presented and applied in order to study a known functional QTL responsible for chicken body weight. This approach contrasts the gene expression levels of two alternative QTL genotypes, hence narrowing the QTL-phenotype gap and giving a direct quantification of the link between the genotypes and the genetic responses. Potential candidate genes responsible for the chicken body weight QTL are identified using the location of the genes, their expression and their biological significance. In order to deal with the multiple sources of information and exploit the data effectively, a systematic approach and a relational database were developed to improve the annotation of the probes of the ARK-Genomics G. gallus 13K v4.0 cDNA array used in the experiment. To follow up the targeted genetical genomics study, a detailed functional analysis is performed on the dataset. The aim is, first, to identify the downstream effects through the identification of functional variation found in pathways and, second, to achieve a further characterization of potential candidate genes by using comparative genomics and sequence analyses.
Finally, an investigation of the regions syntenic to the body weight QTL and their reported QTLs is presented.
19

García, Recio Adrián. "Bioinformatics tools for membrane proteins: from sequences to structure and function." Doctoral thesis, Universitat de Vic - Universitat Central de Catalunya, 2022. http://hdl.handle.net/10803/673699.

Abstract:
Membrane proteins are a large group of proteins that play an essential role in the cell. This group includes receptors, ion channels, transporters, and enzymes, which account for 25% of the proteins in the human genome. About 50% of membrane proteins are pharmacological targets for various diseases. In addition, pathogenic mutations affecting folding, stability, and function exist for 90% of membrane proteins. Research on membrane proteins has grown in recent years and, along with it, computational tools for processing sequence, structural, and functional data have become essential to understanding all the available information about these proteins. The main goal of this thesis is to develop bioinformatics tools to model and analyze membrane proteins.
Bioinformatics
20

Pierleoni, Andrea <1979>. "Design and implementation of bioinformatics tools for large scale genome annotation." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2008. http://amsdottorato.unibo.it/695/1/Tesi_Pierleoni_Andrea.pdf.

Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome determines only raw nucleotide sequences. This is only the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is done at each level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time-consuming when applied at this scale. Thus, in silico methods are needed. The aim of this work was the implementation of predictive computational methods to allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine-learning-based method for predicting the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo and was developed in 2006. Its main peculiarity is that it is independent of biases present in the training dataset, which cause over-prediction of the most represented classes in all other available predictors developed so far. This important result was achieved by a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo predicts the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the currently available state-of-the-art methods for this prediction task.
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization merged with experimental and similarity-based annotations. In the second part of the work, a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. The method efficiently predicts from the raw amino acid sequence both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while maintaining a false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15,000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
21

Pierleoni, Andrea <1979>. "Design and implementation of bioinformatics tools for large scale genome annotation." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2008. http://amsdottorato.unibo.it/695/.

22

Liu, Youfang. "Analytical tools for population-based association studies." NCSU, 2008. http://www.lib.ncsu.edu/theses/available/etd-08182008-161113/.

Abstract:
Disease gene fine mapping is an important task in human genetic research. Association analysis is becoming a primary approach for localizing disease loci, especially now that abundant SNPs are available thanks to the improvement of genotyping technology over the last decades. Despite the rapid improvement in detection ability, the association strategy still has many limitations. In this dissertation, we focused on three different topics: a haplotype-similarity-based test, an association test incorporating genotyping error, and a simulation tool for large data sets. 1) Previous haplotype-similarity-based tests cannot incorporate covariates. In chapter 2, we proposed a new association method based on haplotype similarity that incorporates covariates and uses the maximum amount of information in the data. We found that our method improves power when neither LD nor allele frequency is too low and is comparable under other scenarios. 2) In chapter 3, we proposed a new strategy that incorporates genotyping uncertainty to assess the association between traits and SNPs. Extensive simulation studies for case-control designs demonstrated that an association test based on intensity information can reduce the impact of genotyping error. 3) In chapter 4, we described simulation software, SimuGeno, which is used to simulate large-scale genomic data for case-control association studies.
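For readers unfamiliar with case-control association testing, the basic idea can be sketched with a standard single-SNP allelic chi-square test. This is a textbook baseline, not the haplotype-similarity or intensity-based methods proposed in the dissertation; the function names and the allele counts are made up for illustration.

```python
from math import erfc, sqrt


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table
    of allele counts. Each argument is (count of allele A, count of allele a)."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi_square_p_value_1df(statistic):
    """Upper-tail p-value of a chi-square distribution with 1 d.f."""
    return erfc(sqrt(statistic / 2.0))


# Made-up allele counts: 100 case chromosomes and 100 control chromosomes.
chi2 = allelic_chi_square((60, 40), (40, 60))
p_value = chi_square_p_value_1df(chi2)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
```

With one test per SNP, a genome-wide scan would additionally need a correction for multiple testing.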
23

Marani, Paola <1970>. "From "wet biology" to statistical analysis of structural features with bioinformatics tools." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2008. http://amsdottorato.unibo.it/689/1/Tesi_Marani_Paola.pdf.

Abstract:
Many new Escherichia coli outer membrane proteins have recently been identified by proteomics techniques. However, poorly expressed proteins and proteins expressed only under certain conditions may escape detection when wild-type cells are grown under standard conditions. Here, we have taken a complementary approach where candidate outer membrane proteins have been identified by bioinformatics prediction, cloned and overexpressed, and finally localized by cell fractionation experiments. Out of eight predicted outer membrane proteins, we have confirmed the outer membrane localization for five—YftM, YaiO, YfaZ, CsgF, and YliI—and also provide preliminary data indicating that a sixth—YfaL—may be an outer membrane autotransporter.
24

Marani, Paola <1970>. "From "wet biology" to statistical analysis of structural features with bioinformatics tools." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2008. http://amsdottorato.unibo.it/689/.

Abstract:
Many new Escherichia coli outer membrane proteins have recently been identified by proteomics techniques. However, poorly expressed proteins and proteins expressed only under certain conditions may escape detection when wild-type cells are grown under standard conditions. Here, we have taken a complementary approach where candidate outer membrane proteins have been identified by bioinformatics prediction, cloned and overexpressed, and finally localized by cell fractionation experiments. Out of eight predicted outer membrane proteins, we have confirmed the outer membrane localization for five—YftM, YaiO, YfaZ, CsgF, and YliI—and also provide preliminary data indicating that a sixth—YfaL—may be an outer membrane autotransporter.
25

Meraba, Rebone Leboreng. "Evaluating the predictive performance of cytotoxic T lymphocyte epitope prediction tools using Elispot assay data." Master's thesis, University of Cape Town, 2018. http://hdl.handle.net/11427/27972.

Abstract:
Computational T-cell epitope prediction tools have been previously devised to predict potential human leukocyte antigen (HLA) binding peptides from protein sequences. These tools are complements of enzyme-linked immunosorbent spot (ELISpot) assays, a very commonly applied immunological technique that is used both to identify regions of pathogen genomes that trigger an immune response and to characterize the relationships between an individual's complement of HLA alleles and the degree of immunity that they display. If computational tools could accurately predict HLA-peptide binding, then these tools might be usable as a cheap and reliable alternative to ELISpot assays. A web-based IFN γ ELISpot assay dataset sharing resource, called IMMUNO-SHARE, was developed to enable the simple and straightforward storage and dissemination amongst researchers of large volumes of IFN γ ELISpot assay data. Such experimental data were next used to make HLA-peptide binding predictions with four frequently used T-cell epitope prediction tools: netMHC 3.2, IEDB_ANN, IEDB_ARB Matrix and IEDB_SMM. The predictive performances of all four tools, individually and collectively, were statistically assessed using non-parametric Spearman rank-order correlation tests. It was found that none of the four tested tools yielded binding affinity predictions that were detectably correlated with the observed ELISpot data. High false positive rates, where high predicted binding affinities between peptides and patient HLAs corresponded in these patients with no appreciable immune responses, were apparent for all four of the tested methods. The low degree of correlation between ELISpot data and HLA-peptide binding predictions, and in particular the high false positive rates and relatively low true positive and true negative rates, indicate that the four tested tools would require substantial improvement before they could be seen as a viable alternative to ELISpot assays.
Given that the accuracy of predictions of each of the four methods tested is largely dependent on both the quantity and quality of known true binder and true non-binder datasets that were used to train the HLA-peptide binding prediction methods implemented by the tools, it is plausible that the accuracy of these tools could be increased with larger training datasets. Retraining either the current methods or the next generation of prediction tools would therefore be greatly facilitated by the availability of large quantities of publicly available HLA-peptide binding interaction information. It is hoped that IMMUNO-SHARE or some other ELISpot data sharing resource could eventually meet this need.
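The Spearman rank-order correlation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the function names and the sample values are hypothetical, and the calculation is simply a Pearson correlation applied to average ranks.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values, not data from the thesis:
# predicted binding affinities and ELISpot spot counts for five peptides.
predicted_affinity = [12.0, 50.0, 480.0, 35.0, 900.0]
spot_counts = [40, 310, 25, 0, 120]
print(spearman_rho(predicted_affinity, spot_counts))
```

A rho near zero, as the abstract reports for all four tools, would mean that the ranking of peptides by predicted affinity says little about their ranking by observed response.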
26

Shi, Jieming. "Novel bioinformatics tools for miRNA-Seq analysis, RNA structure visualization, and genome-wide repeat detection." Miami University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=miami15003113547315.

27

Murat, Katarzyna. "Bioinformatics analysis of epigenetic variants associated with melanoma." Thesis, University of Bradford, 2018. http://hdl.handle.net/10454/17220.

Abstract:
The field of cancer genomics is currently being enhanced by the power of epigenome-wide association studies (EWAS). Over the last couple of years comprehensive sequence data sets have been generated, making analysis of genome-wide activity in cohorts of different individuals increasingly available. Finding associations between epigenetic variation and phenotype is one of the biggest challenges in biomedical research. Laboratories lacking dedicated resources and programming experience require bioinformatics expertise, which can be prohibitively costly and time-consuming. To address this, we have developed a collection of freely available Galaxy tools (Poterlowicz, 2018a), combining analytical methods into a range of convenient analysis pipelines with a user-friendly graphical interface. The tool suite includes methods for data preprocessing, quality assessment, and discovery of differentially methylated regions and positions. The aim of this project was to make EWAS analysis flexible and accessible to everyone and compatible with routine clinical and biological use. This is exemplified by my work integrating DNA methylation profiles of melanoma patients (at baseline and under mitogen-activated protein kinase inhibitor (MAPKi) treatment) to identify novel epigenetic switches responsible for tumour resistance to therapy (Hugo et al., 2015). Configuration files are published on our GitHub repository (Poterlowicz, 2018b), with scripts and dependency settings also available to download and install via the Galaxy test toolshed (Poterlowicz, 2018a). Results and experiences using this framework demonstrate the potential for Galaxy to be a bioinformatics solution for multi-omics cancer biomarker discovery.
28

Wang, Kai. "DEVELOPMENT OF MACHINE LEARNING BASED BIOINFORMATICS TOOLS FOR CRISPR DETECTION, PIRNA IDENTIFICATION, AND WHOLE-GENOME BISULFITE SEQUENCING DATA ANALYSIS." Miami University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=miami1546437447863901.

29

Roche, Daniel Barry. "The development of bioinformatics tools for the rapid identification of novel cellulase sequences." Thesis, University of Reading, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.558725.

Abstract:
The main aim of this project was to develop bioinformatics tools to rapidly identify novel cellulase sequences for use in next generation biofuels production. Firstly, a detailed analysis of the sequences and folds of structurally elucidated cellulases was undertaken. From this analysis it was discovered that cellulases are structurally diverse and are classified into 19 different CATH superfamilies. The study of cellulase fold space was subsequently utilized for the development of a cellulase specific fold recognition tool, CellulaseFOLD. CellulaseFOLD was found to be over 30% faster than the fastest leading fold recognition tool (HHsearch) for the detection of cellulases. In addition, from the evaluation of 3 cellulase containing proteomes, the CellulaseFOLD method achieved a higher percentage coverage of cellulase sequences when compared to HHsearch. Secondly, FunFOLD, a ligand binding site residue prediction tool, and a novel metric for its evaluation (the Binding-site Distance Test, or BDT score) were developed. The FunFOLD method showed a significant improvement over the best available servers and was shown to be competitive with the top methods. In addition, the BDT score was determined to be a more robust score than the previous metric for ligand binding site evaluation (the MCC score) and was subsequently adopted by official assessors at CASP9. Thirdly, a comprehensive analysis of binding site residues for all structurally elucidated cellulases was undertaken. From this study it was concluded that aromatic residues such as tryptophan were important in saccharide binding. Furthermore, cellulase binding sites contained a higher percentage of charged residues when compared to the entire cellulase structure. Fourthly, a ligand binding site quality assessment tool, FunFOLDQA, was developed, which assesses predictions prior to the availability of experimental data. The FunFOLDQA score was shown to be highly correlated to both the MCC and BDT metrics.
Thus, FunFOLDQA can be utilized to assess binding site prediction quality in the absence of experimental data. Finally, both the general (FunFOLD and FunFOLDQA) and cellulase specific algorithms (CellulaseFOLD and the cellulase binding site data) were utilized to assess case study sequences, identified as potential cellulases, from 3 proteomes under intense study in the biofuels industry.
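The MCC score named in this abstract as the earlier metric for binding-site evaluation is the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python sketch below applies it to residue-level predictions. The function name, the residue numbers, and the convention of returning 0 when the denominator vanishes are assumptions for illustration, not details from the thesis.

```python
from math import sqrt


def binding_site_mcc(predicted, observed, sequence_length):
    """Matthews correlation coefficient over residue-level binding labels.

    predicted, observed: iterables of residue indices labelled as binding.
    sequence_length: total number of residues in the protein; every index
    is assumed to refer to a residue of that protein.
    """
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)      # predicted binding, truly binding
    fp = len(predicted - observed)      # predicted binding, not binding
    fn = len(observed - predicted)      # missed binding residues
    tn = sequence_length - tp - fp - fn  # correctly left unlabelled
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0 when the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical residue numbers for a 120-residue protein.
print(binding_site_mcc({14, 15, 47, 88}, {14, 15, 47, 90}, 120))
```

MCC counts a prediction that lands one residue away from the true site as a plain miss, which is one reason a distance-aware score such as BDT can behave differently.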
30

Binatti, Andrea. "The genomic landscape of solid and hematologic malignancies characterized by new bioinformatics tools." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3424919.

Abstract:
Whole Exome Sequencing (WES) has high power to discover variants in cancer cells, allowing the identification of molecular features underlying disease development and progression, with important outcomes for cancer diagnosis and prognostication as well as for the development and selection of molecularly targeted therapies in personalized medicine. WES projects also pose different challenges due to biological factors, such as tumor heterogeneity, altered ploidy, and low tumor purity, and to technical artifacts, which make the identification of relevant variants far from obvious. iWhale, an easy-to-use and customizable pipeline based on Docker and SCons, was developed to analyze cancer WES data and to detect and annotate somatic mutations by combining four different callers and integrating information from different databases. Moreover, a systems genetics approach and custom data structures were built to construct pathway-derived meta-networks of mutated genes depicting their direct interactions and functional relations, to ultimately identify key functions and pathways recurrently hit in cancer cells. In collaboration with different groups, increasingly refined and customized versions of the pipeline were applied in three WES studies regarding large granular lymphocyte leukemia (LGL-L), pediatric follicular lymphomas (PTNFL and PFLT), and high-risk neuroblastoma (HR-NB). LGL-L is a rare chronic leukemia with persistent clonal increase of cytotoxic T cells or natural killer (NK) cells, often associated with JAK/STAT pathway activation. By analysis of WES data in 19 patients, including cases without STAT mutations (STAT- patients), novel somatic mutations in recurrently mutated genes were identified. Sixteen selected variants, including those in the tumor suppressor gene FAT4 and in the epigenetic regulator KMT2D, were validated. The new Q706L and S715F STAT5B variants have also been functionally characterized.
With pathway-derived network analysis, functional modules composed of several STAT-interacting or functionally STAT-connected genes mutated in STAT-negative patients were discovered. Additional modules with putative pathogenic relevance in LGL-L and mutated in the absence of STAT mutations were identified. In PTNFL, recently recognized as a defined clinicopathological entity, WES analysis of the largest cohort collected so far uncovered mutations in the few genes (TNFRSF14, IRF8 and MAP2K1) previously associated with PTNFL, and identified novel mutations and genes as well. Eleven validated variants prioritized as possible drivers hit the recurrently mutated ARHGEF1, MAP2K1 and TNFRSF14 genes, as well as ATG7, GNA13, RSF1, UBAP2, and ZNF608. According to these findings, alterations of G-protein coupled receptor signaling and of chromatin modifying enzymes were linked for the first time to PTNFL and PFLT. NB, a solid cancer arising from primitive neural crest cells and accounting for 9% of pediatric tumors, is characterized by high clinical heterogeneity and low mutation recurrence even in known drivers (MYCN, ALK, ATRX). To clarify the biological basis of disease aggressiveness, WES was used to examine the genomic landscape of HR-NB patients at metastatic stage with short survival (SS) and long survival (LS). A few genes, including SMARCA4, SMO, ZNF44 and CHD2, were recurrently mutated only in the SS group, and HotNet2 analysis revealed that mutations occurred in different pathways in the two patient groups. Notably, mutations of SS patients clustered into six significantly mutated subnetworks, involved in the MAPK pathway associated with the organization of the extracellular matrix, in cell motility through PTK2 signaling, in matrix metalloproteinase activity, in centrosome maturation and chromosome remodeling, in metabolism of nucleotides and lipoproteins, and in transport of small molecules.
Since FDA-approved compounds targeting the deregulated pathways are available, these findings may help to improve the treatment of HR-NB patients with the most aggressive disease.
Whole exome sequencing (WES) effectively detects variants in tumor cells, identifying the molecular features involved in the pathogenesis and progression of the disease, with important implications for diagnosis and for the development and selection of personalized therapies. The analysis of tumor WES data is nevertheless complicated by tumor heterogeneity, ploidy alterations, sample contamination, and technical artifacts. The iWhale pipeline, based on Docker and SCons, was developed to analyze tumor WES data, with the aim of detecting and annotating somatic mutations using four different tools (MuTect, MuTect2, Strelka2 and VarScan2) and integrating information from several databases. In addition, I contributed to the development of a method for building meta-networks of mutated genes annotated in pathway databases, and I built a custom data structure to detect, statistically, pathways that are recurrently mutated in tumor cells. In collaboration with several research groups, I used and adapted progressively more refined versions of my pipeline in studies of T-cell large granular lymphocyte leukemia (LGL-L), two types of pediatric follicular lymphoma (PTNFL and PFLT), and high-risk neuroblastoma (HR-NB). LGL-L is a rare chronic leukemia characterized by persistent clonal growth of cytotoxic T cells or natural killer (NK) cells due to activation of the JAK/STAT pathway. WES analysis identified new somatic mutations in recurrently mutated genes in 19 patients with LGL-L, including cases without mutations in the STAT genes.
Sixteen variants in different genes, including the tumor suppressor FAT4 and the epigenetic regulator KMT2D, were selected for validation by Sanger sequencing. The new Q706L and S715F variants in STAT5B were also functionally characterized. Pathway-derived network analysis identified functional modules of mutated genes that interact functionally or directly with the STAT genes in STAT-negative patients. The analyses also revealed other functional modules with possible relevance to LGL-L pathogenesis in the absence of STAT gene mutations. A cohort of patients with pediatric follicular lymphomas was analyzed by WES. Mutations in TNFRSF14, IRF8 and MAP2K1, genes previously associated with PTNFL, were confirmed, and new mutations and genes possibly involved in PTNFL development were characterized. Eleven variants in ARHGEF1, MAP2K1, TNFRSF14, ATG7, GNA13, RSF1, UBAP2 and ZNF608 were validated and selected as possible driver events in PTNFL and PFLT. Our results associate, for the first time, the GPCR pathway and chromatin-modifying enzymes with pediatric follicular lymphomas. NB is a solid tumor that originates from primitive neural crest cells and is characterized by high clinical heterogeneity and few recurrently mutated genes (MYCN, ALK, ATRX). To investigate the biological basis of NB aggressiveness, WES was performed on metastatic HR-NB patients divided by survival (SS and LS patients, with survival of at most 5 years and of more than 5 years, respectively). Only the genes SMARCA4, SMO, ZNF44 and CHD2 were found to be recurrently mutated specifically in SS patients. HotNet2 revealed that the mutations detected in the two groups fell into different pathways.
The mutations of SS patients clustered into six significantly mutated subnetworks, involved in extracellular matrix organization via the MAPK pathway, in cell motility via PTK2, in matrix metalloproteinase activity, in centrosome maturation, and in chromosome remodeling. Because FDA-approved drugs already exist that target some of the mutated proteins or identified pathways, these results may facilitate the development of targeted therapies for patients with the most aggressive forms of HR-NB.
31

Arango, Argoty Gustavo Alonso. "Computational Tools for Annotating Antibiotic Resistance in Metagenomic Data." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/88987.

Abstract:
Metagenomics has become a reliable tool for the analysis of the microbial diversity and the molecular mechanisms carried out by microbial communities. By the use of next generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, with the rapid adoption of metagenomics a large amount of data has been generated. This situation requires the development of computational tools and pipelines to manage the data scalability, accessibility, and performance. In this thesis, several strategies varying from command line, web-based platforms to machine learning have been developed to address these computational challenges. Interpretation of specific information from metagenomic data is especially a challenge for environmental samples as current annotation systems only offer broad classification of microbial diversity and function. Therefore, I developed MetaStorm, a public web-service that facilitates customization of computational analysis for metagenomic data. The identification of antibiotic resistance genes (ARGs) from metagenomic data is carried out by searches against curated databases producing a high rate of false negatives. Thus, I developed DeepARG, a deep learning approach that uses the distribution of sequence alignments to predict over 30 antibiotic resistance categories with a high accuracy. Curation of ARGs is a labor intensive process where errors can be easily propagated. Thus, I developed ARGminer, a web platform dedicated to the annotation and inspection of ARGs by using crowdsourcing. Effective environmental monitoring tools should ideally capture not only ARGs, but also mobile genetic elements and indicators of co-selective forces, such as metal resistance genes. Here, I introduce NanoARG, an online computational resource that takes advantage of the long reads produced by nanopore sequencing technology to provide insights into mobility, co-selection, and pathogenicity. 
Sequence alignment has been one of the preferred methods for analyzing metagenomic data. However, it is slow and requires high computing resources. Therefore, I developed MetaMLP, a machine learning approach that uses a novel representation of protein sequences to perform classifications over protein functions. The method is accurate, is able to identify a larger number of hits compared to sequence alignments, and is >50 times faster than sequence alignment techniques.
Doctor of Philosophy
Antimicrobial resistance (AMR) is one of the biggest threats to human public health. It has been estimated that the number of deaths caused by AMR will surpass those caused by cancer by 2050. The seriousness of these projections requires urgent action to understand and control the spread of AMR. In the last few years, metagenomics has stood out as a reliable tool for the analysis of microbial diversity and AMR. By the use of next generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, with the rapid adoption of metagenomics, a large amount of data has been generated. This situation requires the development of computational tools and pipelines to manage the data scalability, accessibility, and performance. In this thesis, several strategies varying from command line, web-based platforms to machine learning have been developed to address these computational challenges. In particular, these include the development of computational pipelines to process metagenomics data in the cloud and on distributed systems, the development of machine learning and deep learning tools to ease the computational cost of detecting antibiotic resistance genes in metagenomic data, and the integration of crowdsourcing as a way to curate and validate antibiotic resistance genes.
32

Adwik, G. A. "Use of molecular and bioinformatics tools for developing methods of epidemiological identification of trypanosomes." Thesis, University of Salford, 2016. http://usir.salford.ac.uk/37755/.

Abstract:
Human African trypanosomiasis (HAT), also known as sleeping sickness, is caused by the Trypanosoma brucei spp. parasite and has been a major health problem for populations in Africa. Although the current number of reported cases is decreasing, more efforts are required to try to control or eliminate the disease. The recent advances in molecular techniques have contributed towards identifying taxonomic groups at all levels (species, subspecies, populations, strains and isolates). Commonly, field samples are collected and stored using Whatman FTA cards. Many molecular epidemiological tools are available for detection and strain typing in trypanosomes. These tools include nested ITS-PCR, which is based on size variation of the ITS genes, and MGE-PCR, which is based on variations in the position of mobile genetic elements (MGEs). Although commonly used, these tools have not been fully validated. For example, the ITS-PCR has not been used or validated against blood samples obtained from sleeping sickness patients in Angola. Furthermore, the MGE-PCR system has not been evaluated for use directly from FTA cards. The aim of this thesis is to develop improved molecular tools to assist diagnostic and epidemiological studies. In order to improve the molecular diagnostic use of Whatman FTA cards, an extraction method based on Chelex was investigated. Using Chelex to extract T. brucei DNA from FTA cards, followed by a nested ITS-PCR detection system, allowed parasite DNA detection down to 1 ng/µl. To evaluate this tool on field samples, ITS-PCR amplification was carried out on DNA eluted by Chelex extraction from 36 FTA cards spotted with blood from Angolan patients who tested positive for trypanosomiasis by the card agglutination test (CATT). Twenty-four of these samples were successfully PCR amplified using mammalian tubulin primers. Of these 24 samples, 11 (45.8%) were confirmed as trypanosome positive using a specific ITS-PCR based approach.
As such, this indicates that further work is necessary to improve the PCR-based reliability of diagnosis. To this end, an MGE-PCR approach was used to attempt parasite strain identification. Although the MGE-PCR was found to be more sensitive than ITS-PCR in amplification of DNA from FTA cards, the resulting sequence data could not confirm that the amplicons were of trypanosome origin, and hence further analysis, or other approaches, are required. With a view to developing new diagnostic tools, a bioinformatic analysis of mobile elements inserted in RHS/pseudogenes in the T. brucei genome was carried out. The aim of this was to locate variable regions of these genes that could be used as detailed markers for trypanosome strain identification. Sequences of the RHS genes were retrieved from the T. brucei brucei and T. brucei gambiense genomes to investigate positional diversity of MGEs within this family of genes. Differences were found in the presence/absence of RIME elements in one RHS gene between the two subspecies. More detailed investigation of all RHS gene classes in T. b. brucei showed six classes of RHS gene types, and within each class individual sequences showed evidence of insertion by MGEs. In some specific instances, evidence of pre-insertion, insertion and subsequent removal of MGEs was seen. This enabled a temporal evolutionary sequence of events to be interpreted. As such, the RHS genes offer the opportunity to develop specific molecular epidemiological tools for investigating the evolution of MGEs in field samples and to carry out temporally informed epidemiological tracking of isolates.
33

Motro, Yair. "Comparative genomics analysis and development of bioinformatics tools for two newly sequenced spirochaete species." PhD thesis, Murdoch University, 2008. https://researchrepository.murdoch.edu.au/id/eprint/41679/.

Abstract:
The bacterial family Spirochaetales contains a number of potent pathogens responsible for serious and well-known diseases, such as Lyme disease (Borrelia burgdorferi), leptospirosis (Leptospira interrogans), and syphilis (Treponema pallidum). Though the mentioned species have been extensively investigated, there still remain spirochaete genera, and the spirochaete family as a whole, that have been minimally characterised. The genus Brachyspira includes species primarily responsible for gastro-intestinal diseases. Some biological characteristics of the two species B. hyodysenteriae and B. pilosicoli are known. For example, B. hyodysenteriae causes disease in swine, while B. pilosicoli causes disease in a wide range of animals and humans. As there are no whole genome sequences available for any Brachyspira species, their underlying molecular mechanisms, evolution and function are not understood. This work is part of a large project which aims to sequence the whole genomes of the two species for vaccine design and development. This thesis represents the first report of an in-depth comparative genome analysis (CGA) of the novel whole genome sequences of both Brachyspira species, providing greater understanding of their genomic functional relationships, evolution and diversity, while also identifying elements for potential vaccine and drug design and development.
34

Carraro, Marco. "Development of bioinformatics tools to predict disease predisposition from Next Generation Sequencing (NGS) data." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3426807.

Abstract:
The sequencing of the human genome has opened up completely new avenues in research, and the notion of personalized medicine has become common. DNA sequencing technology has evolved by several orders of magnitude, coming into the range of $1,000 for a complete human genome. The promise of identifying genetic variants that influence our lifestyles and make us susceptible to diseases is now becoming reality. However, genome interpretation remains one of the most challenging problems of modern biology. The focus of my PhD project is the development of bioinformatics tools to predict disease predisposition from sequencing data. Several of these methods have been tested in the context of the Critical Assessment of Genome Interpretation (CAGI), always achieving good prediction performance. During my PhD project I faced the complete spectrum of challenges to be addressed in order to translate the sequencing revolution into clinical practice. One of the biggest problems when dealing with sequencing data is the interpretation of the pathogenic effect of variants. Dozens of bioinformatics tools have been created to separate mutations that could be involved in a pathogenic phenotype from neutral variants. In this context the problem of benchmarking is critical, as prediction performance is usually tested on different sets of variants, making comparison among these tools impossible. To address this problem I performed a blinded comparison of pathogenicity predictors in the context of CAGI, realizing the most complete performance assessment among all the iterations of this collaborative experiment. Another challenge that needs to be addressed to realize the personalized medicine revolution is phenotype prediction. During my PhD I had the opportunity to develop several methods for complex phenotype prediction from targeted enrichment and exome sequencing data.
In this context, challenges such as misinterpretation or overinterpretation of variant pathogenicity have emerged, as in the case of phenotype prediction from the Hopkins Clinical Panel. In addition, other complementary issues in phenotype prediction, such as the possible presence of incidental findings, have to be considered. Ad hoc prediction strategies have been defined for different kinds of sequencing data. A clear example is Crohn's disease risk prediction. Within the CAGI experiment, three iterations of this prediction challenge have been run so far. Analysis of the datasets revealed how population structure and bias in data preparation and sequencing could affect prediction performance, leading to inflated results. For this reason, a completely new prediction strategy was defined for the latest edition of the Crohn's disease challenge, exploiting data from genome-wide association studies and protein-protein interaction networks to address the problem of missing heritability. Good prediction performance was achieved, especially for individuals with an extreme predicted risk score. Last, my work focused on the prediction of a health-related trait: the blood group phenotype. The accuracy of serological tests is very poor for minor blood groups or weak phenotypes. Blood group incompatibilities can be harmful for critical individuals such as oncohematological patients. BOOGIE exploits haplotype tables and the nearest neighbor algorithm to identify the correct phenotype of a patient. The accuracy of our method was tested on the ABO and RhD systems, achieving good results. In addition, our analyses paved the way for a further increase in performance, moving towards a prediction system that in the future could become a real alternative to wet lab experiments.
The completion of the Human Genome Project has opened up numerous new research horizons. Among these, the possibility of understanding the genetic basis that makes each individual susceptible to different diseases has paved the way for a new revolution: the advent of personalized medicine. DNA sequencing technologies have evolved considerably, and today the price of sequencing a genome is close to the psychological threshold of $1,000. The promise of identifying genetic variants that influence our lifestyle and make us susceptible to disease is therefore becoming reality. However, much work is still needed before this new kind of medicine can be put into practice. In particular, the challenge today is no longer generating sequencing data but interpreting it. The goal of my PhD project is the development of bioinformatics methods to predict predisposition to disease from sequencing data. Many of these methods were tested in the context of the Critical Assessment of Genome Interpretation (CAGI), an international competition focused on defining the state of the art in genome interpretation, consistently achieving good results. During my PhD project I had the opportunity to face the full spectrum of challenges that must be handled to translate the new genome sequencing capabilities into clinical practice. One of the main problems when dealing with sequencing data is interpreting the pathogenicity of mutations. Dozens of predictors have been created to separate neutral variants from mutations that can cause a pathological phenotype. In this context the problem of benchmarking is fundamental, since the performance of these tools is usually tested on different variant datasets, making a performance comparison impossible.
To address this problem, the accuracy of these predictors was compared on a set of mutations with unknown phenotype in the context of CAGI, producing the most complete evaluation of pathogenicity predictors among all editions of this collaborative experiment. Predicting phenotypes from sequencing data is another challenge that must be addressed to fulfil the promises of personalized medicine. During my PhD I had the opportunity to develop several predictors for complex phenotypes using data from gene panels and exomes. In this context, problems such as misinterpretation or overinterpretation of variant pathogenicity were addressed, as in the case of the challenge focused on predicting phenotypes from the Hopkins Clinical Panel. Other problems complementary to phenotype prediction also emerged, such as the possible presence of incidental findings. Specific prediction strategies were defined while working with different types of sequencing data. One example is Crohn's disease. Three editions of CAGI have posed the challenge of identifying healthy individuals or individuals affected by this inflammatory disease using only exome sequencing data. Analysis of the datasets revealed how population structure and problems in exome preparation and sequencing compromised predictions for this phenotype, leading to an overestimate of prediction performance. Taking this into account, a completely new prediction strategy was defined for this phenotype and tested in the latest edition of CAGI. Data from genome-wide association studies and the analysis of protein interaction networks were used to define lists of genes involved in the onset of the disease.
Good prediction performance was obtained, in particular for individuals who had been assigned a high probability of being affected. Finally, my work focused on the prediction of blood groups, again from sequencing data. The accuracy of serological tests is reduced for minor blood groups or weak phenotypes. Incompatibilities for such blood groups can be critical for some classes of individuals, such as oncohematological patients. Our prediction strategy exploited genotype data for genes encoding blood groups, available in dedicated databases, and the nearest neighbour principle to make predictions. The accuracy of our method was tested on the ABO and RhD systems, achieving good prediction performance. In addition, our analyses paved the way for a further increase in performance for this tool.
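The nearest neighbour step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not BOOGIE's implementation: the haplotype table, the 0/1 encoding of variant sites, the phenotype labels P1 to P3, and the use of Hamming distance as the metric are all assumptions made for illustration.

```python
# Hypothetical haplotype table: each row pairs a genotype vector
# (0 = reference allele, 1 = variant allele at a site) with a phenotype label.
HAPLOTYPE_TABLE = [
    ((0, 0, 0), "P1"),
    ((1, 0, 0), "P2"),
    ((0, 1, 1), "P3"),
]


def hamming(a, b):
    """Count the sites at which two equal-length genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_phenotype(genotype, table=HAPLOTYPE_TABLE):
    """Return the phenotype of the table row closest to the query genotype.

    Ties are resolved in favour of the row that appears first in the table.
    """
    best_row = min(table, key=lambda row: hamming(genotype, row[0]))
    return best_row[1]


if __name__ == "__main__":
    # The query differs from the P2 haplotype at one site only.
    print(nearest_phenotype((1, 0, 1)))
```

A query that matches no table row exactly is still assigned the phenotype of its closest known haplotype, which is the property that makes a nearest neighbour rule tolerant of novel or partially observed genotypes.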
35

Verma, Rajni [Verfasser]. "Development and Application of Novel Bioinformatics and Computational Modeling Tools for Protein Engineering Advanced Computational Tools for Protein Engineering / Rajni Verma." Bremen : IRC-Library, Information Resource Center der Jacobs University Bremen, 2013. http://d-nb.info/103526966X/34.

36

Greated, Alicia. "The IncP-9 plasmid group : characterisation of genomic sequences and development of tools for environmental monitoring." Thesis, University of Birmingham, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.366379.

37

Blischak, Paul David. "Developing Computational Tools for Evolutionary Inferences in Polyploids." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1531400134548368.

38

Hossain, A. S. Md Mukarram. "Scalable tools for high-throughput viral sequence analysis." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/276228.

Abstract:
Viral sequence data are increasingly being used to estimate evolutionary and epidemiological parameters to understand the dynamics of viral diseases. This thesis focuses on developing novel and improved computational methods for high-throughput analysis of large viral sequence datasets. I have developed a novel computational pipeline, Pipelign, to detect potentially unrelated sequences from groups of viral sequences during sequence alignment. Pipelign detected a large number of unrelated and mis-annotated sequences from several viral sequence datasets collected from GenBank. I subsequently developed ANVIL, a machine learning-based recombination detection and subtyping framework for pathogen sequences. ANVIL's performance was benchmarked using two large HIV datasets collected from the Los Alamos HIV Sequence Database and the UK HIV Drug Resistance Database, as well as on simulated data. Finally, I present a computational pipeline named Phlow, for rapid phylodynamic inference of heterochronous pathogen sequence data. Phlow is implemented with specialised and published analysis tools to infer important phylodynamic parameters from large datasets. Phlow was run with three empirical viral datasets and their outputs were compared with published results. These results show that Phlow is suitable for high-throughput exploratory phylodynamic analysis of large viral datasets. When combined, these three novel computational tools offer a comprehensive system for large scale viral sequence analysis addressing three important aspects: 1) establishing accurate evolutionary history, 2) recombination detection and subtyping, and 3) inferring phylodynamic history from heterochronous sequence datasets.
39

Kamepalli, Phanindra. "User Interface and Modified Testbench to Support Comprehensive Analysis of Protein Structural Alignment Tools." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1313766325.

40

Faust, Karoline. "Development, assessment and application of bioinformatics tools for the extraction of pathways from metabolic networks." Doctoral thesis, Universite Libre de Bruxelles, 2010. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210054.

Abstract:
Genes can be associated in numerous ways, e.g. by co-expression in microarrays, co-regulation in operons and regulons, or co-localization on the genome. Association of genes often indicates that they contribute to a common biological function, such as a pathway. The aim of this thesis is to predict metabolic pathways from associated enzyme-coding genes. The prediction approach developed in this work consists of two steps: First, the reactions carried out by the enzymes encoded by the genes are obtained. Second, the gaps between these seed reactions are filled with intermediate compounds and reactions. In order to select these intermediates, metabolic data is needed. This work made use of metabolic data collected from the two major metabolic databases, KEGG and MetaCyc. The metabolic data is represented as a network (or graph) consisting of reaction nodes and compound nodes. Intermediate compounds and reactions are then predicted by connecting the seed reactions obtained from the query genes in this metabolic network using a graph algorithm.
In large metabolic networks, there are numerous ways to connect the seed reactions. The main problem of the graph-based prediction approach is to differentiate biochemically valid connections from others. Metabolic networks contain hub compounds, such as ATP, NADPH, H2O or CO2, which are involved in a large number of reactions. When a graph algorithm traverses the metabolic network via these hub compounds, the resulting metabolic pathway is often biochemically invalid.
In the first step of the thesis, an existing approach to predict pathways from two seeds was improved. In the previous approach, the metabolic network was weighted to penalize hub compounds, and an extensive evaluation showed that the weighted network yielded higher prediction accuracies than either a raw network or a filtered network (where hub compounds are removed). In the improved approach, hub compounds are avoided using reaction-specific side/main compound annotations from KEGG RPAIR. As an evaluation showed, this approach in combination with weights increases prediction accuracy with respect to the weighted, filtered and raw networks.
In the second step of the thesis, path finding between two seeds was extended to pathway prediction given multiple seeds. Several multiple-seed pathway prediction approaches were evaluated, namely three Steiner tree solving heuristics and a random-walk based algorithm called kWalks. The evaluation showed that a combination of kWalks with a Steiner tree heuristic applied to a weighted graph yielded the highest prediction accuracy.
Finally, the best-performing algorithm was applied to a microarray data set, which measured gene expression in S. cerevisiae cells growing on 21 different compounds as sole nitrogen source. For 20 nitrogen sources, gene groups were obtained that were significantly over-expressed or suppressed with respect to urea as reference nitrogen source. For each of these 40 gene groups, a metabolic pathway was predicted that represents the part of metabolism up- or down-regulated in the presence of the investigated nitrogen source.
The graph-based prediction of pathways is not restricted to metabolic networks. It may be applied to any biological network and to any data set yielding groups of associated genes, enzymes or compounds. Thus, multiple-end pathway prediction can serve to interpret various high-throughput data sets.
Doctorate in Sciences
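The effect of penalizing hub compounds during two-seed path finding can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the thesis's implementation: the network, the reaction names R1 to R7 and the placeholder compounds X1 to X4 are hypothetical, the graph is treated as undirected, and the cost of entering a compound node is assumed to equal its degree (reaction nodes cost 1).

```python
import heapq
from collections import defaultdict

# Toy bipartite network: reaction -> compounds it involves (hypothetical data).
REACTIONS = {
    "R1": ["glucose", "ATP", "G6P", "ADP"],
    "R2": ["G6P", "F6P"],
    "R3": ["F6P", "ATP", "FBP", "ADP"],
    "R4": ["ATP", "ADP", "X1"],
    "R5": ["ATP", "ADP", "X2"],
    "R6": ["ATP", "ADP", "X3"],
    "R7": ["ATP", "ADP", "X4"],
}


def build_graph(reactions):
    """Build an undirected graph linking each reaction to its compounds."""
    graph = defaultdict(set)
    for reaction, compounds in reactions.items():
        for compound in compounds:
            graph[reaction].add(compound)
            graph[compound].add(reaction)
    return graph


def lightest_path(graph, source, target, node_cost):
    """Dijkstra search where the cost is paid on entering each node."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return path
        if node in settled:
            continue
        settled.add(node)
        for neighbour in sorted(graph[node]):
            if neighbour not in settled:
                step = node_cost(neighbour)
                heapq.heappush(heap, (cost + step, neighbour, path + [neighbour]))
    return None


GRAPH = build_graph(REACTIONS)


def raw_cost(node):
    """Unweighted network: every node costs the same."""
    return 1


def degree_cost(node):
    """Weighted network: compounds cost their degree, reactions cost 1."""
    return 1 if node in REACTIONS else len(GRAPH[node])


if __name__ == "__main__":
    print(lightest_path(GRAPH, "R1", "R3", raw_cost))     # shortcut via a hub
    print(lightest_path(GRAPH, "R1", "R3", degree_cost))  # avoids the hubs
```

In the raw network the two seed reactions are joined through a shared cofactor, whereas the degree-weighted search prefers the longer route through the low-degree intermediates G6P and F6P.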
41

Cheng, Kei Chin. "Analysis of gene expression data in transgenic and non- transgenic soybean cultivars using bioinformatics tools." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=18455.

Abstract:
Current safety assessment for novel crops, including transgenic crops, uses a targeted approach, which determines crop safety by assessing the content of a few specific chemical components. However, microarray technology can simultaneously assess the whole transcriptome and can therefore be used to analyze target genes as well as unintended effects. In this study, we used this technique as a non-targeted approach. Gene expression data from a microarray experiment with five soybean cultivars were analyzed using bioinformatics. Two cultivars were transgenic (RoundUp®) and three were non-transgenic. We show that the variation in gene expression between transgenic and non-transgenic soybean is less than that between non-transgenic cultivars. A MySQL database coupled with CGI web interfaces was developed to store and present the results (http://thor.agrenv.mcgill.ca/cgi-bin/soy/soybean.cgi). By integrating the microarray data with gene annotations and other soybean data, a comprehensive view of differences in gene expression can be explored between cultivars.
Current risk assessment methods for novel crops, including transgenic crops, use a targeted approach: they assess the content of specific chemical compounds. Now that microarray technology is available, the entire transcriptome can be assessed. We used this technology as a non-targeted approach. In the present study, data from microarray experiments comparing gene expression in five soybean cultivars are analyzed using bioinformatics methods. Two of these cultivars are transgenic RoundUp® soybeans and three are non-transgenic. We show that the variation in gene expression between transgenic and non-transgenic soybeans is smaller than that between non-transgenic cultivars. A MySQL database and a CGI web interface were developed to store and retrieve the data.
Integration with other soybean data made it possible to explore global genetic data between cultivars in terms of biological functions.
42

Khalid, Sabah. "Design, Development And Implementation Of Bioinformatics Tools For The Mining Of Microarray Gene Expression Data." Thesis, Brunel University, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487014.

Abstract:
The aspiration to understand the molecular complexities of the human body to the highest level in an efficient manner creates unique problems for the scientist in the design and implementation of functional genomic studies. Furthermore, with the involvement of many hundreds of genes in even the simplest biological processes, how does one begin to identify the genes from the entire genome that play a significant role within a particular biological phenomenon? The answer lies with microarray technology, which provides a resourceful solution enabling the simultaneous isolation of genes that may participate in any biological process under investigation, from an entire genome in a single experiment. Although it is an incredibly powerful technology, with the ability to generate vast amounts of gene expression data, the technology and the generated data are futile without the use of bioinformatics tools for data interpretation in a manner that is meaningful to a biologist. As biologists are not experts in computer science, the role of the bioinformatician is not simply the development of novel algorithms but to fully integrate them within applications that are user-friendly for a biologist to utilise without the intervention of a computer scientist. In light of this, we have designed and developed practical applications for a biologist specific for functional annotation, gene chip fabrication and cross comparison of microarray data. Underlying every microarray experiment is a specific biological question centred on an area of expertise. While it is useful to view the functions of significantly differentially expressed genes across multiple unrelated disciplines, it is more important to understand gene function within the specialised field for a specific biological question in order to continue further focussed research.
In light of this, we have generated an application called the multifunctional Immune Ontologiser for the functional classification of genes and gene functions more suited to the molecular immunology expert. Because some microarray experiments are initiated to answer highly focussed biological questions, available gene chips may often not be of benefit. In this instance, the use of customised gene chips would be more beneficial. Thus we have provided biologists with a tool to extract biological information from gene sets to allow the customised creation of any number of gene chips. We have exploited this tool to create an oncology gene chip and an immuno-tolerance gene chip for our research purposes. Lastly, with the public availability of several hundred microarray experiments, public repositories contain much hidden biological knowledge that has the potential to be highly valuable if mined in the correct way. Thus we have developed MaXlab, the first fully functional application for the meta-analysis of biological signatures from Affymetrix and cDNA microarray studies to gain further insights into related biological phenomena.
43

Patel, Hitesh [Verfasser], and Irmgard [Akademischer Betreuer] Merfort. "Use and development of chem-bioinformatics tools and methods for drug discovery and target identification." Freiburg : Universität, 2015. http://d-nb.info/1115495917/34.

44

PIRRO', STEFANO. "HiPPO and Panda: two bioinformatics tools to support analysis of high-dimensional mass cytometry data." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2016. http://hdl.handle.net/2108/201732.

Abstract:
Biological processes are often modulated by the interaction of different cell types, in a complex network of relations and dependencies. For this reason, biological research aims to increase both the number of cellular features that can be surveyed simultaneously and the resolution at which such observations are possible. High-dimensional mass cytometry is particularly well suited to tracking cells in complex tissues because more than 40 parameters can be monitored at the same time, on hundreds of thousands of cells per sample. Several computational approaches have been proposed to reduce the multidimensionality of the datasets produced by this technology and to cluster events by their multi-dimensional similarity (e.g. SPADE and viSNE). In order to overcome some limitations of the available toolboxes, I developed two new bioinformatics tools named HiPPO (http://moleculargenetics.uniroma2.it/hippo) and PANDA (http://moleculargenetics.uniroma2.it/panda). HiPPO (High-throughput Population Profiler) takes advantage of a supervised quantitation approach to discretize the expression distribution curves generated for each intracellular and surface protein monitored in the experiments. Cells in the continuous, multidimensional dataset are converted into a two-dimensional matrix where rows and columns are events (cells) and markers, respectively. To characterize cell populations, HiPPO queries PANDA, a manually curated database that stores expression profiles for selected markers of primary cells. Comparing PANDA's discrete expression profiles with those identified in the populations under study makes it possible to monitor cell type abundance. Moreover, given a set of experiments in different conditions, HiPPO uses the Kolmogorov-Smirnov non-parametric test to evaluate the variation of protein expression levels for any identified population. The analysis is conducted interactively through a user-friendly web application.
The robustness and reliability of HiPPO have been tested on two experimental datasets. In the first case, healthy human bone marrow samples (Bendall et al., 2011) were analyzed and the results compared with SPADE (Qiu et al., 2011), viSNE (Amir et al., 2013) and manual gating performed by the authors. In the second test, I took advantage of the expertise on CyTOF technology in our laboratory and analyzed mass cytometry data of skeletal muscle mononuclear cells from healthy and dystrophic (mdx) mice, in order to quantify fibro-adipogenic progenitors (FAPs) and determine changes in population abundance in different conditions. In both cases, HiPPO's accuracy in identifying known cells is higher than that of SPADE and viSNE when compared to manual gating. Unlike the tools that are currently available, HiPPO also offers the capability of matching the antigenic profile of the "quasi-homogeneous" populations identified by the cell clustering procedure to profiles of cell populations that have been characterized and described in the literature. For this task HiPPO takes advantage of PANDA, a second resource that I developed during my PhD. PANDA (Population Analysis Database) is the first manually curated database that aims at capturing the expression profiles of selected markers in primary cells by integrating multiple layers of information in a user-friendly web portal. The curation process is conducted by experts who retrieve and interpret expression data from the literature. At the time of writing, PANDA mainly focuses the curation effort on the immune system and on the cell populations participating in skeletal muscle regeneration. PANDA annotates 32 different cell types in H. sapiens and M. musculus but aims at increasing the amount of curated data, extending curation to other tissues, organs and organisms.
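The two-sample Kolmogorov-Smirnov comparison mentioned in this abstract can be sketched in a few lines of Python. This is an illustration only and does not reproduce HiPPO's code: the function name `ks_statistic` and the example intensity values are hypothetical, and only the D statistic is computed, not a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute difference between the empirical cumulative
    distribution functions of the two samples, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    # Hypothetical marker intensities for one population in two conditions.
    control = [1, 2, 3, 4]
    treated = [3, 4, 5, 6]
    print(ks_statistic(control, treated))
```

A value of D near 0 indicates that the two expression distributions are similar, while a value near 1 indicates almost no overlap. For a significance estimate, `scipy.stats.ks_2samp` returns both the statistic and a p-value.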
45

Pathak, Vaibhav Sanjay. "IDENTIFYING SOMATIC COPY NUMBER ABERRATIONS WITHIN GLIOBLASTOMA MULTIFORME AND LOW GRADE GLIOMAS USING BIOINFORMATICS TOOLS EXCAVATOR AND XHMM." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1481394117039479.

46

Das, Abhiram. "Computational tools for the analysis of biological networks in plants." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54340.

Abstract:
This thesis presents research on plant phenotyping using informatics techniques, including databases, web technologies, image processing and feature measurements of 2D and 3D images. The thesis presents two enabling bioinformatics tools related by a shared set of research objectives and distinguished by the nature of their applications. The first project, called ClearedLeavesDB, is a common platform for plant biologists to share data and metadata about cleared leaf images. This project resulted in an online interactive database of cleared leaf images. The second project, called Digital Imaging of Root Traits (DIRT), is an application to store, manage, share and process root images as well as analyze root image traits with respect to different experiments. This application is deployed on iPlant's cyber-infrastructure and currently supports management of 2D root images, high-throughput processing, and structural descriptor/trait estimation from root images. The application enables storage, management and sharing of heterogeneous image data and metadata, including dynamic environmental and descriptor data. In the final part of the thesis, I describe ongoing challenges in developing new methods to measure global and local descriptors from reconstructed 3D root images.
47

Katz, Lee Scott. "Computational tools for molecular epidemiology and computational genomics of Neisseria meningitidis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/42934.

Abstract:
Neisseria meningitidis is a Gram-negative, and sometimes encapsulated, diplococcus that causes devastating disease worldwide. For the worldwide genetic surveillance of N. meningitidis, the gold standard for profiling the bacterium uses genetic loci found around the genome. Unfortunately, the software for analyzing the data for these profiles is difficult to use for a variety of reasons. This thesis presents my suite of tools, called the Meningococcus Genome Informatics Platform, for the analysis of these profiling data. To better understand N. meningitidis, the CDC Meningitis Laboratory and other world-class laboratories have adopted a whole genome approach. To facilitate this approach, I have developed a computational genomics assembly and annotation pipeline called the CG-Pipeline. It assembles a genome, predicts locations of various features, and then annotates those features. Next, I developed a comparative genomics browser and database called NBase. Using CG-Pipeline and NBase, I addressed two open questions in N. meningitidis research. First, some N. meningitidis isolates cause disease but many do not. What is the genomic basis of disease-associated versus asymptomatically carried isolates of N. meningitidis? Second, some isolates' capsule type cannot be easily determined. Isolates are grouped into one of many serogroups based on this capsule, which aids epidemiological studies and the public health response to N. meningitidis, so such isolates often cannot be grouped. Thus the question is: what is the genomic basis of nongroupability? This thesis addresses both of these questions on a whole genome level.
48

Torabi, Moghadam Behrooz. "Computational discovery of DNA methylation patterns as biomarkers of ageing, cancer, and mental disorders : Algorithms and Tools." Doctoral thesis, Uppsala universitet, Institutionen för cell- och molekylärbiologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-320720.

Abstract:
Epigenetics refers to mitotically heritable modifications in gene expression without a change in the genetic code. A combination of molecular, chemical and environmental factors constituting the epigenome is involved, together with the genome, in setting up the unique functionality of each cell type. DNA methylation is the most studied epigenetic mark in mammals, where a methyl group is added to the cytosine in a cytosine-phosphate-guanine dinucleotide, or CpG site. It has been shown to have a major role in various biological phenomena such as chromosome X inactivation, regulation of gene expression, cell differentiation and genomic imprinting. Furthermore, aberrant patterns of DNA methylation have been observed in various diseases including cancer. In this thesis, we have utilized machine learning methods and developed new methods and tools to analyze DNA methylation patterns as a biomarker of ageing, cancer subtyping and mental disorders. In Paper I, we introduced a pipeline of Monte Carlo Feature Selection and rule-based modeling using ROSETTA in order to identify combinations of CpG sites that classify samples in different age intervals based on the DNA methylation levels. The combinations of genes that appeared to act together motivated us to develop an interactive pathway browser, named PiiL, to check the methylation status of multiple genes in a pathway. The tool facilitates the detection of differential patterns of DNA methylation and/or gene expression by quickly assessing large data sets. In Paper III, we developed a novel unsupervised clustering method, methylSaguaro, for analyzing various types of cancers, to detect cancer subtypes based on their DNA methylation patterns. Using this method we confirmed the previously reported findings that challenge the histological grouping of the patients, and proposed new subtypes based on DNA methylation patterns.
In Paper IV, we investigated the DNA methylation patterns in a cohort of schizophrenic and healthy samples, using all the methods that were introduced and developed in the first three papers.
49

Federation, Alexander Joel. "The Development of Chemical and Computational Tools to Study Transcriptional Regulation in Cancer." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:17463980.

Abstract:
Eukaryotic gene regulation is a complex process requiring the action of many multicomponent complexes in the cell. Specific inhibitors of chromatin-associated factors allow the functional study of protein domains without genetic removal of the entire protein. Here, two small molecule probes were used to study the role of DOT1L and BET proteins in cancer biology. DOT1L is a histone methyltransferase with activity correlating with positive regulation of transcription. In MLL-rearranged leukemia, DOT1L is recruited aberrantly to early developmental transcription factors, leading to their inappropriate expression and leukemia maintenance. The development of an assay platform for DOT1L allowed the investigation of many small molecule DOT1L inhibitors, leading to compounds with improved potency and pharmacokinetics. Studying the action of BET bromodomain inhibitors led to the identification of super enhancers, large tissue-specific regulatory elements driving the expression of genes critical for the function of the cell. Super enhancers are often found in oncogenic translocation events, especially in B cell malignancies. This study identified a subset of super enhancers that promote off-target DNA damage from the B cell antibody diversity enzyme AID, leading to double strand break events and translocations. Super enhancers also regulate the expression of master transcription factors (TFs) in a given cell type. Using the topology of the super enhancer, the sites of master TF binding can be predicted, allowing the construction of network models for transcriptional regulation. These models were built in a large number of healthy and diseased cell types, including the pediatric malignancy medulloblastoma. In medulloblastoma, a network motif was identified that matches an expression pattern seen in a transient cell population in the developing cerebellum, providing evidence for the previously unknown cell of origin for Group 4 medulloblastoma.
Chemical Biology
50

Goldstein, Theodore C. "Tools for extracting actionable medical knowledge from genomic big data." Thesis, University of California, Santa Cruz, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3589324.

Abstract:
Cancer is an ideal target for personal genomics-based medicine that uses high-throughput genome assays such as DNA sequencing, RNA sequencing, and expression analysis (collectively called omics); however, researchers and physicians are overwhelmed by the quantities of big data from these assays and cannot interpret this information accurately without specialized tools. To address this problem, I have created software methods and tools called OCCAM (OmiC data Cancer Analytic Model) and DIPSC (Differential Pathway Signature Correlation) for automatically extracting knowledge from this data and turning it into an actionable knowledge base called the activitome. An activitome signature measures a mutation's effect on the cellular molecular pathway. Activitome signatures can also be computed for clinical phenotypes. By comparing the vectors of activitome signatures of different mutations and clinical outcomes, intrinsic relationships between these events may be uncovered. OCCAM identifies activitome signatures that can be used to guide the development and application of therapies. DIPSC overcomes the confounding problem of correlating multiple activitome signatures from the same set of samples. In addition, to support the collection of this big data, I have developed MedBook, a federated distributed social network designed for a medical research and decision support system. OCCAM and DIPSC are two of the many apps that will operate inside of MedBook. MedBook extends the Galaxy system with a signature database, an end-user oriented application platform, a rich data medical knowledge-publishing model, and the Biomedical Evidence Graph (BMEG). The goal of MedBook is to improve outcomes by learning from every patient.