Dissertations / Theses on the topic 'Comparative bioinformatics'

Consult the top 50 dissertations / theses for your research on the topic 'Comparative bioinformatics.'

1

Chatzou, Maria 1985. "Large-scale comparative bioinformatics analyses." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/587086.

Abstract:
One of the main and most recent challenges of modern biology is to keep up with the growing amount of biological data coming from next-generation sequencing technologies. Keeping up with the growing volumes of experiments will be the only way to make sense of the data and extract actionable biological insights. Large-scale comparative bioinformatics analyses are an integral part of this procedure. When doing comparative bioinformatics, multiple sequence alignments (MSAs) are by far the most widely used models, as they provide a unique insight into the accurate measure of sequence similarities and are therefore instrumental in revealing genetic and/or functional relationships among evolutionarily related species. Unfortunately, the well-established limitation of MSA methods when dealing with very large datasets potentially compromises all downstream analysis. In this thesis I expose the current relevance of multiple sequence aligners, and I show how their current scaling up is leading to serious numerical stability issues and how these impact phylogenetic tree reconstruction. For this purpose, I have developed two new methods: MEGA-Coffee, a large-scale aligner, and Shootstrap, a novel bootstrapping measure incorporating MSA instability with branch support estimates when computing trees. The large amount of computation required by these two projects was carried out using Nextflow, a new computational framework that I have developed to improve the computational efficiency and reproducibility of large-scale analyses like the ones carried out in the context of these studies.
2

Åkerborg, Örjan. "Taking advantage of phylogenetic trees in comparative genomics." Doctoral thesis, KTH, Beräkningsbiologi, CB, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4757.

Abstract:
Phylogenomics can be regarded as evolution and genomics in co-operation. Various kinds of evolutionary studies, gene family analysis among them, demand access to genome-scale datasets. But it is also clear that many genomics studies, such as assignment of gene function, are much improved by evolutionary analysis. The work leading to this thesis is a contribution to the phylogenomics field. We have used phylogenetic relationships between species in genome-scale searches for two intriguing genomic features, namely pseudogenes and A-to-I RNA editing. In the first case we used pairwise species comparisons, specifically human-mouse and human-chimpanzee, to infer the existence of functional mammalian pseudogenes. In the second case we profited from the rapid growth in the number of sequenced genomes in recent years, and used 17-species multiple sequence alignments. In both these studies we have used non-genomic data, gene expression data and synteny relations among these, to verify predictions. In the A-to-I editing project we used 454 sequencing for experimental verification. We have further contributed a maximum a posteriori (MAP) method for fast and accurate dating analysis of speciations and other evolutionary events. This work follows the recent trend of abandoning the strict molecular clock when performing phylogenetic inference. We discretised the time interval from the leaves to the root in the tree, and used a dynamic programming (DP) algorithm to optimally factorise branch lengths into substitution rates and divergence times. We analysed two biological datasets and compared our results with recent MCMC-based methodologies. The dating point estimates that our method delivers were found to be of high quality, while the gain in speed was dramatic. Finally, we applied the DP strategy in a new setting. This time we used a grid laid out on a species tree instead of on an interval.
Together with speciation times, the discretisation gives a common timeframe for a gene tree and the corresponding species tree. This is the key to integrating the sequence evolution process and the gene evolution process. Out of several potential application areas we chose gene tree reconstruction. We performed a genome-wide analysis of yeast gene families and found that our methodology performs very well.
3

Zheng, Chunfang. "Genome rearrangement algorithms applied to comparative maps." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/27313.

Abstract:
The Hannenhalli-Pevzner algorithm for computing the evolutionary distance between two genomes is very efficient when the genomes are signed and totally ordered. But in real comparative maps, the data suffer from problems such as coarseness, missing data, no signs, paralogy, order conflicts and mapping noise. In this thesis we have developed a suite of algorithms for genome rearrangement analysis in the presence of noise and incomplete information. For coarseness and missing data, we represent each chromosome as a partial order, summarized by a directed acyclic graph (DAG). We augment each DAG to a directed graph (DG) in which all possible linearizations are embedded. The chromosomal DGs representing two genomes are combined to produce a single bicoloured graph. The major contribution of the thesis is an algorithm for extracting a maximal decomposition of some subgraph into alternating coloured cycles, determining an optimal sequence of rearrangements, and hence the genomic distance. Also based on this framework, we have proposed an algorithm to solve all the above problems of comparative maps simultaneously by adding heuristic preprocessing to the exact algorithm approach. We have applied this to the comparison of maize and sorghum genomic maps on the GRAMENE database. A further contribution treats the inflation of genome distance by high levels of noise due to incorrectly resolved paralogy and error at the mapping, sequencing and alignment levels. We have developed an algorithm to remove the noise by maximizing strips and tested its robustness as noise levels increase.
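To make the "signed and totally ordered" setting concrete, the sketch below counts breakpoints in a signed permutation and derives the simple lower bound on the number of reversals (each reversal changes only the two adjacencies at its ends, so it removes at most two breakpoints). This is not the Hannenhalli-Pevzner algorithm and not the thesis's method for partial orders; the function names are hypothetical and chosen only for illustration.

```python
def count_breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n.

    The permutation is framed with 0 on the left and n + 1 on the right.
    A consecutive pair (a, b) is an adjacency when b - a == 1, which also
    covers reversed runs such as (-3, -2); every other pair is a breakpoint.
    """
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def reversal_lower_bound(perm):
    """Lower bound on the reversal distance: ceil(breakpoints / 2)."""
    return (count_breakpoints(perm) + 1) // 2


if __name__ == "__main__":
    genome = [1, -3, -2, 4]  # genes 2 and 3 sit in an inverted block
    print(count_breakpoints(genome), reversal_lower_bound(genome))
```

The exact distance for signed genomes depends on the cycle structure of the breakpoint graph, which this bound ignores; it is only meant to show why missing signs and order conflicts in real comparative maps make the clean formulation inapplicable.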
4

Thelander, Tilia. "Optimisation of ForenSeq STR data analysis with FDSTools and comparative analysis with UAS." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20053.

Abstract:
DNA profiling with short tandem repeat data generated with massively parallel sequencing is associated with several challenges. FDSTools is an open-source software package that applies correction models based on a reference database to correct DNA profiles. The correction models aim to provide an accurate representation of the true DNA profile and associated artefacts. Low analytical thresholds in FDSTools are suggested to improve detection of minor profiles in complex mixtures. The objective was to optimise FDSTools analysis for ForenSeq data, and to establish a Swedish reference database. The FDSTools analysis was subsequently compared to default analysis with the commercial Universal Analysis Software, and the likelihood ratio was evaluated. The FDSTools Library file was adapted for ForenSeq data. FASTQ files from single- and mixed-source samples were analysed with both software packages. The concordance between the two was assessed, and analytical thresholds in FDSTools were optimised. Likelihood ratios were calculated for sequencing and capillary electrophoresis data to investigate the benefit of sequence-level information. A reference database and correction models could not be generated, meaning that uncorrected data was used. The two software packages showed 98.5% concordance. Discordance was caused by allele drop-out in heterozygous loci, which implied that certain markers may require individual interpretation. Lowering the analytical thresholds in FDSTools appeared to improve mixture deconvolution, but the lack of correction models obscured interpretation. Hence, without correction models, optimal analytical thresholds could not be defined. The likelihood ratio based on sequencing data was not consistently higher than that based on capillary electrophoresis, suggesting that sequence information is not always advantageous.
5

Walter, Klaudia. "Statistical methods for comparative genomics in the field of bioinformatics." Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611909.

6

Johnson, Sarah. "Comparative Resistomics of Ancient and Modern Human Microbiomes." Thesis, University of North Texas, 2020. https://digital.library.unt.edu/ark:/67531/metadc1707269/.

Abstract:
Increased exposure to antibiotics has led to the dissemination of genes conferring resistance to antimicrobial metabolites throughout human microbiomes globally via horizontal gene transfer (HGT). This has resulted in the emergence of new resistant strains, leading to a rising epidemic of deaths from previously treatable infections. Evidence suggests that before the age of anthropogenic antibiotic use, microbes living within a community produced antibiotic metabolites and, subsequently, maintained such genes for several useful functions and a balance of diversity in nature. The question of the origin of these resistance genes is difficult to answer, but with continued advancements in ancient genomic analysis, researchers have developed methods of acquiring a more accurate representation of the microbiome associated with our human ancestors by extracting fossilized microbial specimens from dental calculus and directly sequencing the metagenomes. This thesis outlines the production of taxonomic and functional profiles of 20 different human and non-human oral microbiome samples using metagenomics tools originally developed for living individuals and adapted for use with ancient microbial specimens. Putative antimicrobial resistance (AMR) genes derived from these profiles were reconstructed and conserved functional regions were identified. From the data available on the human microbiome at a range of time points throughout history, dating back to Neanderthal specimens, it is possible to elucidate relationships between these AMR genes and to better understand the evolutionary trajectory of antibiotic resistance.
7

Mthombeni, Jabulani S. "A comparative bioinformatic analysis of zinc binuclear cluster proteins." Thesis, Rhodes University, 2005. http://hdl.handle.net/10962/d1004064.

Abstract:
Members of the zinc binuclear cluster family are important fungal transcriptional regulators sharing a common DNA-binding domain. Dal81p is a pleiotropic zinc binuclear cluster protein involved in the induction of the UGA genes required for the γ-aminobutyrate nitrogen catabolic pathway in Saccharomyces cerevisiae. The zinc binuclear cluster domain is dispensable for function in Dal81p, and little is known about other domains in this protein. The aim of the study was to explore the zinc binuclear cluster protein family using comparative bioinformatics as a complement to biochemical and structural approaches. A database of all zinc binuclear cluster proteins was compiled. A total of 118 zinc binuclear cluster proteins are reported in this work. Thirty-nine previously unidentified zinc binuclear cluster proteins were found. Four homologues of Dal81p were identified by homology searching. Important sequence motifs were identified in the aligned sequences of Dal81p and its homologues. The coiled-coil motif found in the Gal4p zinc binuclear cluster protein could not be identified in Dal81p and its homologues. This suggests that Dal81p does not dimerise through this structural motif as other zinc binuclear cluster proteins do. Solvent-accessible sites that could be phosphorylated by protein kinase C or casein kinase II, and the role of such sites in the possible regulation of Dal81p function, are discussed.
8

Li, Yang. "Understanding lineage-specific biology through comparative genomics." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:23398cc7-8bbe-4f5a-8cd9-1104591400cc.

Abstract:
A major challenge in biology is to identify how different species arose and acquired distinct phenotypic traits. High-throughput sequencing is transforming our understanding of biology by allowing us to study genomes and cellular processes at genome-wide levels. Only a decade after the publication of the first human genome draft, genome assemblies of hundreds of organisms have been produced. Yet genome analysis remains challenging, and advances have lagged far behind our sequencing abilities and other technological advances. The next generation of comparative genomicists must therefore understand, invent and apply a wide range of computational tools in order to study biology in the most efficient manner and to pose the most interesting questions. This thesis spans areas covering evolutionary genomics, gene regulation, and computational methods development. A major aim was to understand how genetic variation contributes to variation in phenotypic traits. This was approached using a large variety of evolutionary and comparative genomics tools. In particular, high-throughput sequencing datasets were analysed to study single-cell transcriptomics, gene duplications, gene architecture evolution, and alternative splicing. Additionally, in cases where off-the-shelf analysis tools did not exist, novel pipelines and programs were designed and implemented to solve algorithmic problems such as scaffolding genome assemblies and mapping short reads onto small exons.
9

Mostowy, Serge. "Comparative genomics of the Mycobacterium tuberculosis complex." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=111834.

Abstract:
The study of microbial evolution has recently been accelerated by the advent of comparative genomics, an approach enabling investigation of organisms at the whole-genome level. Tools of comparative genomics, including the DNA microarray, have been applied to bacterial genomes to study heterogeneity in DNA content and to monitor global gene expression. When focused on the study of microbial pathogens, genome analysis has provided unprecedented insight into their evolution, virulence, and host adaptation. Contributing towards this, I herein explore the evolutionary change affecting genomes of the Mycobacterium tuberculosis complex (MTC), a group of closely related bacterial organisms responsible for causing tuberculosis (TB) across a diverse range of mammals. Despite the introduction nearly a century ago of BCG, a family of live attenuated vaccines intended to prevent human TB, the uncertainty surrounding its usefulness is punctuated by the reality that TB continues to claim over 2 million lives per year. As pursued throughout this thesis, a precise understanding of the differences in genomic content among the MTC, and its impact on gene expression and biological function, promises to expose underlying mechanisms of TB pathogenesis and to suggest rational approaches towards the design of improved diagnostics and vaccines to prevent disease.
With the availability of whole-genome sequence data and tools of comparative genomics, our publications have advanced the recognition that large sequence polymorphisms (LSPs) deleted from Mycobacterium tuberculosis, the causative agent of TB in humans, serve as accurate markers for molecular epidemiologic assessment and phylogenetic analysis. These LSPs have proven informative both for the types of genes that vary between strains, and for the molecular signatures that characterize different MTC members.
Genomic analysis of atypical MTC has revealed their diversity and adaptability, illuminating previously unexpected directions of MTC evolution. As demonstrated from parallel analysis of BCG vaccines, a phylogenetic stratification of genotypes offers a predictive framework upon which to base future genetic and phenotypic studies of the MTC. Overall, the work presented in this thesis has provided unique insights and lessons having direct clinical relevance towards understanding TB pathogenesis and BCG vaccination.
10

Page, Justin Thomas. "Bioinformatics for the Comparative Genomic Analysis of the Cotton (Gossypium) Polyploid Complex." BYU ScholarsArchive, 2015. https://scholarsarchive.byu.edu/etd/5557.

Abstract:
Understanding the composition, evolution, and function of the cotton (Gossypium) genome is complicated by the joint presence of two genomes in its nucleus (the AT and DT genomes). Specifically, read mapping (a fundamental part of next-generation sequence analysis) cannot adequately differentiate reads as belonging to one genome or the other. These two genomes were derived from progenitor A-genome and D-genome diploids involved in ancestral allopolyploidization. To better understand the allopolyploid genome, we developed PolyCat to categorize reads according to their genome of origin based on homoeo-SNPs that differentiate the two genomes. We re-sequenced the genomes of extant diploid relatives of tetraploid cotton that contain the A1 (Gossypium herbaceum), A2 (Gossypium arboreum), or D5 (Gossypium raimondii) genomes. We identified 24 million SNPs between the A-diploid and D-diploid genomes. These analyses facilitated the construction of a robust index of conserved SNPs between the A-genomes and D-genomes at all detected polymorphic loci. This index can be used by PolyCat to assign reads from an allotetraploid to their genome of origin. Continued characterization of the Gossypium genomes will further enhance our ability to manipulate fiber and agronomic production of cotton. We re-sequenced 34 allotetraploid cotton lines, representing all 7 tetraploid cotton species. The analysis of these genomes, using PolyCat and PolyDog, provides us with the beginnings of a HapMap-like resource for cotton species, including indices of both homoeo-SNPs and allele-SNPs. With this information, we explore the phylogenetic relationships among cotton species, including the newly characterized species G. ekmanianum and G. stephensii.
We examine gene conversion, both recent and ancient, and find that recent gene conversion is extremely rare and that ancient gene conversion is far less extensive than previously believed, with many previously identified conversion events more probably due to autapomorphic SNPs in the descent of diploid relatives. In order to carry out these experiments, many tools for next-generation sequence analysis were developed. These tools, along with PolyCat and PolyDog, comprise the tool suite BamBam.
11

Mbah-Mbole, Georgia Fru. "Comparative study of topology based pathway enrichment analysis methods for cardiac hypertrophy from a stem cell model using : ToPASeq and EnrichmentBrowser packages." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19465.

Abstract:
Pathway enrichment analysis is an approach extensively used when analyzing high-throughput data to identify pathways enriched within a group of differentially expressed genes. Furthermore, different methods utilizing the topology of the pathway offer a unique way of analyzing and interpreting gene expression data. These methods usually offer pathway topologies with a limited number of methods and limited visualization of results. Also, using different methods individually and comparing their results can be very cumbersome, time-consuming and prone to errors due to the need for repeated data conversion and transfer. Packages that offer a common interface to multiple methods are therefore necessary, to provide a uniform way of calling these methods or specifying parameters, and to make simultaneous application of the methods easier. In this study, topology-based pathway enrichment analysis was performed using the R packages EnrichmentBrowser and ToPASeq on time-series RNA-Seq data for cardiac hypertrophy in order to compare their usability. Additionally, different topology-based enrichment analysis methods included in the packages were compared with a non-topology-based pathway enrichment analysis method, as well as with the combination of their results, in order to assess biological insights. Regarding usability, the available instructions for both EnrichmentBrowser and ToPASeq were easy to understand and apply in the R workspace. Furthermore, both packages were easy to install and to adjust to various parameters. However, ToPASeq returned errors when some parameters other than the default ones were used. Also, one of the differences between the tools was the flexibility of options for visualization and interpretation of the results, where EnrichmentBrowser had clear advantages.
Regarding biological insights, the methods SPIA and DEGraph produced significant pathways linked to the phenotype cardiac hypertrophy, with a clear advantage for SPIA that performed well in both tested data setups. Finally, combining results from both SPIA and GSEA (non-topology-based pathway enrichment analysis method) improved individual ranking by increasing confidence in specific target pathways and eliminating irrelevant pathways.
12

Belgard, Tildon Grant. "Comparative neurotranscriptomics in mammals and birds." Thesis, University of Oxford, 2011. http://ora.ox.ac.uk/objects/uuid:932c796c-d219-4df3-85cc-7d9db19d7d6b.

Abstract:
In this thesis I apply new sequencing technologies and analytical methods derived from genomics and computer science to the neuroanatomy of gene expression. The first project explores characteristics of gene expression across adult neocortical layers in a representative mammal – the mouse. Amongst the thousands of genes and transcripts differentially expressed across layers, I found common functional characteristics of genes that define certain layers, candidate cases of isoform switching, and over a thousand apparent long intergenic non-coding RNA transcripts. The second project compares patterns of gene expression in the structurally diverged adult derivatives of the pallium in mice and chickens. Overall, gene expression levels were moderately correlated between the two species. While expression patterns of ‘marker’ genes were only poorly conserved in these regions, there nevertheless was significant conservation of cross-species marker genes for homologous structures, cell types and functionally analogous regions. Many aspects of these data from both projects can now be easily browsed and searched from custom-built web interfaces. In addition to generating unprecedented genome-wide resources for the neuroscience community to explore the functional and structural dimensions of gene expression amongst different pallial regions in mammals and birds, this work also provides new insights into the widespread evolutionary shuffling of adult marker gene expression.
13

Jentzsch, Iris Miriam Vargas. "Comparative genomics of microsatellite abundance: a critical analysis of methods and definitions." Thesis, University of Canterbury. Biological Sciences, 2009. http://hdl.handle.net/10092/4282.

Abstract:
This PhD dissertation focuses on microsatellites: short, tandemly repeated nucleotide patterns that occur extremely often across DNA sequences. The main characteristic of microsatellites, and probably the reason why they are so abundant across genomes, is the extremely high frequency of specific replication errors occurring within their sequences, which usually cause the addition or deletion of one or more complete tandem repeat units. Due to these errors, frequent fluctuations in the number of repeat units can be observed among cellular and organismal generations. The molecular mechanisms as well as the consequences of these microsatellite mutations, on both a generational and an evolutionary scale, have sparked debate and controversy in the scientific community. Furthermore, the bioinformatic approaches used to study microsatellites and the ways microsatellites are referred to in the general literature are often not rigorous, leading to misinterpretations and inconsistencies among studies. As an introduction to this complex topic, in Chapter I, I present a review of the knowledge accumulated on microsatellites during the past two decades. A major part of this chapter has been published in the Encyclopedia of Life Sciences in a chapter about microsatellite evolution (see Publication 1 in Appendix II). The ongoing controversy about the rates and patterns of microsatellite mutation was evident to me before I started this PhD thesis. However, the subtler problems inherent in the computational analysis of microsatellites within genomes only became apparent when retrieving information on microsatellite distribution and abundance for the design of comparative genomic analyses.
There are numerous publications analyzing the microsatellite content of genomes but, in most cases, the results presented can neither be reliably compared nor reproduced, mainly due to the lack of details on the microsatellite search process (particularly the program’s algorithm and the search parameters used) and because the results are expressed in terms that are relative to the search process (i.e. measures based on the absolute number of microsatellites). Therefore, in Chapter II, I present a critical review of all available software tools designed to scan DNA sequences for microsatellites. My aim in undertaking this review was to assess the comparability of search results among microsatellite programs, and to identify the programs most suitable for generating microsatellite datasets for a thorough and reproducible comparative analysis of microsatellite content among genomic sequences. Using sequence data in which the number and types of microsatellites were empirically known, I compared the ability of 19 programs to accurately identify and report microsatellites. I then chose the two programs which, based on the algorithm and its parameters as well as the informativeness of the output, offered the information most suitable for biological interpretation, while also reflecting as closely as possible the microsatellite content of the test files. From the analysis of microsatellite search results generated by the various programs available, it became apparent that the program’s search parameters, which the user specifies in order to define the microsatellite characteristics for the program, dramatically influence the resulting datasets. This is especially true for programs that allow imperfections within tandem repeats, because imperfect repetitions cannot be defined as accurately as perfect ones, and because several different algorithms have been proposed to address this problem.
The detection of approximate microsatellites is, however, essential for the study of microsatellite evolution and for comparative analyses based on microsatellites. It is now well accepted that small deviations from perfect tandem repeat structure are common within microsatellites and larger repeats, and a number of different algorithms have been developed to confront the challenge of finding and registering microsatellites with all expected kinds of imperfection. However, biologists have yet to apply these tools to their full potential. In biological analyses, single tandem repeat hits are consistently interpreted as isolated and independent repeats. This interpretation also depends on the search strategy used to report the microsatellites in DNA sequences and, therefore, I was particularly interested in the capacity of repeat-finding programs to report imperfect microsatellites in a way that allows biologically useful interpretations. After analyzing a series of tandem repeat finding programs, I optimized my microsatellite searches to yield the best possible datasets for assessing and comparing the degree of imperfection of microsatellites among different genomes (Chapter III). During the program comparisons performed in Chapter II, I show that the most critical search parameter influencing microsatellite search results is the minimum length threshold. Biologically speaking, there is no consensus on the minimum length beyond which a short tandem repeat is expected to become prone to microsatellite-like mutations. Usually, a single absolute value of ~12 nucleotides is assigned irrespective of motif length. In other cases thresholds are assigned in terms of the number of repeat units (e.g. 3 to 5 repeats or more), which are better applied individually for each motif. The variation in these thresholds is considerable and not always justifiable.
In addition, current minimum length measures are likely naïve, because it is clear that different microsatellite motifs undergo replication slippage at different length thresholds. Therefore, in Chapter III, I apply two probabilistic models to predict the minimum length at which microsatellites of varying motif types become overrepresented in different genomes, based on the individual oligonucleotide frequency data of these genomes. Finally, after a range of optimizations and critical analyses, I performed a preliminary analysis of microsatellite abundance among 24 high-quality complete eukaryotic genomes, also including 8 prokaryotic and 5 archaeal genomes for contrast. The availability of the methodologies and the microsatellite datasets generated in this project will allow informed formulation of questions for more specific genome research, either about microsatellites or about other genomic features that microsatellites could influence. These datasets are what I would have needed at the beginning of my PhD to support my experimental design, and they are essential for the adequate interpretation of microsatellite data in the context of the major evolutionary units: chromosomes and genomes.
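The sensitivity to the minimum length threshold described in this abstract can be illustrated with a minimal sketch of a perfect-repeat scanner. This is not one of the 19 programs reviewed in the thesis, and it ignores imperfect repeats entirely; the function name and the parameters `max_motif_len`, `min_units` and `min_length` are hypothetical, with defaults loosely following the ~12 nucleotide and 3-repeat conventions mentioned above.

```python
import re


def find_perfect_microsatellites(seq, max_motif_len=6, min_units=3, min_length=12):
    """Return sorted (start, motif, units) tuples for perfect tandem repeats.

    A hit needs at least `min_units` copies of the motif and a total length
    of at least `min_length` nucleotides. Both thresholds are illustrative
    assumptions, not values recommended by the thesis.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        # A k-mer followed by at least (min_units - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter motif, e.g. "ACAC".
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            length = match.end() - match.start()
            if length >= min_length:
                hits.append((match.start(), motif, length // k))
    return sorted(hits)


if __name__ == "__main__":
    sequence = "TTT" + "AC" * 7 + "GGG"
    print(find_perfect_microsatellites(sequence))                 # default thresholds
    print(find_perfect_microsatellites(sequence, min_length=16))  # stricter threshold
```

Running the same sequence with two values of `min_length` shows the effect the abstract describes: the 14-nucleotide AC repeat is reported under the default threshold and disappears under the stricter one, so counts from searches with different settings are not directly comparable.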
14

Epamino, George Willian Condomitti. "Alinhamento múltiplo de genomas de eucariotos com montagens altamente fragmentadas." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-31102017-102826/.

Abstract:
O advento do sequenciamento de nova geração (NGS - Next Generation Sequencing) nos últimos anos proporcionou um aumento expressivo no número de projetos genômicos. De maneira simplificada, as máquinas sequenciadoras geram como resultado fragmentos de DNA que são utilizados por programas montadores de genoma. Esses programas tentam juntar os fragmentos de DNA de modo a obter a representação completa da sequência genômica (por exemplo um cromossomo) da espécie sendo sequenciada. Em alguns casos o processo de montagem pode ser executado com maior facilidade para organismos com genomas de tamanhos pequenos (por exemplo bactérias com genoma em torno de 5Mpb), através de pipelines que automatizam a maior parte da tarefa. Um cenário mais complicado surge quando a espécie possui genoma com grande comprimento (acima de 1Gpb) e elementos repetidos, como no caso de alguns eucariotos. Nesses casos o resultado da montagem é geralmente composto por milhares de fragmentos (chamados de contigs), uma ordem de magnitude muito superior ao número de cromossomos estimado para um organismo (comumente da ordem de dois dígitos), dando origem a uma montagem altamente fragmentada. Uma atividade comum nesses projetos é a comparação da montagem com a de outro genoma como forma de validação e também para identificação de regiões conservadas entre os organismos. Embora o problema de alinhamento par-a-par de genomas grandes seja bem contornado por abordagens existentes, o alinhamento múltiplo (AM) de genomas grandes em estado fragmentado ainda é uma tarefa de difícil resolução, por demandar alto custo computacional e grande quantidade de tempo. Este trabalho consiste em uma metologia para fazer alinhamento múltiplo de genomas grandes de eucariotos com montagens altamente fragmentadas. Nossa implementação, baseada em alinhamento estrela, se mostrou capaz de fazer AM de grupos de montagens com diversos níveis de fragmentação. 
O maior deles, um conjunto de 5 genomas de répteis, levou 14 horas de processamento para fornecer um mapa de regiões conservadas entre as espécies. O algoritmo foi implementado em um software que batizamos de FROG (FRagment Overlap multiple Genome alignment), de código aberto e disponível sob licença GPLv3.
The advent of Next Generation Sequencing (NGS) in recent years has led to a substantial increase in the number of genomic projects. Put simply, sequencing machines generate DNA fragments that are used by genome assembler software. These programs try to merge the DNA fragments to obtain the complete representation of the genomic sequence (for example a chromosome) of the species being sequenced. In some cases the assembly process can be performed more easily for organisms with small genomes (e.g. bacteria with a genome length of approximately 5 Mbp) through pipelines that automate most of the task. A trickier scenario arises when the species has a very large genome (above 1 Gbp) and complex elements, as in the case of some eukaryotes. In those cases the result of the assembly is usually composed of thousands of fragments (called contigs), far more than the number of chromosomes estimated for an organism (usually on the order of two digits), giving rise to a highly fragmented assembly. A common activity in these projects is the comparison of the assembly with that of another genome as a form of validation and also to identify common elements between organisms. Although the problem of pairwise alignment of large genomes is well handled by existing approaches, multiple alignment of large genomes with highly fragmented assemblies remains a difficult task due to its time and computational requirements. This work presents a methodology for multiple alignment of large eukaryotic genomes with highly fragmented assemblies, a problem that few solutions are able to cope with. 
Our star alignment-based implementation was able to compute an MSA for groups of assemblies with different levels of fragmentation. The largest of them, a set of 5 reptilian genomes where the B. jararaca assembly (800,000 contigs, N50 of 3.1 Kbp) was used as anchor, took 14 hours of execution time to provide a map of conserved regions among the participating species. The algorithm was implemented in software named FROG (FRagment Overlap multiple Genome alignment), available under the terms of the General Public License v3 (GPLv3).
15

Arvidsson, Staffan. "Actors and higher order functions : A Comparative Study of Parallel Programming Language Support for Bioinformatics." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-242739.

Abstract:
Parallel programming can sometimes be a tedious task when dealing with problems like race conditions and synchronization. Functional programming can greatly reduce the complexity of parallelization by removing side effects and variables, eliminating the need for locks and synchronization. This thesis assesses the applicability of functional programming and the actor model using the field of bioinformatics as a case study, focusing on genome assembly. Functional programming is found to provide parallelization at a high abstraction level in some cases, but in most of the program there is no way to provide parallelization without adding synchronization and non-pure functional code. The actor model facilitates parallelization of a greater part of the program but increases the program complexity due to communication and synchronization between actors. Neither of the approaches gave efficient speedup due to the characteristics of the algorithm that was implemented, which proved to be memory bound. Shared-memory parallelization thus proved inefficient, and distributed implementations are needed to achieve speedup for genome assemblers.
16

Peterson, Mark Erik. "Evolutionary constraints on the structural similarity of proteins and applications to comparative protein structure modeling." Diss., Search in ProQuest Dissertations & Theses. UC Only, 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3339202.

17

Benoit, Gaëtan. "Métagénomique comparative de novo à grande échelle." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S088/document.

Abstract:
La métagénomique comparative est dite de novo lorsque les échantillons sont comparés sans connaissances a priori. La similarité est alors estimée en comptant le nombre de séquences d’ADN similaires entre les jeux de données. Un projet métagénomique génère typiquement des centaines de jeux de données. Chaque jeu contient des dizaines de millions de courtes séquences d’ADN de 100 à 200 nucléotides (appelées lectures). Dans le contexte du début de cette thèse, il aurait fallu des années pour comparer une telle masse de données avec les méthodes usuelles. Cette thèse présente des approches de novo pour calculer très rapidement la similarité entre de nombreux jeux de données. Les travaux que nous proposons se basent sur le k-mer (mot de taille k) comme unité de comparaison des métagénomes. La méthode principale développée pendant cette thèse, nommée Simka, calcule de nombreuses mesures de similarité en remplaçant les comptages d’espèces classiquement utilisés par des comptages de grands k-mers (k > 21). Simka passe à l’échelle sur les projets métagénomiques actuels grâce à une nouvelle stratégie pour compter les k-mers de nombreux jeux de données en parallèle. Les expériences sur les données des projets Human Microbiome Project et Tara Oceans montrent que les similarités calculées par Simka sont bien corrélées avec les similarités basées sur des comptages d’espèces ou d’OTUs. Simka a traité ces projets (plus de 30 milliards de lectures réparties dans des centaines de jeux) en quelques heures. C’est actuellement le seul outil à passer à l’échelle sur une telle quantité de données, tout en étant complet du point de vue des résultats de comparaisons.
Metagenomics studies the genomic content of a sample extracted from a natural environment. Among available analyses, comparative metagenomics aims at estimating the similarity between two or more environmental samples at the genomic level. 
The traditional approach compares the samples based on their content in known identified species. However, this method is biased by the incompleteness of reference databases. By contrast, de novo comparative metagenomics does not rely on a priori knowledge. Sample similarity is estimated by counting the number of similar DNA sequences between datasets. A metagenomic project typically generates hundreds of datasets. Each dataset contains tens of millions of short DNA sequences ranging from 100 to 150 base pairs (called reads). When this thesis began, it would have required years to compare such an amount of data with the usual methods. This thesis presents novel de novo approaches to quickly compute the similarity between numerous datasets. The main idea underlying our work is to use the k-mer (word of size k) as the unit of comparison between metagenomes. The main method developed during this thesis, called Simka, computes several similarity measures by replacing species counts with counts of large k-mers (k > 21). Simka scales to today’s metagenomic projects thanks to a new strategy for counting k-mers in multiple datasets in parallel. Experiments on data from the Human Microbiome Project and Tara Oceans show that the similarities computed by Simka are well correlated with reference-based and OTU-based similarities. Simka processed these projects (more than 30 billion reads distributed across hundreds of datasets) in a few hours. It is currently the only tool able to scale to such projects while providing precise and extensive comparison results.
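The idea of replacing species counts with k-mer counts can be illustrated with a small Python sketch that computes one common ecological distance, Bray-Curtis dissimilarity, from the k-mer counts of two read sets. This is not Simka's implementation: the function names are assumptions, everything is held in memory, reverse complements are not merged, and the toy value of k is far smaller than the k > 21 mentioned in the abstract.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (word of size k) occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity with k-mers playing the role of species."""
    shared = sum(
        min(counts_a[kmer], counts_b[kmer])
        for kmer in counts_a.keys() & counts_b.keys()
    )
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total


# Toy example with k = 4 (an assumption chosen to keep the data tiny).
sample_a = count_kmers(["ACGTACGT", "ACGTTTGA"], 4)
sample_b = count_kmers(["ACGTACGA", "TTTTTTTT"], 4)
print(bray_curtis(sample_a, sample_b))
```

A value of 0 means the two samples have identical k-mer abundance profiles and a value of 1 means they share no k-mers; at the scale of real projects the counting step is the bottleneck, which is the part the thesis addresses.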
18

Theodore, Jamal A. "A Framework for Comparative Analysis of Gene Expressions and Mutations Linked to Cancer." Thesis, The George Washington University, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1556730.

Abstract:
Analysis of the aberrations occurring at the functional sites of genes and proteins is essential to understanding the genomic basis of human disease. There are many data sources that offer rich repositories of information on sequence features, but their heterogeneity poses a challenge to developing an intuitive and high-confidence workflow for next-generation sequencing (NGS) data analysis. Moreover, the failure of existing repositories to incorporate results from both small-scale and large-scale studies has inhibited the identification of many novel non-synonymous single-nucleotide variations (nsSNVs). The HIVE (High-performance Integrated Virtual Environment) platform offers integrated and curated sources of nsSNVs and gene expression data from trusted genomic and proteomic repositories and publications. We demonstrate a data-driven functional genomics approach primarily leveraging the HIVE framework to identify priority targets for further investigation in the lab. Additionally, we developed the HIVE Genecast mobile app for Android devices, which is annotated with our priority target results to provide scientists with access to gene sequence information while away from their workspaces.
19

Fulcher, Benjamin D. "Highly comparative time-series analysis." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:642b65cf-4686-4709-9f9d-135e73cfe12e.

Abstract:
In this thesis, a highly comparative framework for time-series analysis is developed. The approach draws on large, interdisciplinary collections of over 9000 time-series analysis methods, or operations, and over 30 000 time series, which we have assembled. Statistical learning methods were used to analyze structure in the set of operations applied to the time series, allowing us to relate different types of scientific methods to one another, and to investigate redundancy across them. An analogous process applied to the data allowed different types of time series to be linked based on their properties, and in particular to connect time series generated by theoretical models with those measured from relevant real-world systems. In the remainder of the thesis, methods for addressing specific problems in time-series analysis are presented that use our diverse collection of operations to represent time series in terms of their measured properties. The broad utility of this highly comparative approach is demonstrated using various case studies, including the discrimination of pathological heart beat series, classification of Parkinsonian phonemes, estimation of the scaling exponent of self-affine time series, prediction of cord pH from fetal heart rates recorded during labor, and the assignment of emotional content to speech recordings. Our methods are also applied to labeled datasets of short time-series patterns studied in temporal data mining, where our feature-based approach exhibits benefits over conventional time-domain classifiers. Lastly, a feature-based dimensionality reduction framework is developed that links dependencies measured between operations to the number of free parameters in a time-series model that could be used to generate a time-series dataset.
20

Keedwell, Edward. "Knowledge discovery from gene expression data using neural-genetic models : a comparative study of four European countries with special attention to the education of these children." Thesis, University of Exeter, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.288704.

21

Sturgill, David Matthew. "Comparative Genome Analysis of Three Brucella spp. and a Data Model for Automated Multiple Genome Comparison." Thesis, Virginia Tech, 2003. http://hdl.handle.net/10919/10163.

Abstract:
Comparative analysis of multiple genomes presents many challenges ranging from management of information about thousands of local similarities to definition of features by combination of evidence from multiple analyses and experiments. This research represents the development stage of a database-backed pipeline for comparative analysis of multiple genomes. The genomes of three recently sequenced species of Brucella were compared and a superset of known and hypothetical coding sequences was identified to be used in design of a discriminatory genomic cDNA array for comparative functional genomics experiments. Comparisons were made of coding regions from the public, annotated sequence of B. melitensis (GenBank) to the annotated sequence of B. suis (TIGR) and to the newly-sequenced B. abortus (personal communication, S. Halling, National Animal Disease Center, USDA). A systematic approach to analysis of multiple genome sequences is described, including a data model for storage of defined features along with necessary descriptive information such as input parameters and scores from the methods used to define features. A collection of adjacency relationships between features is also stored, creating a unified database that can be mined for patterns of features which repeat among or within genomes. The biological utility of the data model was demonstrated by a detailed analysis of the multiple genome comparison used to create the sample data set. This examination of genetic differences between three Brucella species with different virulence patterns and host preferences enabled investigation of the genomic basis of virulence. In the B. suis genome, seventy-one differentiating genes were found, including a contiguous 17.6 kb region unique to the species. Although only one unique species-specific gene was identified in the B. melitensis genome and none in the B. 
abortus genome, seventy-nine differentiating genes were found to be present in only two of the three Brucella species. These differentiating features may be significant in explaining differences in virulence or host specificity. RT-PCR analysis was performed to determine whether these genes are transcribed in vitro. Detailed comparisons were performed on a putative B. suis pathogenicity island (PAI). An overview of these genomic differences and a discussion of their significance in the context of host preference and virulence are presented.
Master of Science
22

Coghill, Lyndon M. "Statistical and Comparative Phylogeography of Mexican Freshwater Taxa in Extreme Aquatic Environments." ScholarWorks@UNO, 2013. http://scholarworks.uno.edu/td/1724.

Abstract:
Phylogeography aims to understand the processes that underlie the distribution of genetic variation within and among closely related species. Although the means by which this goal might be achieved differ considerably from those that spawned the field some thirty years ago, the foundation and conceptual breakthroughs made by Avise are nonetheless the same and are as relevant today as they were two decades ago. Namely, patterns of neutral genetic variation among individuals carry the signature of a species’ demographic past, and the spatial and temporal environmental heterogeneity across a species’ geographic range can influence patterns of evolutionary change. Aquatic systems throughout Mexico provide unique opportunities to study phenotypic plasticity and evolution in relation to climatic and environmental selective forces. There are several unique, often isolated aquatic environments throughout Mexico that have a history of geographic isolation and reconnection. The first study presented herein shows significant mitochondrial sequence divergence between L. megalotis populations on either side of the Sierra de San Marcos, which bisects the valley of Cuatro Ciénegas, and that the populations in the valley are genetically distinct from those found outside of the valley. The second study recovered signals of two divergence events in Cuatro Ciénegas for six codistributed taxa, and reveals that both events occurred in the Pleistocene during periods of increased aridity, suggesting that climatic effects might have played a role in these species’ divergence. The final study presents an Illumina-based high-resolution species phylogeny for Astyanax mexicanus, providing added support that there are multiple origins of cave populations and further clarifying the uniqueness of the Sabinos and Rio Subterráneo caves.
23

Melters, Daniel Patrick. "Comparative Analysis of Tandem Repeats from Eukaryotic Genomes: Insight in Centromere Evolution." Thesis, University of California, Davis, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3602160.

Abstract:
Centromeres are the chromosomal loci where microtubule spindles bind, via the kinetochore, during mitosis and meiosis. Paradoxically, the centromere, as a functional unit, is essential to guarantee faithful chromosome segregation, whereas its underlying DNA sequences and associated kinetochore proteins are fast evolving. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. In spite of their importance, very little is known about the degree to which centromeric tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from species using publicly available genomic sequence and our own data. We found that despite an overall lack of sequence conservation, centromeric tandem repeats from diverse species showed similar modes of evolution. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. In addition, we performed a survey of fungal genomes for the presence of high-copy tandem repeats, but found little evidence to suggest that high-copy centromeric repeats are a common feature in fungi, with the possible exception of the Zygomycota phylum. Finally, in most species the kinetochore assembles at a single locus, but in some cases the kinetochore forms along the entire length of the chromosomes, forming holocentric chromosomes. Following a literature review, we estimate that holocentricity is very common and has evolved at least thirteen times.
24

Brett, Benjamin Thomas. "A computational approach for comparative oncogenomics using mouse models." Diss., University of Iowa, 2014. https://ir.uiowa.edu/etd/4582.

Abstract:
Cancer is the second most common cause of death in the United States. It is a complex disease with environmental, genetic, and lifestyle factors influencing the likelihood of getting cancer and the development of any resulting tumor. Understanding the genetics of cancer is integral to developing novel patient-specific treatments. However, due to this complexity, hundreds to thousands of tumors are required for sufficient power to identify the network of relationships among these genes. Animal models of cancer are commonly used to reduce cost and to control experimental variables, allowing for more specific hypothesis testing. The Sleeping Beauty transposon mutagenesis system can be used to model cancer in mice. While the Sleeping Beauty mutagenesis system is an important tool in understanding cancer, it has specific computational needs. Experiments need to be analyzed in a fast, unbiased, and efficient manner. A computational method must also accurately model the system, allowing for validation and interpretation. Here I present an updated Integration Analysis System and use this system to validate the assumptions present in forward genetic screens of cancer using the Sleeping Beauty system. This system allows for rapid identification of cancer genes, but does not directly aid in understanding the relationship between the genes. Given the complexity of cancer, understanding the relationship between cancer genes is very difficult. I have created a connectedness network utilizing the STRING database to better derive an understanding of cancer genes. STRING is a database of known and predicted protein-protein interactions. The connectedness between pairs of genes is calculated using a network reliability metric. This database allows for increased power to detect known pathways when compared to STRING alone. 
Combining this connectivity network with the set of cancer genes identified by the Integration Analysis System is a strategy for rapid and efficient interpretation of the genetic results.
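The abstract does not specify which network reliability metric is used. As one possible reading, the Python sketch below estimates two-terminal reliability by Monte Carlo sampling: each interaction is assumed to exist independently with a given probability, and the connectedness of two genes is the fraction of sampled networks in which a path joins them. The function name, the edge probabilities, and the gene labels are hypothetical, and treating interaction confidence scores as independent probabilities is an assumption.

```python
import random


def two_terminal_reliability(edges, source, target, trials=2000, seed=0):
    """Estimate the probability that source and target are connected.

    edges is a list of (u, v, p) tuples, where p is the assumed probability
    that the undirected interaction between u and v exists.
    """
    rng = random.Random(seed)
    connected = 0
    for _ in range(trials):
        # Sample one realization of the network.
        adjacency = {}
        for u, v, p in edges:
            if rng.random() < p:
                adjacency.setdefault(u, set()).add(v)
                adjacency.setdefault(v, set()).add(u)
        # Depth-first search from the source in the sampled network.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            if node == target:
                connected += 1
                break
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return connected / trials


# Hypothetical network: two routes between geneA and geneD.
toy_edges = [
    ("geneA", "geneB", 0.9),
    ("geneB", "geneD", 0.9),
    ("geneA", "geneC", 0.4),
    ("geneC", "geneD", 0.4),
]
print(two_terminal_reliability(toy_edges, "geneA", "geneD"))
```

Unlike a single interaction score, a measure of this kind rewards gene pairs joined by several independent routes, which is one plausible reason a reliability-based network could recover known pathways with more power than direct interactions alone.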
25

Ohniwa, Ryosuke L. "Comparative analyses of genome architectures among prokaryote, organelle and eukaryote by nano-scale imaging, molecular genetics and bioinformatics." 京都大学 (Kyoto University), 2007. http://hdl.handle.net/2433/136993.

26

Sanches, Pablo Rodrigo. "Ferramentas computacionais para o estudo estrutural e funcional de genes de dermatófitos potencialmente envolvidos na patogenicidade." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/17/17135/tde-06012016-152343/.

Abstract:
Dermatófitos são fungos filamentosos que infectam substratos queratinizados como pele, unha e cabelo em busca de nutrientes para se desenvolverem e permanecerem no hospedeiro. Pertencem aos gêneros Epidermophyton, Microsporum ou Trichophyton, os quais, dependendo de seu habitat natural, são classificados em espécies geofílicas, zoofílicas ou antropofílicas. O uso indiscriminado de antifúngicos levou à seleção de cepas resistentes, e o comportamento invasivo desses patógenos em pacientes imunodeprimidos aumentou nos últimos anos, dificultando o tratamento das dermatofitoses. Há, portanto, a necessidade de estudos para um melhor entendimento da biologia dos dermatófitos devido as suas importâncias médica e/ou veterinária e o escasso conhecimento da interação destes patógenos com os hospedeiros. No presente trabalho, analisamos oito espécies de dermatófitos: Arthroderma benhamiae, Microsporum canis, Microsporum gypseum, Trichophyton interdigitale, Trichophyton equinum, Trichophyton rubrum, Trichophyton tonsurans e Trichophyton verrucosum. Análises de genômica comparativa e de expressão de genes potencialmente envolvidos na degradação de queratina foram realizadas. Além disso, efetuamos o sequenciamento genômico em larga escala de uma das linhagens. A estrutura dos genes sub3, sub5 e sub7, que codificam serina endopeptidases com atividade queratinolítica, mep3 e mep4, que codificam proteínas pertencentes ao grupo das metaloendopeptidases, dppV, lap1 e lap2, que codificam exopeptidases, foi analisada por meio de ferramentas computacionais. Essas análises revelaram que os genes que codificam proteases possuem alto grau de conservação em suas estruturas, que é menor quando comparadas apenas suas regiões não codificadoras. As análises permitiram também a identificação em regiões promotoras de consensos específicos a gêneros de dermatófitos. 
Observamos que o acúmulo de transcritos destes genes, avaliados durante o cultivo em queratina, mimetizando o processo infeccioso, não está correlacionado à similaridade das sequências gênicas entre as espécies. Não encontramos correlação entre o nicho preferencial dos dermatófitos e suas sequências gênicas ou níveis transcricionais. Observamos que, na grande maioria das vezes, genes que codificam endo e exopeptidases, possuem acúmulo de transcritos em períodos iniciais de degradação de queratina. Nossos resultados sugerem que diferenças pontuais na sequencia gênica, diferenças em regiões promotoras ou, até mesmo, expressão variável destes genes que codificam um conjunto proteico com funções sinérgicas e provavelmente compensatórias, contribuam para os diferentes graus de reações inflamatórias no hospedeiro, bem como para a especificidade patógeno-hospedeiro.
Dermatophytes are filamentous fungi that infect keratinized substrates such as skin, nail and hair, searching for nutrients for their development and permanence in the host. They belong to the genera Epidermophyton, Microsporum or Trichophyton, and, depending on their natural habitat, are classified into geophilic, zoophilic or anthropophilic species. The indiscriminate use of antifungals has led to the selection of resistant strains, and the invasive behaviour of these pathogens in immunocompromised patients has increased in recent years, hampering the treatment of dermatophytoses. Therefore, studies are needed for a better understanding of the biology of the dermatophytes, due to their medical and/or veterinary importance and the scarce knowledge about the interaction of these pathogens with their hosts. In this work, we analyzed eight species of dermatophytes: Arthroderma benhamiae, Microsporum canis, Microsporum gypseum, Trichophyton interdigitale, Trichophyton equinum, Trichophyton rubrum, Trichophyton tonsurans, and Trichophyton verrucosum. 
Comparative genomics and gene expression analyses of genes potentially involved in keratin degradation were performed. Moreover, we performed large-scale genome sequencing of one of the strains. The structures of the genes sub3, sub5, and sub7, which encode serine endopeptidases with keratinolytic activity; mep3 and mep4, which encode proteins belonging to the group of the metalloendopeptidases; and dppV, lap1, and lap2, which encode exopeptidases, were analyzed with computational tools. These analyses revealed that the genes encoding proteases possess a high degree of conservation in their structures, which is lower when only their non-coding regions are compared. The analyses also allowed the identification, in promoter regions, of consensus sequences specific to dermatophyte genera. We observed that the transcript accumulation of these genes, evaluated during cultivation in keratin to mimic the infection process, is not correlated with the gene sequence similarities among the species. We did not find any correlation between the preferential niche of dermatophytes and their gene sequences or transcription levels. In most cases, we observed that genes encoding endo- and exopeptidases accumulated transcripts at the beginning of keratin degradation. Our results suggest that specific differences in the gene sequences, differences in promoter regions, or even variable expression of these genes, which encode a set of proteins with synergistic and probably compensatory functions, contribute to different levels of inflammatory reactions in the host, as well as to host-pathogen specificity.
27

Kehr, Stephanie. "Expanding the SnoRNA Interaction Network." Doctoral thesis, Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-216221.

Abstract:
Small nucleolar RNAs (snoRNAs) are one of the most abundant and evolutionarily ancient groups of small non-coding RNAs. Their main function is to target chemical modifications of ribosomal RNAs (rRNAs) and small nuclear RNAs (snRNAs). They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of modification that they govern. The box H/ACA snoRNAs are responsible for targeting pseudouridylation sites and the box C/D snoRNAs for directing 2’-O-methylation of ribonucleotides. A subclass that localizes to the Cajal bodies, termed scaRNAs, is responsible for methylation and pseudouridylation of snRNAs. In addition, an amazing diversity of non-canonical functions of individual snoRNAs has emerged. The modification patterns in rRNAs and snRNAs are retained during evolution, making it even possible to project them from yeast onto human. The stringent conservation of modification sites and the slow evolution of rRNAs and snRNAs contrast with the rapid evolution of snoRNA sequences. Recent studies that incorporate high-throughput sequencing experiments still identify previously undetected snoRNAs even in well-studied organisms such as human. The snoRNAbase, which has been the standard database for human snoRNAs, has not been updated since 2006 and lacks these new data. Together with the lack of a centralized data collection across species that also incorporates snoRNA class-specific characteristics, this created the need to integrate distributed data from literature and databases into a comprehensive snoRNA set. Although several snoRNA studies included pro forma target predictions in individual species, and more and more studies focus on non-canonical functions of subclasses, a systematic survey of the guiding function and especially the functional homologies of snoRNAs was not available. 
To establish a sound set of snoRNAs, a computational snoRNA annotation pipeline named snoStrip, which identifies homologous snoRNAs in related species, was employed. For large-scale investigation of snoRNA function, state-of-the-art target predictions were performed with our software RNAsnoop and PLEXY. Further, a new measure, the Interaction Conservation Index (ICI), was developed to evaluate the conservation of snoRNA function. The snoStrip pipeline was applied to vertebrate species for which a genome sequence was available. In addition, it was used in several ncRNA annotation studies (48 avian, spotted gar) of newly assembled genomes to contribute the snoRNA genes. Detailed target analysis of the new vertebrate snoRNA set revealed that, in general, functions of homologous snoRNAs are evolutionarily stable; thus, members of the same snoRNA family guide equivalent modifications. The conservation of snoRNA sequences is high at target binding regions while the remaining sequence varies significantly. In addition to elucidating principles of correlated evolution, it was possible, with the help of the ICI measure, to assign functions to previously orphan snoRNAs and to associate snoRNAs as partners to known but so far unexplained chemical modifications. As a further pattern, redundant guiding became apparent. For many modification sites more than one snoRNA encodes the appropriate antisense element (ASE), which could ensure constant modification through snoRNAs that have different expression patterns. Furthermore, predictions of snoRNA functions in conjunction with sequence conservation could identify distant homologies. Due to the high overall entropy of snoRNA sequences, such relationships are hard to detect by means of sequence homology search methods alone. The snoRNA interaction network was further expanded through novel snoRNAs that were detected in data from high-throughput experiments in human and mouse. 
Through subsequent target analysis, the new snoRNAs could immediately explain known modifications that previously had no appropriate snoRNA guide assigned. In a further study, a full catalog of expressed snoRNAs in human was provided. Besides canonical snoRNAs, recent findings such as AluACAs, sno-lncRNAs and extraordinarily short SNORD-like transcripts were also taken into account. Again, the target analysis workflow identified previously undetected connections between snoRNA guides and modifications. In particular, some species- or clade-specific interactions of SNORD-like genes emerged that seem to act as bona fide snoRNA guides for rRNA and snRNA modifications. For all high-confidence new snoRNA genes identified during this work, official gene names were requested from the HUGO Gene Nomenclature Committee (HGNC) to avoid further naming confusion.
28

Sundell, David. "Novel resources enabling comparative regulomics in forest tree species." Doctoral thesis, Umeå universitet, Umeå Plant Science Centre (UPSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-133984.

Abstract:
Lignocellulosic plants are the most abundant source of terrestrial biomass and are one of the potential sources of renewable energy that can replace the use of fossil fuels. For a country such as Sweden, where the forest industry accounts for 10% of total exports, there would be large economic benefits associated with increased biomass yield. The availability of research on wood development conducted in conifer tree species, which represent the majority of the forestry in Sweden, is limited, and the majority of research has been conducted in model angiosperm species such as Arabidopsis thaliana. However, the large evolutionary distance between angiosperms and gymnosperms limits the possibility to identify orthologous genes and regulatory pathways by comparing sequence similarity alone. At such large evolutionary distances, the identification of gene similarity is, in most cases, not sufficient and additional information is required for functional annotation. In this thesis, two high-spatial-resolution datasets profiling wood development were processed: one from the angiosperm tree Populus tremula and the other from the conifer species Picea abies. These datasets were each published together with a web resource including tools for the exploration of gene expression, co-expression and functional enrichment of gene sets. One developed resource allows interactive, comparative co-expression analysis between species to identify conserved and diverged co-expression modules. These tools make it possible to identify conserved regulatory modules that can focus downstream research and provide biologists with a resource to identify regulatory genes for targeted trait improvement.
Lignocellulosa är den vanligast förekommande källan till markburen biomassa och är en av de förnybara energikällor som potentiellt kan ersätta användningen av fossila bränslen. 
För ett land som Sverige, där skogsindustrin som står för 10 % av den totala exporten, skulle därför en ökad produktion av biomassa kunna ge stora ekonomiska fördelar. Forskningen på barrträd, som utgör majoriteten av svensk skog är begränsad och den huvudsakliga forskningen som har bedrivits på växter, har skett i modell organismer tillhörande gruppen gömfröiga växter som till exempel i Arabidopsis thaliana. Det evolutionära avståndet mellan gömfröiga (blommor och träd) och nakenfröiga (gran och tall) begränsar dock möjligheten att identifiera regulatoriska system mellan dessa grupper. Vid sådana stora evolutionära avstånd krävs det mer än att bara identifiera en gen i en modellorganism utan ytterligare information krävs som till exempel genuttrycksdata. I denna avhandling har två högupplösta experiment som profilerar vedens utveckling undersökts; ett från gömfröiga träd Populus tremula och det andra från nakenfröiga träd (barrträd) Picea abies. Datat som behandlats har publicerats tillsammans med webbsidor med flera olika verktyg för att bland annat visa genuttryck, se korrelationer av genuttryck och test för anrikning av funktionella gener i en grupp. En resurs som utvecklats tillåter interaktiva jämförelser av korrelationer mellan arter för att kunna identifiera moduler (grupper av gener) som bevaras eller skilts åt mellan arter över tid. Identifieringen av sådana bevarade moduler kan hjälpa att fokusera framtida forskning samt ge biologer en möjlighet att identifiera regulatoriska gener för en riktad förbättring av egenskaper hos träd.
29

Pulicani, Sylvain. "Lien entre les réarrangements chromosomiques et la structure de la chromatine chez la Drosophile." Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS105/document.

Abstract:
Entre espèces, les génomes présentent des différences dans leur organisation, que ce soit au niveau du caryotype ou de l'ordre des gènes. Ceci reste vrai même entre espèces relativement proches comme l'humain et la souris, et est dû aux réarrangements chromosomiques. Reconstruire l'histoire évolutive d'une lignée revient donc à déterminer des scénarios de réarrangements qui transforment un génome actuel en un autre. Le génome ancestral se trouve alors être l'un des états intermédiaires atteints par l'un de ces scénarios. Les réarrangements chromosomiques sont des évènements biologiques violents pour la cellule. En effet, de nombreux mécanismes moléculaires ont pour fonction de stopper le cycle cellulaire dans le cas où le génome aurait été altéré. De plus, les réarrangements peuvent être à l'origine de phénotypes aberrants, et donc probablement désavantageux pour leur porteur. Au vu de tout cela, il paraît raisonnable de poser l'hypothèse selon laquelle les scénarios de réarrangements sont parcimonieux. Cependant, il est admis que ce seul critère ne permet pas de reconstruire efficacement l'histoire évolutive des génomes. En effet, quel que soit le modèle utilisé pour générer les scénarios, leur nombre est exponentiel en le nombre de réarrangements. Une autre contrainte biologique doit donc être ajoutée. La conservation de la structure spatiale de la chromatine pourrait être un critère manquant essentiel. Il a été montré in vitro que lors d'une cassure double-brin suivie d'une réparation non-homologue, le brin utilisé pour la réparation se situe spatialement proche de la cassure. Notre hypothèse est donc que les points de cassures qui sont proches en 3D ont plus probablement participé à des réarrangements que les autres. Cela est appuyé par des analyses génomiques sur des cellules somatiques et entre espèces. 
Nommons cette hypothèse : l'hypothèse de localité. Notre approche a été de proposer une méthode pour utiliser l'information structurale afin de prioriser les scénarios de réarrangements. Les données de Hi-C ont été l'information structurale qui nous a permis d'appliquer la méthode aux scénarios entre D. melanogaster et D. yakuba. Ces résultats nous ont ensuite menés à nous demander si la structure de la chromatine ne pouvait pas elle-même évoluer. Elle serait alors susceptible d'être considérée comme un caractère phylogénétique. Cette idée est appuyée par d'autres résultats montrant la conservation de domaines topologiques entre espèces. Cette question ne semble pas avoir été posée auparavant. Elle est pourtant très intéressante car elle permet d'ouvrir tout un champ d'étude. En effet, si la structure de la chromatine porte un signal phylogénétique, alors il devient possible de s'interroger sur les mécanismes en œuvre lors de la sélection, ou sur la possibilité de reconstruire l'état ancestral de cette structure. Par la suite, il serait même possible de comparer l'évolution de la séquence et celle de la structure de la chromatine. Nous avons ainsi défini une distance entre les structures des génomes, basée sur la comparaison des contacts entre loci orthologues. Nous l'avons appliquée à un ensemble de six espèces comprenant l'humain, la souris et quatre drosophiles. Ces résultats confirment la présence d'un signal phylogénétique dans la structure spatiale des génomes. Ils mettent également en lumière l'intérêt de la mise en place de méthodes permettant de comparer efficacement des données de contacts entre espèces.
Different species have different genome organizations. Whether in karyotype or gene order, these differences are seen even between relatively close species such as human and mouse. They are caused by chromosomal rearrangements. 
Inference of rearrangement scenarios that transform one present-day species into another can give insight into evolutionary states, the ancestral genome being one of the intermediates of the true scenario. Chromosomal rearrangements are violent biological events for the cell. Indeed, numerous mechanisms are present to stop the cell cycle when the genome sequence is altered. Moreover, rearrangements can be the source of aberrant phenotypes, which are probably unfavorable for the carrier. Given all this, it seems reasonable to assume that rearrangement scenarios are parsimonious. However, it is accepted that this criterion alone is not sufficient to efficiently reconstruct the evolutionary history of genomes. Indeed, whatever model we choose, the number of scenarios is exponential in the number of rearrangements. Another biological constraint is needed. The spatial structure of the chromatin could be an essential missing criterion. It has been shown in vitro that when a double-strand break of the DNA is repaired non-homologously, the strand used for the repair is close in space to the breakpoint. Our hypothesis is that the closer breakpoints are in space, the more likely they are to participate in a rearrangement. This is supported by genomic analyses of somatic cells and between species. Let us call this hypothesis the locality hypothesis. We proposed a method that uses structural information to prioritize rearrangement scenarios. Hi-C data provided the structural information that allowed us to apply our method to scenarios between D. melanogaster and D. yakuba. These results led us to ask whether the chromatin structure could itself evolve. If so, it could be used as a phylogenetic character. This idea is supported by previous results showing the conservation of topological domains between species. This question seems to be new and could open a new line of investigation. 
If the chromatin structure carries a phylogenetic signal, it becomes possible to ask about the mechanisms at work during selection, or whether the ancestral state of this structure can be inferred. It could then even be possible to compare the evolution of the sequence with that of the chromatin structure. We therefore defined a distance between genome structures, based on the comparison of contacts between orthologous loci. We applied this distance to a set of six species, comprising human, mouse and four Drosophila species. The results confirm the presence of a phylogenetic signal in the spatial structure of genomes. They also show the need for efficient methods to compare contact data between species.
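To make the idea of a structure-based distance concrete, here is a minimal sketch of one way contacts between orthologous loci could be compared. The abstract does not give the formula used in the thesis, so everything here is an assumption for illustration: the function name `contact_distance`, the representation of a contact map as a dictionary keyed by unordered locus pairs, and the choice of one minus the Pearson correlation as the distance.

```python
from itertools import combinations
from math import sqrt


def contact_distance(contacts_a, contacts_b, orthologs):
    """Hypothetical distance: 1 - Pearson correlation of contact values.

    contacts_a, contacts_b: dict mapping frozenset({locus1, locus2}) to a
        contact frequency within species A and species B (assumed format).
    orthologs: dict mapping each locus of species A to its ortholog in B.
    """
    xs, ys = [], []
    # Compare only locus pairs that are observed in both contact maps.
    for a1, a2 in combinations(sorted(orthologs), 2):
        key_a = frozenset((a1, a2))
        key_b = frozenset((orthologs[a1], orthologs[a2]))
        if key_a in contacts_a and key_b in contacts_b:
            xs.append(contacts_a[key_a])
            ys.append(contacts_b[key_b])
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two shared locus pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("contact values must not be constant")
    return 1.0 - cov / sqrt(var_x * var_y)
```

A matrix of such pairwise distances over several species could then be passed to any standard distance-based tree-building method; whether the thesis proceeds this way is not stated in the abstract.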
30

Stangl, Karen E. "Comparative Proteomic Analysis of Phase-Switch in the Dimorphic Fungus, Penicillium marneffei." Youngstown State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1233446433.

31

Migeon, Pierre. "Comparative genomics of repetitive elements between maize inbred lines B73 and Mo17." Thesis, Kansas State University, 2017. http://hdl.handle.net/2097/35377.

Abstract:
Master of Science
Genetics Interdepartmental Program
Sanzhen Liu
The major component of complex genomes is repetitive elements, which remain recalcitrant to characterization. Using maize as a model system, we analyzed whole genome shotgun (WGS) sequences for the two maize inbred lines B73 and Mo17 using k-mer analysis to quantify the differences between the two genomes. Significant differences were identified in highly repetitive sequences, including centromere, 45S ribosomal DNA (rDNA), knob, and telomere repeats. Genotype-specific 45S rDNA sequences were discovered. The B73 and Mo17 polymorphic k-mers were used to examine allele-specific expression of 45S rDNA in the hybrids. Although Mo17 contains a higher copy number than B73, equivalent levels of overall 45S rDNA expression indicate that transcriptional or post-transcriptional regulation mechanisms operate for the 45S rDNA in the hybrids. Using WGS sequences of B73xMo17 doubled haploids, genomic locations showing differential repetitive contents were genetically mapped, revealing differences in the organization of highly repetitive sequences between the two genomes. In an analysis of WGS sequences of HapMap2 lines, including the wild progenitor of maize, landraces, and improved lines, decreases and increases in the abundance of additional sets of k-mers associated with centromere, 45S rDNA, knob, and retrotransposons were found among groups, revealing global evolutionary trends of genomic repeats during maize domestication and improvement.
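The k-mer comparison described above can be illustrated with a minimal sketch. It assumes reads are available as plain strings; the function names `count_kmers` and `differential_kmers`, the fold-change threshold and the pseudocount are illustrative assumptions and are not taken from the thesis.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer made only of A, C, G, T across a list of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[kmer] += 1
    return counts


def differential_kmers(counts_a, counts_b, min_fold=2.0, pseudocount=1.0):
    """Return k-mers whose normalised abundance differs between two samples.

    Abundances are divided by the total k-mer count of each sample so that
    samples of different sequencing depth are comparable (an assumption of
    this sketch). The value stored is the fold change A / B.
    """
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    result = {}
    for kmer in set(counts_a) | set(counts_b):
        freq_a = (counts_a[kmer] + pseudocount) / total_a
        freq_b = (counts_b[kmer] + pseudocount) / total_b
        fold = freq_a / freq_b
        if fold >= min_fold or fold <= 1.0 / min_fold:
            result[kmer] = fold
    return result
```

In practice, whole-genome data sets are far too large for an in-memory `Counter`; dedicated k-mer counting tools would be used instead, and this sketch only shows the logic of the comparison.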
32

Mahato, Joyanto. "Comparative study of three Fe (III)-ion reducing bacteria gives insights into bioelectricity generation in the MFC technique." Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18598.

Abstract:
Microbial fuel cell (MFC) technology is a renewable energy source that employs microorganisms as biocatalysts to degrade substrates into electrons and protons and then transfer the electrons to the anode. Electron transfer rates by microorganisms depend on many factors, including their diverse electron transfer mechanisms. The present study compared cytochromes, flavoproteins, electron transfer complexes, redoxins and other extracellular membrane proteins that are directly involved in electron transfer mechanisms in Escherichia coli str. K-12 MG1655, Rhodopseudomonas palustris DX-1 and Shewanella oneidensis MR-1. The results showed that Escherichia coli str. K-12 MG1655 had a more diverse range of extracellular proteins for electron transfer mechanisms than Rhodopseudomonas palustris DX-1 and Shewanella oneidensis MR-1. Escherichia coli str. K-12 MG1655 expressed more flavoproteins, redoxins and electron transfer complex-related proteins directly involved in electron transfer mechanisms than the other two bacterial species, indicating that it may be able to transfer more electrons when employed in the MFC technique. Escherichia coli str. K-12 MG1655 expressed 16 cytochromes, 9 flavoproteins, 6 redoxins, 6 electron transport complexes, 1 hypothetical protein and 1 oxidoreductase protein. In contrast, Rhodopseudomonas palustris DX-1 and Shewanella oneidensis MR-1 expressed 26 and 35 cytochrome proteins, respectively. However, these two species expressed fewer flavoproteins and redoxin-related proteins, and they did not express any electron transport complexes or hypothetical or oxidoreductase-related proteins for electron transfer. STRING and SMART results suggested that the identified proteins transferred electrons either by connecting with other types of identified proteins in the constructed gene network or independently, by taking part in oxidation-reduction reactions or metal ion reduction reactions or through their FMN-binding activities.
33

Sjöstrand, Joel. "Reconciling gene family evolution and species evolution." Doctoral thesis, Stockholms universitet, Numerisk analys och datalogi (NADA), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-93346.

Abstract:
Species evolution can often be adequately described with a phylogenetic tree. Interestingly, this is the case also for the evolution of homologous genes; a gene in an ancestral species may – through gene duplication, gene loss, lateral gene transfer (LGT), and speciation events – give rise to a gene family distributed across contemporaneous species. However, molecular sequence evolution and genetic recombination make the history – the gene tree – non-trivial to reconstruct from present-day sequences. This history is of biological interest, e.g., for inferring potential functional equivalences of extant gene pairs. In this thesis, we present biologically sound probabilistic models for gene family evolution guided by species evolution – effectively yielding a gene-species tree reconciliation. Using Bayesian Markov-chain Monte Carlo (MCMC) inference techniques, we show that by taking advantage of the information provided by the species tree, our methods achieve more reliable gene tree estimates than traditional species tree-uninformed approaches. Specifically, we describe a comprehensive model that accounts for gene duplication, gene loss, a relaxed molecular clock, and sequence evolution, and we show that the method performs admirably on synthetic and biological data. Furthermore, we present two expansions of the inference procedure, enabling it to provide (i) refined gene tree estimates with timed duplications, and (ii) probabilistic orthology estimates – i.e., that the origin of a pair of extant genes is a speciation. Finally, we present a substantial development of the model to account also for LGT. A sophisticated algorithmic framework of dynamic programming and numerical methods for differential equations is used to resolve the computational hurdles that LGT brings about. We apply the method on two bacterial datasets where LGT is believed to be prominent, in order to estimate genome-wide LGT and duplication rates. 
We further show that traditional methods – in which gene trees are reconstructed and reconciled with the species tree in separate stages – are prone to yield inferior gene tree estimates that will overestimate the number of LGT events.<br>Arters evolution kan i många fall beskrivas med ett träd, vilket redan Darwins anteckningsböcker från HMS Beagle vittnar om. Detta gäller också homologa gener; en gen i en ancestral art kan – genom genduplikationer, genförluster, lateral gentransfer (LGT) och artbildningar – ge upphov till en genfamilj spridd över samtida arter. Att från sekvenser från nu levande arter rekonstruera genfamiljens framväxt – genträdet – är icke-trivialt på grund av genetisk rekombination och sekvensevolution. Genträdet är emellertid av biologiskt intresse, i synnerhet för att det möjliggör antaganden om funktionellt släktskap mellan nutida genpar. Denna avhandling behandlar biologiskt välgrundade sannolikhetsmodeller för genfamiljsevolution. Dessa modeller tar hjälp av artevolutionens starka inverkan på genfamiljens historia, och ger väsentligen upphov till en förlikning av genträd och artträd. Genom Bayesiansk inferens baserad på Markov-chain Monte Carlo (MCMC) visar vi att våra metoder presterar bättre genträdsskattningar än traditionella ansatser som inte tar artträdet i beaktning. Mer specifikt beskriver vi en modell som omfattar genduplikationer, genförluster, en relaxerad molekylär klocka, samt sekvensevolution, och visar att metoden ger högkvalitativa skattningar på både syntetiska och biologiska data. Vidare presenterar vi två utvidgningar av detta ramverk som möjliggör (i) genträdsskattningar med tidpunkter för duplikationer, samt (ii) probabilistiska ortologiskattningar – d.v.s. att två nutida gener härstammar från en artbildning. Slutligen presenterar vi en modell som inkluderar LGT utöver ovan nämnda mekanismer. 
De beräkningsmässiga svårigheter som LGT ger upphov till löses med ett intrikat ramverk av dynamisk programmering och numeriska metoder för differentialekvationer. Vi tillämpar metoden för att skatta LGT- och duplikationsraten hos två bakteriella dataset där LGT förmodas ha spelat en central roll. Vi visar också att traditionella metoder – där genträd skattas och förlikas med artträdet i separata steg – tenderar att ge sämre genträdsskattningar, och därmed överskatta antalet LGT-händelser.
At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 5: Manuscript.
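To illustrate what a gene-species tree reconciliation is, the sketch below uses the classical parsimony-style LCA mapping, which is a much simpler technique than the probabilistic models developed in the thesis and is swapped in here purely for illustration. It labels each internal gene-tree node as a duplication or a speciation and ignores losses and LGT. The data layout (nested tuples for the gene tree, a child-to-parent dictionary for the species tree) and the function names are assumptions.

```python
def lca(parent, a, b):
    """Lowest common ancestor of species a and b in a child -> parent map."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent.get(a)  # the root has no entry, so this yields None
    while b not in ancestors:
        b = parent[b]
    return b


def reconcile(gene_tree, gene_to_species, parent):
    """Label internal gene-tree nodes via LCA mapping.

    gene_tree: a leaf name (str) or a pair (left, right) of subtrees.
    Returns a list of (species, event) in post-order, where event is
    "duplication" if a node maps to the same species as one of its
    children and "speciation" otherwise.
    """
    events = []

    def visit(node):
        if isinstance(node, str):
            return gene_to_species[node]
        left, right = node
        mapped_left, mapped_right = visit(left), visit(right)
        mapped = lca(parent, mapped_left, mapped_right)
        is_dup = mapped in (mapped_left, mapped_right)
        events.append((mapped, "duplication" if is_dup else "speciation"))
        return mapped

    visit(gene_tree)
    return events
```

For a species tree ((A, B), C) and a gene tree ((a1, b1), (a2, c1)), the two inner nodes are labelled speciations and the root a duplication, because the root maps to the same species node as its right child.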
34

Atolagbe, Oluwatomisin Toluwanimi. "Comparative Analysis of the Transcriptomes of M1 and M2 Macrophages." University of Toledo Health Science Campus / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=mco150166566713963.

35

Cruz, Pulido Diana Patricia. "Comparative transcriptome profiling of human and pig intestinal epithelial cells after Deltacoronavirus infection." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587711071257247.

36

Mittal, Dipti. "A Benchmark Data Set and Comparative Study for Protein Structural Alignment Tools." University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1223065980.

37

Gomi, Masahiro, Ryusuke Sawada, Masashi Sonoyama, et al. "Comparative proteomics of the prokaryota using secretory proteins." Chem-Bio Informatics Society, 2005. http://hdl.handle.net/2237/9270.

38

Golenetskaya, Natalia. "Adressing scaling challenges in comparative genomics." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00865840.

Abstract:
Comparative genomics is essentially a form of data mining in large collections of n-ary relations. The growth in the number of sequenced genomes puts a strain on comparative genomics that grows, at worst geometrically, with the growth in sequence data. Today even modestly sized laboratories routinely obtain several genomes at a time and, like large consortia, expect to be able to run all-against-all analyses as part of their multi-genome strategies. To address needs at all levels, it is necessary to rethink the algorithmic frameworks and data storage technologies used for comparative genomics. To meet these scaling challenges, in this thesis we develop original methods based on NoSQL and MapReduce technologies. Starting from a characterization of the kinds of data used in comparative genomics and a study of typical usage patterns, we define a formalism for Big Data in genomics, implement it on the NoSQL platform Cassandra, and evaluate its performance. Then, starting from two very different global analyses in comparative genomics, we define two strategies for adapting these applications to the MapReduce paradigm and derive new algorithms. For the first, the identification of gene fusion and fission events within a phylogeny, we reformulate the problem as a bounded parallel traversal that avoids the latency of graph algorithms. For the second, the consensus clustering used to identify protein families, we define an iterative sampling procedure that converges rapidly toward the desired global result. We implement each of these two algorithms on the MapReduce platform Hadoop and evaluate their performance. This performance is competitive and scales much better than existing algorithms, but requires a particular (and future) effort to devise the specific algorithms.
39

Moyo, Sipho Dugunye. "Comparative study of clan CA cysteine proteases: an insight into the protozoan parasites." Thesis, Rhodes University, 2015. http://hdl.handle.net/10962/d1020309.

Abstract:
Protozoan infections such as malaria, leishmaniasis, toxoplasmosis, Chagas disease and African trypanosomiasis, caused by the genera Plasmodium, Leishmania, Toxoplasma and Trypanosoma, have a huge economic, health and social impact in endemic regions, particularly tropical and sub-tropical regions. The combined infections are estimated at over a billion cases and approximately 1.1 million deaths annually. The global burden of protozoan infections is worsened by increased drug resistance, toxicity and the relatively high cost of treatment and prophylaxis. There has therefore been a high demand for new drugs and for drug targets that play a role in parasite virulence. Cysteine proteases have been validated as viable drug targets because of their role in the infective stage of the parasites within the human host. There is a variety of cysteine proteases, so they are subdivided into families; this study focuses on the clan CA, papain family C1 proteases. The current inhibitors of the protozoan cysteine proteases lack selectivity and specificity, which contributes to drug toxicity. There is therefore a need to identify the differences and similarities between the host, vector and protozoan proteases. This study uses a variety of bioinformatics tools to assess these differences and similarities. The Plasmodium cysteine protease FP-2 is the best-characterized protease; it was therefore used as a reference for all the other proteases, and its homologs were retrieved and aligned and their evolutionary relationships established. The homologs were also analysed for common motifs, and their physicochemical properties were determined and validated using the Kruskal-Wallis test. These analyses revealed that the host and vector cathepsins share similar properties while the parasite cathepsins differ. 
At the sub-site level, sub-site 2 showed greater variation, suggesting diverse ligand specificity within the proteases, a finding that is vital for the design of antiprotozoan inhibitors.
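The Kruskal-Wallis test mentioned above compares a property across several groups using ranks rather than raw values. The following is a minimal pure-Python sketch of the H statistic with the standard tie correction; the function names are illustrative, and the sketch assumes every group is non-empty and that there are at least two observations in total. It does not compute a p-value, which would require the chi-squared distribution with (number of groups minus 1) degrees of freedom.

```python
def average_ranks(values):
    """Return 1-based ranks (ties get the mean rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of non-empty numeric groups."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction; it is zero only when all observations are identical.
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0
```

For example, the groups could hold the isoelectric points of host, vector and parasite proteases (hypothetical inputs); a large H suggests that at least one group differs from the others.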
40

Rey, Carine. "Détection de l’évolution convergente à l’échelle génomique : développement de méthodes et étude des adaptations indépendantes à la vie en milieu aride chez les rongeurs." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEN060.

Abstract:
La convergence phénotypique, c’est-à-dire l’acquisition indépendante de caractères similaires par des espèces différentes, est omniprésente dans la nature et a été souvent étudiée. Mais ce processus évolutif n’est pas bien compris. Par exemple, de nombreux chercheurs cherchent à comprendre s’il existe des bases génétiques convergentes sous-jacentes à ces convergences phénotypiques. Quelques substitutions convergentes corrélées à un phénotype convergent ont été décrites dans la littérature, mais il existe peu d’études à l’échelle génomique. Ceci peut s’expliquer par deux problèmes méthodologiques : 1/ D’une part, la difficulté de créer des jeux de données multi-espèces pour des analyses comparatives. 2/ D’autre part, le manque de méthodes dédiées à la détection de la convergence à l’échelle génomique. Au cours de ma thèse, j’ai proposé des solutions à ces deux défis. Dans un premier temps, j’ai créé un programme (CAARS) permettant d’automatiser l’assemblage de jeux de données composés de familles d’orthologues à partir de données RNA-Seq. Puis, j’ai créé un outil (PCOC) pour étudier les substitutions convergentes au sein de séquences codantes, basé sur l’identification de changements de profils d’acides aminés. Ces outils ont été développés dans un souci de reproductibilité et de facilité d’utilisation. J’ai ensuite étudié la capacité de différentes méthodes, dont PCOC, à détecter des substitutions convergentes en présence de facteurs confondants. Enfin, j’ai appliqué ces méthodes à un cas biologique où j’ai cherché à caractériser les bases génomiques de l’adaptation aux milieux arides chez les rongeurs.
Phenotypic convergence, the independent acquisition of similar characters by different species, is widespread in nature and has been extensively studied. But this evolutionary process is not well understood. 
For example, many researchers seek to understand whether there are convergent genetic bases underlying these phenotypic convergences. Some convergent substitutions correlated with a convergent phenotype have been described in the literature, but there are few studies at the genome scale. This can be explained by two methodological problems: (1) the difficulty of creating multi-species datasets for comparative analyses, and (2) the lack of dedicated methods to detect convergence at the genomic scale. During my thesis, I proposed solutions to these two challenges. As a first step, I created a program (CAARS) to automate the assembly of datasets composed of orthologous families from RNA-Seq data. Then I created a tool (PCOC) to study convergent substitutions within coding sequences, based on the identification of amino acid profile changes rather than strict amino acid changes. These tools have been developed for the sake of reproducibility and ease of use. I then studied the ability of different methods, including PCOC, to detect convergent substitutions in the presence of confounding factors. Finally, I applied these methods to a biological case where I sought to characterize the genomic bases of adaptation to arid environments in rodents.
41

Brandström, Mikael. "Bioinformatic Analysis of Mutation and Selection in the Vertebrate Non-coding Genome." Doctoral thesis, Uppsala University, Department of Evolution, Genomics and Systematics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8240.

Abstract:
The majority of the vertebrate genome sequence is not coding for proteins. In recent years, the evolution of this noncoding fraction of the genome has gained interest. These studies have been greatly facilitated by the availability of full genome sequences. The aim of this thesis is to study evolution of the noncoding vertebrate genome through bioinformatic analysis of large-scale genomic datasets.

In a first analysis we addressed the use of conservation of sequence between highly diverged genomes to infer function. We provided evidence for a turnover of the patterns of negative selection. Hence, measures of constraint based on comparisons of diverged genomes might underestimate the functional proportion of the genome.

In the following analyses we focused on length variation as found in small-scale insertion and deletion (indel) polymorphisms and microsatellites. For indels in chicken, replication slippage is a likely mutation mechanism, as a large proportion of the indels are parts of tandem-duplicates. Using a set of microsatellite polymorphisms in chicken, where we avoid ascertainment bias, we showed that polymorphism is positively correlated with microsatellite length and AT-content. Furthermore, interruptions in the microsatellite sequence decrease the levels of polymorphism.

We also analysed the association between microsatellite polymorphism and recombination in the human genome. Here we found increased levels of microsatellite polymorphism in human recombination hotspots and also similar increases in the frequencies of single nucleotide polymorphisms (SNPs) and indels. This points towards natural selection shaping the levels of variation. Alternatively, recombination is mutagenic for all three kinds of polymorphisms.

Finally, I present the program ILAPlot. It is a tool for visualisation, exploration and data extraction based on BLAST.

Our combined results highlight the intricate connections between evolutionary phenomena. They also emphasise the importance of length variability in genome evolution, as well as the gradual difference between indels and microsatellites.
42

Paschoal, Alexandre Rossi. "GINGA - Graphical Interface for Comparative Genome Analysis: o desenvolvimento de um sistema computacional de visualização gráfica para a análise comparativa de genomas de bactérias." Laboratório Nacional de Computação Científica, 2007. http://www.lncc.br/tdmc/tde_busca/arquivo.php?codArquivo=124.

Abstract:
This study developed a computational system for the graphical visualization of comparative analyses of prokaryotic genomes. The system, named GINGA (Graphical Interface for comparative Genome Analysis), was developed to analyse a draft genome sequence in comparison with a complete genome. It displays the alignment between the reads, contigs and scaffolds of a partially sequenced genome and the complete sequence of another genome, allowing the identification of shared regions, unique regions and rearrangements. GINGA is a web-based system written in Perl that accesses a MySQL database in which all the information from the comparative analysis is stored. The Perl interface module to the GD graphics library was used to build the visualization tool. The graphical view supports zooming in and out and presents information on assembly, annotation of coding sequences and the organization of the sequences between the genomes. Supplementary information is available in the form of reports. GINGA was used to compare the genomes of Leifsonia xyli subsp. cynodontis (Lxc, draft genome sequence) and Leifsonia xyli subsp. xyli (Lxx, complete genome sequence). Lxx causes ratoon stunting disease in sugarcane, whereas Lxc can colonize sugarcane without causing disease symptoms. The main goal was to identify, while the Lxc genome was still being sequenced, genetic differences that may help to explain the pathogenicity of Lxx towards sugarcane. A total of 9,754 Lxc reads, assembled into 1,064 contigs and 317 scaffolds and comprising 1,470,731 non-redundant bases, were used in the analysis. GINGA allowed the identification of 206,320 bp (~20%) of Lxc-specific sequence in contigs (contigs showing no alignment to the complete Lxx genome) and 56,884 bp of Lxc-specific sequence in 19 scaffolds (5.9%), as well as around 1 million bp aligned to the Lxx genome and at least 6 large-scale genomic rearrangements. These results were presented in a graphical interface and in reports, helping to guide the Lxc sequencing project in deciding which regions to sequence further while also supporting the formulation of hypotheses about the biology of these microorganisms.
43

Laetsch, Dominik Robert. "On the evolution of effector gene families in potato cyst nematodes." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31244.

Abstract:
Potato cyst nematodes (PCN) are economically relevant plant parasites that infect potato crops. The genomes of three PCN species are available and genome data have been generated for several populations of PCN, to address questions related to the molecular basis of plant parasitism. In this thesis, I employ approaches of comparative genomics to highlight differences and similarities between PCNs and other nematode species. I present two new software solutions to address challenges associated with the field of comparative genomics: BlobTools, a taxonomic interrogation toolkit for quality control of genome assemblies, and KinFin, a solution for the analysis of protein orthology data. I apply both software solutions to genomic datasets of nematodes, platyhelminths, and tardigrades. Based on KinFin analysis of plant parasitic nematodes, I identify protein families in PCNs likely to be involved in host-parasitic interaction, termed effectors, and discuss their functions. I highlight examples of horizontal gene transfer from bacteria to plant parasitic nematodes. Through genomic data of European and South American populations of PCNs, I address variation in populations, infer phylogenetic relationships, and try to estimate the effect of selection on effector genes identified through KinFin. Furthermore, I estimate the rate of variation across the reference genomes of two PCNs.
44

Noh, Hyun Ji. "Comparative approaches to the genetics of human neuropsychiatric disorders." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:8cb9ee02-1b12-4b78-bb62-3bbf4d5eba26.

Abstract:
In this thesis, I investigate the genetics of neuropsychiatric disorders by analysing large data sets derived from high-throughput experiments, using novel comparative genomics approaches. In the first project, I explore characteristics of rare, de novo copy number variants identified among autism patients by employing various bioinformatics resources including Mouse Genome Informatics phenotypes, Gene Ontology terms, and protein-protein interactions. I describe how I objectively identified a number of mouse model phenotypes that are significantly associated with autism, and that provide insight into the aetiologies for both copy number deletions and duplications. In the second project, I investigate the genetics of obsessive-compulsive disorder by resequencing genomic regions of human case-control cohorts and the best spontaneous disease model organisms, namely dogs with canine compulsive disorder, and breed-matched controls. Targeted sequencing experiments yielded a large number of high-quality genetic variants in both humans and dogs. I prioritised variants and genes using case-control comparisons and functional annotations such as types of mutation, evolutionary conservation status and regulatory marks. In turn, I generated several hypotheses that are experimentally tractable. Replication of these findings in a larger cohort is necessary, although it lies beyond the scope of this thesis. Results from both projects indicate that the analytical frameworks employed in this thesis could be profitably applied to other neuropsychiatric disorders.
45

Potter, Dustin Paul. "A combinatorial approach to scientific exploration of gene expression data: An integrative method using Formal Concept Analysis for the comparative analysis of microarray data." Diss., Virginia Tech, 2005. http://hdl.handle.net/10919/28792.

Abstract:
Functional genetics is the study of the genes present in a genome of an organism, the complex interplay of all genes and their environment being the primary focus of study. The motivation for such studies is the premise that gene expression patterns in a cell are characteristic of its current state. The availability of the entire genome for many organisms now allows scientists unparalleled opportunities to characterize, classify, and manipulate genes or gene networks involved in metabolism, cellular differentiation, development, and disease. System-wide studies of biological systems have been made possible by the advent of high-throughput and large-scale tools such as microarrays which are capable of measuring the mRNA levels of all genes in a genome. Tools and methods for the integration, visualization, and modeling of the large-scale data obtained in typical systems biology experiments are indispensable. Our work focuses on a method that integrates gene expression values obtained from microarray experiments with biological functional information related to the genes measured in order to make global comparisons of multiple experiments. In our method, the integrated data is represented as a lattice and, using appropriate measures, a reference experiment can be compared to samples from a database of similar experiments, and a ranking of similarity is returned. In this work, support for the validity of our method is demonstrated both theoretically and empirically: a mathematical description of the lattice structure with respect to the integrated information is developed and the method is applied to data sets of both simulated and reported microarray experiments. A fast algorithm for constructing the lattice representation is also developed.<br>Ph. D.
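The lattice mentioned in the abstract above is the concept lattice of Formal Concept Analysis: each node pairs a set of objects (the extent) with the set of attributes they all share (the intent). The following is a minimal Python sketch of that idea, assuming a toy context in which experiments are objects and functional categories are attributes. The experiment names and categories are hypothetical, and the enumeration is a naive closure over object subsets, not the fast algorithm developed in the dissertation.

```python
from itertools import combinations

# Hypothetical formal context: experiment -> functional categories of its
# differentially expressed genes (illustrative values, not thesis data).
CONTEXT = {
    "exp1": {"metabolism", "stress"},
    "exp2": {"metabolism", "development"},
    "exp3": {"metabolism", "stress", "development"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return result


def common_objects(attrs, context):
    """Objects that carry every attribute in `attrs`."""
    return {obj for obj, have in context.items() if attrs <= have}


def formal_concepts(context):
    """Naively enumerate all (extent, intent) pairs by closing each object subset.
    Exponential in the number of objects, so suitable for toy contexts only."""
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the lattice. How two such lattices are scored for similarity is specific to the dissertation and is not sketched here.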
46

Venkatesan, Arvind M. "Comparative Oncogenomics Identifies Novel Regulators and Clinical Relevance of Neural Crest Identities in Melanoma." eScholarship@UMMS, 2012. http://escholarship.umassmed.edu/gsbs_diss/939.

Abstract:
Cancers often resurrect embryonic molecular programs to promote disease progression. In melanomas, which are tumors of the neural crest (NC) lineage, a molecular signature of the embryonic NC is often reactivated. These NC factors have been implicated in promoting pro-tumorigenic features like proliferation, migration and therapy resistance. However, the molecular mechanisms that establish and maintain NC identities in melanomas are largely unknown. Additionally, whether the presence of a NC identity has any clinical relevance for patient melanomas is also unclear. Here, using comparative genomic approaches, I have a) identified a novel role for GDF6-activated BMP signaling in reawakening a NC identity in melanomas, and b) identified a NC signature as a clinical predictor of melanoma progression. Like the genomes of many solid cancers, melanoma genomes have widespread copy number variations (CNV) harboring thousands of genes. To identify disease-promoting drivers amongst such huge numbers of genes, I used a comparative oncogenomics approach with zebrafish and human melanomas. This approach led to the identification of a recurrently amplified oncogene, GDF6, that acts via BMP signaling to invoke NC identities in melanomas. In maintaining this identity, GDF6 represses the melanocyte differentiation gene MITF and the proapoptotic factor SOX9, allowing melanoma cells to remain undifferentiated and survive. Functional analysis in zebrafish embryos indicated a role of GDF6 in blocking melanocyte differentiation, suggesting that the developmental function of GDF6 is reiterated in melanomas. In clinical assessments, a major fraction of patient melanomas expressed high GDF6, and its expression correlated with poor patient survival. These studies provide novel insights into regulation of NC identities in melanomas and offer GDF6 and components of BMP pathway as targets for therapeutic intervention. 
In additional studies, I wanted to test whether a broader NC identity in melanomas had any clinical relevance. In these studies, I performed transcriptome analysis of zebrafish melanomas and derived a 15-gene NC signature. This NC gene signature positively correlated with the expression of SOX10, a known NC marker in human melanomas. Patients whose melanomas expressed this signature showed poor overall survival. These findings identify an important predictive signature in human melanomas and also illuminate the clinical importance of NC identity in this disease.
47

Venkatesan, Arvind M. "Comparative Oncogenomics Identifies Novel Regulators and Clinical Relevance of Neural Crest Identities in Melanoma." eScholarship@UMMS, 2017. https://escholarship.umassmed.edu/gsbs_diss/939.

48

Anzola, Lagos Juan Manuel. "Computational identification and evolutionary analysis of metazoan microRNAs." College Station, Tex.: Texas A&M University, 2008. http://hdl.handle.net/1969.1/ETD-TAMU-3115.

49

Raborn, R. Taylor. "Genome-wide analysis of transcription initiation and promoter architecture in eukaryotes." Diss., University of Iowa, 2012. https://ir.uiowa.edu/etd/4728.

Abstract:
The transcriptome represents the entirety of RNA molecules within a cell or tissue at a given time. Recent advances have facilitated the production of large-scale, global interrogations of transcriptomes, finding that genomes are extensively transcribed and contain diverse classes of RNAs (Dinger et al., 2009). Information generated by high-throughput analyses of mRNA transcription start sites (TSSs) such as CAGE (Cap Analysis of Gene Expression) indicates that eukaryotic genomes have complex landscapes of transcription initiation. The TSS is important for the annotation of cis-regulatory sequences, because it provides a link between the mRNA transcript and the promoter. The patterns of TSS distributions observed within mRNA 5' end profiling studies prevent straightforward annotation of putative promoters. To address this challenge, we developed a method to identify, on a genome-wide basis, the putative promoter, which we define by TSS distributions and designate the transcription start region (TSR). We applied a clustering method to identify and annotate TSRs within the budding yeast Saccharomyces cerevisiae using a full-length cDNA dataset (Miura et al., 2006). To validate these TSR annotations, we performed an integrative genomic analysis using multiple datasets. Our method identified TSRs at positions consistent with bona fide promoters in S. cerevisiae. In addition, using 5'RACE, we find overall agreement between computationally defined TSRs and TSSs identified experimentally. From this analysis, we find that a significant proportion of genes exhibiting alternative promoter usage within sporulation are associated with respiration, suggesting that this is regulated on a condition-specific basis in budding yeast. We further developed our TSS clustering method into a bioinformatics tool called TSRchitect, which identifies and annotates TSRs from large-scale TSS profiling information. 
TSRchitect is capable of handling both tag and sequence-based TSS information and efficiently computes TSRs from global TSS datasets on a desktop computer. We find support for TSRchitect's annotations in human from a CAGE experiment from the ENCODE (Encyclopedia of DNA Elements) project. Finally, we use TSRchitect to identify TSRs from the transcriptomes of diverse eukaryotes. We investigated the conservation of TSRs among orthologous genes. We frequently identify multiple TSRs for a given gene, suggesting that alternative promoter usage is widespread. Overall, using TSS profiling data derived from separate tissues within mouse and human, we find that the positions of TSRs are relatively stable across tissues surveyed; however, a small fraction of genes exhibit tissue-specific differences in TSR use. As transcriptome profiling information continues to be generated at a rapid pace, computational approaches are increasingly important. It is anticipated that the method and approach we describe within this dissertation will contribute to an improved understanding of gene regulation and promoter architecture in eukaryotes.
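To make the idea of grouping TSSs into transcription start regions concrete, here is a minimal Python sketch of a simple distance-based clustering: TSS tags on the same chromosome and strand are merged when consecutive positions lie within a fixed gap. The function name `cluster_tss`, the input tuple layout and the 20 bp default are assumptions for illustration; the abstract does not specify the clustering rule used by TSRchitect, so this should not be read as its actual algorithm.

```python
from collections import defaultdict


def cluster_tss(tss_tags, max_gap=20):
    """Merge TSS tags into regions.

    tss_tags: iterable of (chrom, strand, position) tuples, one per tag.
    max_gap:  largest distance in bp between consecutive tags in one
              region (hypothetical threshold).
    Returns a list of dicts with chrom, strand, start, end and tag count.
    """
    by_locus = defaultdict(list)
    for chrom, strand, pos in tss_tags:
        by_locus[(chrom, strand)].append(pos)

    regions = []
    for (chrom, strand), positions in sorted(by_locus.items()):
        positions.sort()
        start = end = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - end <= max_gap:
                # Close enough to the current region: extend it.
                end = pos
                count += 1
            else:
                # Gap too large: emit the current region and start a new one.
                regions.append({"chrom": chrom, "strand": strand,
                                "start": start, "end": end, "tags": count})
                start = end = pos
                count = 1
        regions.append({"chrom": chrom, "strand": strand,
                        "start": start, "end": end, "tags": count})
    return regions


if __name__ == "__main__":
    tags = [("chrI", "+", 100), ("chrI", "+", 105), ("chrI", "+", 110),
            ("chrI", "+", 400), ("chrI", "-", 102)]
    for region in cluster_tss(tags):
        print(region)
```

A gene with more than one resulting region would be a candidate for alternative promoter usage in the sense described above, subject to whatever tag-count filtering a real analysis applies.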
50

Steller, Matthew Michael. "A comparative analysis of gene expression among castes of the termite Reticulitermes flavipes using expressed sequence tags (ESTs) and a microarray." Thesis, Manhattan, Kan. : Kansas State University, 2009. http://hdl.handle.net/2097/1471.
