Dissertations / Theses on the topic 'Metagenomic'

1

Meyer, Quinton Christian. "Metagenomic approaches to gene discovery." Thesis, University of the Western Cape, 2006. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_7031_1182747173.

Abstract:

The classical approach to gene discovery has been to culture micro-organisms demonstrating a specific enzyme activity and then to recover the gene of interest through shotgun cloning. The realization that these standard microbiological methods provide limited access to the true microbial biodiversity and therefore the available microbial genetic diversity (collectively termed the Metagenome) has resulted in the development of environmental nucleic acid extraction technologies designed to access this wealth of genetic information, thereby avoiding the limitations of culture dependent genetic exploitation. In this work several gene discovery technologies were employed in an attempt to recover novel bacterial laccase genes (EC 1.10.3.2), a group of enzymes in which considerable biotechnological interest has been expressed. Metagenomic DNA extracted from two organic rich environmental samples was used as the source material for the construction of two genomic DNA libraries. The small insert plasmid based library derived from compost DNA consisted of approximately 10^6 clones at an average insert size of 2.7 kbp, equivalent to 2.6 Gbp of cloned environmental DNA. A fosmid based large insert library derived from grape waste DNA consisted of approximately 44,000 cfu at an average insert size of 25 kbp (1.1 Gbp cloned DNA). Both libraries were screened for laccase activity but failed to produce novel laccase genes. As an alternative approach, a multicopper oxidase specific PCR detection assay was developed using a laccase positive Streptomyces strain as a model organism. The newly designed primers were used to detect the presence of bacterial multicopper oxidases in environmental samples. This resulted in the identification of nine novel gene fragments showing identity ranging from 37 to 94% to published putative bacterial multicopper oxidase gene sequences. Three clones, pMCO6, pMCO8 and pMCO9, were significantly smaller than those typically reported for bacterial laccases and were assigned to a recently described clade of Streptomyces bacterial multicopper oxidases.


Two PCR based techniques were employed to attempt the recovery of flanking regions for two of these genes (pMCO7 and pMCO8). The use of TAIL-PCR resulted in the recovery of 90% of the pMCO7 ORF. As an alternative approach the Vectorette™ system was employed to recover the 3' downstream region of pMCO8. The complexity of the DNA sample proved to be a considerable technical challenge for the implementation of both these techniques. The feasibility of both these approaches was however demonstrated in principle. Finally, in an attempt to expedite the recovery of full-length copies of these genes a subtractive hybridization magnetic bead capture technique was adapted and employed to recover a full-length putative multicopper oxidase gene from a Streptomyces strain in a proof of concept experiment. The StrepA06pMCO gene fragment was used as a 'driver' against fragmented Streptomyces genomic DNA ('tester') and resulted in the recovery of a 1215 bp open reading frame. Unexpectedly, this ORF showed only 80% identity to the StrepA06pMCO gene sequence at nucleotide level, and 48% amino acid identity to a putative mco gene derived from Nocardioides sp. JS614.
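
The library sizes quoted in this abstract follow from multiplying the number of clones by the average insert size. The sketch below reproduces that arithmetic; it assumes the plasmid library holds roughly 10^6 clones, and the function name is purely illustrative.

```python
def cloned_dna_gbp(n_clones: int, mean_insert_kbp: float) -> float:
    """Total cloned DNA in Gbp: clones x mean insert size (kbp) / 1e6."""
    return n_clones * mean_insert_kbp / 1_000_000

# Assumption: "approximately 10^6 clones" is taken as exactly one million.
plasmid_library = cloned_dna_gbp(1_000_000, 2.7)  # compost DNA, 2.7 kbp inserts
fosmid_library = cloned_dna_gbp(44_000, 25.0)     # grape waste DNA, 25 kbp inserts

print(f"plasmid library: {plasmid_library:.1f} Gbp")
print(f"fosmid library:  {fosmid_library:.1f} Gbp")
```

With exactly one million clones the plasmid estimate comes out at 2.7 Gbp, close to the 2.6 Gbp stated, which is consistent with a clone count slightly under 10^6. The fosmid figure matches the stated 1.1 Gbp.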

2

Gaspar, John M. "Denoising amplicon-based metagenomic data." Thesis, University of New Hampshire, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3581214.

Abstract:

Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been written that can reduce error rates in mock community data, in which the true sequences are known, but they were designed to be used in studies of real communities. To evaluate the outcome of the denoising process, we developed methods that do not rely on a priori knowledge of the correct sequences, and we applied these methods to a real-world dataset. We found that the denoising algorithms had substantial negative side-effects on the sequence data. For example, in the most widely used denoising pipeline, AmpliconNoise, the algorithm that was designed to remove pyrosequencing errors changed the reads in a manner inconsistent with the known spectrum of these errors, until one of the parameters was increased substantially from its default value.

With these shortcomings in mind, we developed a novel denoising program, FlowClus. FlowClus uses a systematic approach to filter and denoise reads efficiently. When denoising real datasets, FlowClus provides feedback about the process that can be used as the basis to adjust the parameters of the algorithm to suit the particular dataset. FlowClus produced a lower error rate compared to other denoising algorithms when analyzing a mock community dataset, while retaining significantly more sequence information. Among its other attributes, FlowClus can analyze longer reads being generated from current protocols and irregular flow orders. It has processed a full plate (1.5 million reads) in less than four hours; using its more efficient (but less precise) trie analysis option, this time was further reduced, to less than seven minutes.

3

Devakandan, Keshini. "Metagenomic characterization of the vaginal microbiome." Thesis, University of British Columbia, 2016. http://hdl.handle.net/2429/60127.

Abstract:
Background: The vaginal microbiome is a dynamic environment colonized by a wide array of microorganisms. Although bacterial vaginosis (BV) is characterized by a disruption in the normal bacterial microbiome of the vagina, the factors contributing to recurrent BV remain unknown. In addition, very little is known about the role of viruses in the vaginal microbiome and associated dysbioses. Objectives: 1) characterize the vaginal bacteriome of women with recurrent BV using cpn60 sequencing, compare bacterial profiles to those of a healthy-asymptomatic cohort, and correlate profiles to descriptive characteristics; and 2) characterize the vaginal virome of healthy-asymptomatic women, HIV-positive women and women with recurrent BV, and correlate profiles to descriptive characteristics. Methods: Twenty-six women were recruited into the recurrent BV bacteriome study. Vaginal swabs were obtained for cpn60 sequencing and Gram stain Nugent scoring. Additionally, samples from 54 women were analyzed in the virome study: 21 healthy-asymptomatic, 25 HIV-positive and eight recurrent BV. The vaginal swabs were processed to enrich for viruses and then subjected to metagenomic shotgun sequencing. Demographic, behavioural and clinical information was collected for all participants, in both bacteriome and virome studies. Results: Bacteriome analyses detected 122 cpn60 operational taxonomic units (OTUs). Bacterial profiles clustered into six community state types (CSTs). Trends suggested a relationship between BV-associated CSTs and number of sexual partners (past year), oral sex, use of (hormonal) contraception, abnormal discharge (past 48 hours), lifetime history of trichomoniasis, and number of BV episodes (past two months and year). Virome analyses detected a total of 477 species. Viral profiles clustered into seven groups. 
Viral patterns were identified within bacteriome CSTs, Nugent scores, viral loads, between Lactobacillus-dominant, Lactobacillus iners-dominant, and heterogeneous profiles, and were associated with a number of descriptive characteristics. Conclusions: The vaginal microbiome is highly diverse and potentially associated with many clinical factors. Our ability to use the microbiome data to subdivide women into clusters, and detect trends between clusters and characteristics will expand our knowledge on the vaginal microbiome as a whole.
Medicine, Faculty of
Graduate
4

Mewis, Keith. "Functional metagenomic screening for glycoside hydrolases." Thesis, University of British Columbia, 2016. http://hdl.handle.net/2429/60223.

Abstract:
Limitations on the cultivation of a majority of naturally occurring microbes have spurred the rise of culture-independent methods for the investigation of environmental microbial communities, a field known as metagenomics. This thesis addresses both functional and informatic approaches to metagenomics with the aim of improving our knowledge of carbohydrate degradation. A high throughput functional metagenomic screen was developed and applied to over 350,000 fosmid clones to search for glycoside hydrolases (GHs) in metagenomic libraries. Screening yielded 798 fosmid clones capable of hydrolyzing a model sugar compound, and the genes responsible were subcloned and biochemically characterized for pH and temperature stability, and substrate specificity. The combination of functional and in silico methods developed was used in a longitudinal study of the beaver (Castor canadensis) digestive tract, in order to gain insight into the sequential degradation of biomass. A linear model was used to identify enrichment of endo-acting versus exo-acting GH families at five locations throughout the digestive tract. The discovery of high numbers of GH43 family genes on functionally identified fosmids resulted in their combination with all other known GH43 genes in order to create subfamily classifications that provide finer resolution of enzyme activities. This classification system resulted in an improved ability to assign functional characteristics to enzymes identified through informatic studies. Of the 37 subfamilies created, only 22 contained a characterized enzyme. Fosmids identified earlier in this work harboured genes from four uncharacterized GH43 subfamilies, and future characterization efforts will further our understanding of the GH43 family. Altogether, the developed methods provide a framework for future studies of biomass degradation and improve the power of both functional and in silico metagenomics.
Science, Faculty of
Graduate
5

Bench, Shellie R. "Metagenomic characterization of Chesapeake Bay virioplankton." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 78 p, 2007. http://proquest.umi.com/pqdweb?did=1338865971&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.

6

Davis, Carina. "Metagenomic approaches to microbial source tracking." Thesis, University of Canterbury. School of Biological Sciences, 2013. http://hdl.handle.net/10092/8194.

Abstract:
Water sources are susceptible to faecal contamination from animal and human pollution sources. Pollution of our waterways has significant implications for human health, especially from a pathogen perspective. Microbial source tracking (MST) is a promising field which aims to identify the sources of faecal contamination, thereby allowing for the development of effective management strategies to minimise pollution and the impact on human health. Many of the currently used methods rely on the identification of host-specific markers within the 16S ribosomal RNA (rRNA) gene of bacteria by use of amplification techniques such as polymerase chain reaction (PCR). However, these methods can be limited by sensitivity, quantification, geographical differences and issues of cost which can limit how many markers are evaluated. Developments in DNA sequencing technologies over the past decade have led to a number of next generation sequencing (NGS) platforms which have a rapid, high throughput approach, resulting in an exponential decrease in the cost of sequencing. This has enabled the development of sequence-based metagenomics, where entire communities from environmental samples can be analysed based on their genetic material. The ability to barcode allows for analysis of multiple samples at once, reducing the cost of sequencing environmental samples even further. This is a promising technique for MST, which has had little investigation to date. The primary focus of the studies described in this thesis was to evaluate the use of NGS technology through a metagenomic approach. Roche 454 amplicon sequencing was used to sequence a 16S rRNA gene target, amplified from faecal and water samples from various sources in New Zealand. Barcode strategies were incorporated in the amplification design to allow multiple samples to be sequenced simultaneously. A proof-of-concept study initially utilised a small sequence dataset to evaluate a range of analysis tools available. Taxonomic identification and diversity measures were used to evaluate a selection of currently available tools designed for analysing metagenomic data, with the Quantitative Insights Into Microbial Ecology (QIIME) platform chosen for further studies. A larger study, including 35 faecal samples from 13 different sources and 10 water samples, resulted in 522,065 raw sequencing reads. Diversity results suggest three phyla, Bacteroidetes, Firmicutes and Proteobacteria, are strongly represented across all faecal sources analysed. Microbial diversity analysis using clustering techniques provided evidence of host source being the largest influence on bacterial diversity, with samples from each source generally clustering together. This technique could not be used to identify sources of contamination in water samples as the water samples all clustered separately from the faecal samples. More successful was the use of taxonomic classifications to determine bacterial genera that were potentially specific to one source. Water samples were screened for these genera, with six out of the ten water samples showing indicators of either ruminant or human contamination. Faecal and water samples were also analysed for a selection of published 16S rRNA PCR markers, using a computational motif-based search method. Of the twenty motifs screened for, 14 were found to be relatively source-specific for ruminant, human, dog or pig faecal samples, with some cross-reactivity with chicken and possum samples. Using this method, the contamination source for six of the ten water samples was identified, with the remaining four samples found to not have enough sequences to assess with confidence. Both metagenomic strategies produced comparable results which were consistent with previous MST analysis. This project demonstrates the potential application of next generation sequencing technologies to microbial source tracking, suggesting that this approach could replace existing microbial source tracking methods.
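
The computational motif-based search mentioned in this abstract can be pictured as scanning each read for a marker sequence on either strand. The sketch below is only an illustration of that idea: the marker names, motif sequences and reads are invented for the example and are not the published 16S rRNA markers used in the thesis, and exact matching is an assumption.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def motif_hits(reads, motifs):
    """Count reads that contain each marker motif on either strand."""
    counts = {name: 0 for name in motifs}
    for read in reads:
        read = read.upper()
        for name, motif in motifs.items():
            motif = motif.upper()
            if motif in read or reverse_complement(motif) in read:
                counts[name] += 1
    return counts

# Hypothetical marker motifs and reads, invented for illustration only.
motifs = {"ruminant_marker": "GCGTATCCAAC", "human_marker": "ATCATGAGTTCAC"}
reads = [
    "TTGCGTATCCAACGG",    # ruminant motif, forward strand
    "CCGTTGGATACGCAA",    # ruminant motif, reverse strand
    "AAATCATGAGTTCACTT",  # human motif, forward strand
    "ACGTACGTACGT",       # no marker
]
hits = motif_hits(reads, motifs)
print(hits)
```

A real screen would also need a threshold on the number of reads per sample, since the abstract notes that four water samples had too few sequences to assess with confidence.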
7

Chung, Ryan Kyong-doc. "Deep learning approach to metagenomic binning." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119755.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 39-41).
Understanding the diversity and abundance of microbial populations is paramount to the health of humans and the environment. Estimating the diversity of these populations from whole metagenome shotgun (WMS) sequencing reads is difficult because the size of these datasets and overlapping reads limit what kinds of analysis we can do. Current methods require matching reads to a database of known microbes. These methods are either too slow or lack the sensitivity needed to identify novel species. We propose a convolutional neural network (CNN) based approach to metagenomic binning that embeds reads into a low-dimensional vector space based on taxonomic classification. We show that our method can achieve the speed and sensitivity necessary for taxonomic classification. Our method achieved 13% accuracy in identifying novel genera of bacteria, compared to 7% accuracy for k-mer embedding. At the same time, the speed of our method is within an order of magnitude of that of k-mer embedding, making it viable as a metagenomic analysis tool.
by Ryan Kyong-doc Chung.
M. Eng.
8

Prost, Vincent. "Sparse unsupervised learning for metagenomic data." Electronic Thesis or Diss., Université Paris-Saclay, 2020. http://www.theses.fr/2020UPASL013.

Abstract:
Technological advances in high-throughput DNA sequencing have allowed metagenomics to develop considerably over the last decade. Sequencing species directly in their natural environment has opened new horizons in many research fields. Falling costs combined with rising throughput mean that more and more studies are currently being launched. In this thesis we consider two hard problems in metagenomics, namely the clustering of raw reads and the inference of microbial networks. To solve these problems, we propose unsupervised learning methods based on the sparsity principle, which take the concrete form of optimization problems with an l1-norm penalty. In the first part of the thesis, we consider the intermediate problem of clustering DNA sequences into biologically relevant partitions (binning). Most computational methods perform binning only after an assembly step, which generates errors (through the creation of chimeric contigs) and loses information. This is why we address the problem of binning without prior assembly. We exploit the co-abundance signal of species across samples, measured by counting long k-mers (subsequences of length k). The use of locality-sensitive hashing (LSH) makes it possible to contain, at the cost of an approximation, the combinatorial explosion of possible k-mers within a space of fixed cardinality. The first contribution of the thesis is to propose applying a sparse non-negative matrix factorization (sparse NMF) to the k-mer count matrix in order to jointly extract abundance-variation information and cluster the k-mers. We first show that the approach is well founded at the theoretical level. We then survey the state of the art for the sparse NMF methods best suited to our problem. Online dictionary learning methods particularly caught our attention because of their ability to scale to datasets with a very large number of points. Because validating metagenomic binning methods on real data is difficult in the absence of ground truth, we created and used several synthetic datasets to evaluate the different methods. We show that applying sparse NMF improves on state-of-the-art binning methods on these datasets. Experiments on real metagenomic data from 1135 gut microbiota samples from healthy individuals were also carried out to show the relevance of the approach. In the second part of the thesis, we consider metagenomic data after taxonomic profiling, that is, multivariate data representing the abundance levels of taxa within samples. Since microbes live in communities structured by ecological interactions, it is important to be able to identify these interactions. We therefore address the problem of inferring microbial interaction networks from taxonomic profiles. This problem is often approached within the theoretical framework of Gaussian graphical models (GGMs), for which powerful inference algorithms exist, such as the graphical lasso. However, existing statistical methods are severely limited by the extreme sparsity of the taxonomic profiles encountered in metagenomics, in particular by the large proportion of so-called biological zeros (i.e. zeros due to the true absence of taxa). We propose a zero-inflated log-normal model designed to handle these biological zeros, and we show a performance gain over state-of-the-art methods for the inference of microbial interaction networks.
The development of massively parallel sequencing technologies makes it possible to sequence DNA at high throughput and low cost, fueling the rise of metagenomics, which is the study of complex microbial communities sequenced in their natural environment. Metagenomic problems are usually computationally difficult and are further complicated by the massive amount of data involved. In this thesis we consider two different metagenomics problems: 1. raw reads binning and 2. microbial network inference from taxonomic abundance profiles. We address them using unsupervised machine learning methods leveraging the parsimony principle, typically involving l1 penalized log-likelihood maximization. The assembly of genomes from raw metagenomic datasets is a challenging task akin to assembling a mixture of large puzzles composed of billions or trillions of pieces (DNA sequences). In the first part of this thesis, we consider the related task of clustering sequences into biologically meaningful partitions (binning). Most of the existing computational tools perform binning after read assembly as a pre-processing step, which is error-prone (yielding artifacts like chimeric contigs) and discards vast amounts of information in the form of unassembled reads (up to 50% for highly diverse metagenomes). This motivated us to address the problem of binning raw reads without prior assembly. We exploit the co-abundance of species across samples as a discriminative signal. Abundance is usually measured via the number of occurrences of long k-mers (subsequences of size k). The use of locality-sensitive hashing (LSH) allows us to contain, at the cost of some approximation, the combinatorial explosion of long k-mer indexing. The first contribution of this thesis is to propose a sparse non-negative matrix factorization (NMF) of the samples x k-mers count matrix in order to extract abundance variation signals. We first show that using sparse NMF is well-grounded since the data is a sparse linear mixture of non-negative components. Sparse NMF based on online dictionary learning algorithms caught our attention, in part because it behaves well on highly asymmetric data matrices. Because validating metagenomic binning on real datasets is difficult in the absence of ground truth, we created and used several benchmarks to evaluate the different methods. We showed that sparse NMF improves on state-of-the-art binning methods on those datasets. Experiments conducted on a real metagenomic cohort of 1135 human gut microbiota showed the relevance of the approach. In the second part of the thesis, we consider metagenomic data after taxonomic profiling: multivariate data representing abundances of taxa across samples. It is known that microbes live in communities structured by ecological interactions between the members of the community. We focus on the problem of inferring microbial interaction networks from taxonomic profiles. This problem is frequently cast into the paradigm of Gaussian graphical models (GGMs), for which efficient structure inference algorithms are available, like the graphical lasso. Unfortunately, GGMs or variants thereof cannot properly account for the extremely sparse patterns occurring in real-world metagenomic taxonomic profiles. In particular, structural zeros corresponding to true absences of biological signals fail to be properly handled by most statistical methods. We present in this part a zero-inflated log-normal graphical model specifically aimed at handling such "biological" zeros, and demonstrate significant performance gains over state-of-the-art statistical methods for the inference of microbial association networks, with the most notable gains obtained when analyzing taxonomic profiles displaying sparsity levels on par with real-world metagenomic datasets.
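
To make the idea of an l1-penalized NMF of a samples x k-mers count matrix concrete, here is a minimal sketch. It is not the thesis's method: the thesis relies on online dictionary learning, whereas this example swaps in plain multiplicative updates, minimizing 0.5·||V − WH||² + λ·ΣH in pure Python. The toy matrix, the rank, the value of λ and all function names are assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def objective(v, w, h, lam):
    """0.5 * squared reconstruction error plus an l1 penalty on H."""
    wh = matmul(w, h)
    fit = sum((vi - ri) ** 2 for vr, rr in zip(v, wh) for vi, ri in zip(vr, rr))
    return 0.5 * fit + lam * sum(sum(row) for row in h)

def sparse_nmf(v, rank, lam=0.1, n_iter=200, seed=0, eps=1e-12):
    """Factor V (samples x k-mers) into W (samples x rank) and sparse H (rank x k-mers)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = [objective(v, w, h, lam)]
    for _ in range(n_iter):
        # H update: the l1 penalty appears as +lam in the denominator.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + lam + eps) for j in range(m)]
             for k in range(rank)]
        # W update: standard multiplicative rule for the squared error.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        history.append(objective(v, w, h, lam))
    return w, h, history

# Toy count matrix: 4 samples x 6 k-mer buckets mixing two hypothetical species.
V = [
    [5, 5, 5, 0, 0, 0],
    [3, 3, 3, 1, 1, 1],
    [0, 0, 0, 4, 4, 4],
    [1, 1, 1, 2, 2, 2],
]
W, H, history = sparse_nmf(V, rank=2)
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

In this orientation the rows of H play the role of k-mer clusters and the columns of W give their abundance across samples. The sketch leaves W unnormalized, so the strength of the penalty is only indicative; practical implementations usually constrain the scale of one factor.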
9

Schuch, Viviane [UNESP]. "Construção de biblioteca metagenômica para prospecção de genes envolvidos na biossíntese de antibióticos." Universidade Estadual Paulista (UNESP), 2007. http://hdl.handle.net/11449/94940.

Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Secondary metabolites are bioactive compounds of great importance to the pharmaceutical and agricultural industries, produced by certain groups of microorganisms and plants. Polyketides, which are synthesized by enzyme complexes called polyketide synthases (PKSs), stand out among the known secondary metabolites and form the basic chemical structure of several antibiotics. All the genes involved in the biosynthesis of a polyketide are physically clustered on the chromosome, and include genes that are highly conserved, commonly called the minimal PKS. Traditional methods of searching for new drugs, which involve culturing microorganisms isolated from soil, are no longer so promising, owing to the high rate of rediscovery of already known antibiotics, which reaches 99.9%, and to the small share of soil microorganisms that can be cultured by standard techniques, about 1%. Metagenomics is a promising approach that gives access to the genomes of these unculturable organisms, since it consists of extracting DNA directly from the environment and constructing a library from this mixed genome. In this work we describe the construction of a library made with high-molecular-weight DNA isolated directly from soil collected under a eucalyptus arboretum in the State of São Paulo, Brazil. The library has 9,320 clones and was constructed in a cosmid vector, with insert sizes ranging from 30 to 45 kb...
Secondary metabolites are bioactive compounds of great importance in the pharmaceutical and agricultural industries, produced by a few groups of microorganisms and plants. The polyketides, which are synthesized by enzymatic complexes called polyketide synthases, stand out among the known secondary metabolites and form part of the main structure of many antibiotics. All genes involved in the biosynthesis of antibiotics are found as clusters in the chromosome. The traditional methods of searching for new drugs, which rely on cultures of microorganisms isolated from the soil, are not so promising, owing to the high rate of rediscovery of already known antibiotics, which reaches 99.9%. In addition, only a small fraction of microorganisms, 1% at most, can be cultured by standard culture methods. Metagenomics is a promising approach that gives access to the genomes of these unculturable organisms, as it is carried out by extracting DNA directly from the environment and constructing a mixed genomic library. In this work, we describe the construction of a library made from high molecular weight DNA isolated directly from the soil underneath a pinus forest in the State of São Paulo, Brazil. The library comprises 9,320 clones and was constructed in a cosmid vector, with insert sizes ranging from 30 to 45 kb. Digestion of randomly chosen cosmid DNA with different restriction enzymes revealed evident differences in the restriction fragments among the clones and made it possible to determine the average insert size. The initial evaluation of the presence of genes involved in the biosynthesis of antibiotics synthesized by the type I PKS enzymatic system was accomplished by PCR amplification of clones from the library using specific primers. We studied 4,320 clones and the results suggest a great variety of these genes. The PCR products obtained were sequenced to determine the identity of the amplified genes.
10

Morfopoulou, S. "Bayesian mixture models for metagenomic community profiling." Thesis, University College London (University of London), 2015. http://discovery.ucl.ac.uk/1473450/.

Abstract:
Metagenomics can be defined as the study of DNA sequences from environmental or community samples. This is a rapidly progressing field and application ideas that seemed outlandish a few years ago are now routine and familiar. Metagenomics’ scope is broad and includes the analysis of a diverse set of samples such as environmental or clinical samples. Human tissues are in essence metagenomic samples due to the presence of microorganisms, such as bacteria, viruses and fungi in both healthy and diseased individuals. Deep sequencing of clinical samples is now an established tool for pathogen detection, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, particularly for viruses. The research presented in this thesis focuses on using Bayesian Mixture Model techniques to produce taxonomic profiles for metagenomic data. A novel Bayesian mixture model framework for resolving complex metagenomic mixtures is introduced, called metaMix. The use of parallel Monte Carlo Markov chains (MCMC) for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. The improved accuracy of metaMix compared to relevant methods is demonstrated, particularly for profiling complex communities consisting of several related species. metaMix was designed specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection. However, the principles are generally applicable to all types of metagenomic mixtures. metaMix is implemented as a user friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix.
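
The difficulty that short reads can match multiple organisms can be illustrated with a simple mixture model over candidate species. The sketch below is not metaMix and does not use MCMC: it swaps in a plain expectation-maximization (EM) estimate of mixture weights, and the read likelihoods, species labels and function name are all invented for the example.

```python
def em_abundances(read_likelihoods, n_iter=100):
    """Estimate mixture weights; read_likelihoods[r][s] = P(read r | species s)."""
    n_species = len(read_likelihoods[0])
    weights = [1.0 / n_species] * n_species
    for _ in range(n_iter):
        totals = [0.0] * n_species
        for lik in read_likelihoods:
            joint = [w * l for w, l in zip(weights, lik)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # read matches none of the candidate species
            for s in range(n_species):
                totals[s] += joint[s] / norm  # E-step: fractional assignment
        assigned = sum(totals)
        weights = [t / assigned for t in totals]  # M-step: renormalize
    return weights

# Hypothetical reads for species A, B, C (columns); values are made up.
reads = (
    [[1.0, 0.0, 0.0]] * 6    # six reads that match only species A
    + [[0.5, 0.5, 0.0]] * 3  # three reads that match A and B equally well
    + [[0.0, 0.0, 1.0]] * 1  # one read that matches only species C
)
weights = em_abundances(reads)
print([round(w, 3) for w in weights])
```

Because species B is supported only by ambiguous reads, its estimated weight shrinks toward zero and those reads are attributed to A. A Bayesian treatment such as the one described in the abstract would additionally compare alternative sets of species rather than returning a single point estimate.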
11

Tithi, Saima Sultana. "Computational Analysis of Viruses in Metagenomic Data." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/97194.

Abstract:
Viruses have a huge impact on controlling diseases and regulating many key ecosystem processes. As metagenomic data can contain many microbiomes including many viruses, by analyzing metagenomic data we can analyze many viruses at the same time. The first step towards analyzing metagenomic data is to identify and quantify the viruses present in the data. To address this step, we developed a computational pipeline, FastViromeExplorer. FastViromeExplorer leverages a pseudoalignment-based approach, which is faster than the traditional alignment-based approach, to quickly align millions or billions of reads. Application of FastViromeExplorer to both human gut samples and environmental samples shows that our tool can identify viruses and quantify their abundances quickly and accurately, even for a large data set. Although viruses have received increased attention in recent times, most viruses are still unknown or uncategorized. To discover novel viruses from metagenomic data, we developed a computational pipeline named FVE-novel. FVE-novel leverages a hybrid of reference-based and de novo assembly approaches to recover novel viruses from metagenomic data. By applying FVE-novel to an ocean metagenome sample, we successfully recovered two novel viruses and two different strains of known phages. Analysis of viral assemblies from metagenomic data reveals that viral assemblies often contain assembly errors such as chimeric sequences, in which more than one viral genome is incorrectly assembled together. In order to identify and fix these types of assembly errors, we developed a computational tool called VirChecker. Our tool can identify and fix assembly errors due to chimeric assembly. VirChecker also extends the assembly as much as possible to complete it and then annotates the extended and improved assembly.
Application of VirChecker to viral scaffolds collected from an ocean metagenome sample shows that our tool successfully fixes the assembly errors and extends two novel virus genomes and two strains of known phage genomes.
Doctor of Philosophy
Viruses, the most abundant micro-organisms on Earth, have a profound impact on human health and the environment. Analyzing metagenomic data for viruses has the benefit of analyzing many viruses at a time without the need to cultivate them in the lab. In this dissertation, we address three research problems in analyzing viruses from metagenomic data. The first question that needs to be answered is which viruses are present and in what quantity. To answer this question, we developed a computational pipeline, FastViromeExplorer. Our tool can identify viruses from metagenomic data and quantify their abundances quickly and accurately, even for a large data set. To recover novel virus genomes from metagenomic data, we developed a computational pipeline named FVE-novel. By applying FVE-novel to an ocean metagenome sample, we successfully recovered two novel viruses and two strains of known phages. Examination of viral assemblies from metagenomic data reveals that, due to the complex nature of metagenome data, viral assemblies often contain assembly errors and are incomplete. To solve this problem, we developed a computational pipeline named VirChecker to polish, extend and annotate viral assemblies. Application of VirChecker to virus genomes recovered from an ocean metagenome sample shows that our tool successfully extended and completed those virus genomes.
12

Kumar, Ashwani. "Optimizing Parameters for High-quality Metagenomic Assembly." Miami University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=miami1437997082.

13

Icardi, Sara. "Lignocellulose degradation: a proteomic and metagenomic study." Doctoral thesis, Università del Piemonte Orientale, 2018. http://hdl.handle.net/11579/97185.

Abstract:
Wood decay processes have recently attracted much attention, as lignocellulose biomass (LCB) represents the most abundant renewable resource on Earth and can provide fermentable sugar monomers convertible into value-added products. In order to improve the efficiency and ecological sustainability of the process, new insights into the microbial degradation of lignocellulosic biomass could be of fundamental importance. Environmental samples rich in organic matter may host a large variety of microbes, most of them specialized in the degradation of LCB and thus important both as potential sources of biochemical catalysts for making value-added products and for the global carbon cycle. The aim of this thesis is to study LCB degradation by two different approaches, exploiting proteomic and metagenomic tools. Proteomic analyses were conducted on the secretomes of a bacterium, Cellulomonas fimi, grown in the presence of carboxymethyl-cellulose or different pretreated LCBs as sole carbon sources. Zymography and enzyme activity assays confirmed the lignocellulose-degrading capabilities of C. fimi, showing endoglucanase and xylanase activities. The comparison among secretomes (in terms of enzymatic activities and protein composition) obtained after growth on different substrates highlighted: i) the major proteins and CAZymes (Carbohydrate Active enZymes) secreted and involved in LCB degradation and ii) the influence of the substrate on secretome protein composition and enzymatic activity. Metagenomic analyses were conducted on two groups of representative samples (two decaying woods and two control soils) in order to characterize the microbial communities inhabiting them. The microorganisms (bacteria and fungi) found to be more represented in decaying wood samples than in soils could be considered the most likely agents of wood degradation.
14

Lourenço, Marcus Venicius de Mello. "Contexto genômico e expressão de genes envolvidos na redução do sulfato em solos de manguezal." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/11/11138/tde-23012017-172654/.

Abstract:
Mangroves form a biome at the interface between the continent and the ocean in intertropical regions, characterized by unique environmental conditions and high biodiversity. This project aimed to study, using metagenomic and metatranscriptomic approaches, the microbial communities found in the mangroves located in the municipalities of Bertioga/SP and Cananeia/SP, focusing on genes related to the sulfate reduction process. For this purpose, a metagenomic library containing 12,960 clones in a fosmid vector was screened by PCR specific for the dsrB gene, and the whole library was also completely sequenced on the Illumina HiSeq2000 platform. Three metagenomic inserts were obtained (23D5, MGV 10001431 and MGV 10016026, with 31, 31 and 34 kb, respectively), which were annotated and analyzed in detail. Insert 23D5 was the only one to carry genes essential for dissimilatory sulfate reduction (apr, hdr, dsr, sat). The taxonomic diversity of groups related to the sulfur cycle showed a predominance of the phyla Bacteroidetes and Proteobacteria, while phylogenetic analysis of the dsrB gene showed differences between the three inserts, affiliating them with sequences similar to those of Firmicutes and Deltaproteobacteria and revealing that they differ from the sequences present in databases. The metatranscriptomic analysis of the four mangroves showed a pattern of differential expression for the dsr cluster according to the conservation status of the mangroves studied. These results constitute the first access to genomic fragments, and to their functionality, from sulfate-reducing microorganisms in mangrove soils.
15

Angell, Scott Edward. "Genomic and metagenomic approaches to natural product chemistry." College Station, Tex. : Texas A&M University, 2008. http://hdl.handle.net/1969.1/ETD-TAMU-2671.

16

Davenport, Colin. "Genomic and metagenomic application of microbial genome signatures." Hannover Bibliothek der Medizinischen Hochschule Hannover, 2010. http://d-nb.info/100117173X/34.

17

Jackson, Frances. "Metabolic phenotyping and metagenomic analysis of developing infants." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/58184.

Abstract:
Early life experiences, including mode of delivery and nutrition during the neonatal period, have been proven to have an impact on health in later life. Studying human metabolic development has major implications for understanding the aetiology and risk of disease, including metabolic syndrome. Initially, a sample preparation protocol was developed and optimised using metabonomic procedures for studying urine and faeces from infants, to accommodate limited sample volume and to take into account the compositional differences between adult and infant biofluids. This primarily indicated that age is an important variable that contributes to the metabolic profile of biofluids. Faecal metabonomics is fast becoming a useful tool for defining interactions among host, microbial communities and nutritional interventions. Infant development trajectory was assessed through faecal metabolic profiling by 1H NMR. Samples were obtained from a large non-clinical longitudinal cohort study: 1802 faecal samples from 524 infants at 6 time points from 4 days to 730 days postpartum. Furthermore, 1H NMR, UPLC-MS and metagenomic phenotyping techniques were applied to urine (n=278) and faecal (n=308) samples from 150 infants born term or preterm (< 37 wks gestational age). This multi-omics approach provided further demonstration of the contribution of microbial co-metabolites to infant metabolism early in life and therefore of their potential impact on overall health. This PhD project was able to identify certain metabolic pathways which were shown to differ in relation to gestational age as well as postnatal age, mode of delivery, BMI status and nutrition. These included, in particular, choline and methylamine derivatives (e.g. betaine, trimethylamine), short chain fatty acids (SCFA) and amino acids related to nutrition and gut microbiome functionality, as well as metabolites indicating infant renal development from birth (e.g. myo-inositol, 1-N-methylnicotinamide).
Overall, these investigations have shown that an understanding of the sources of variation in biofluid metabolite profiles is essential for interpretation of data acquired during normal infant development.
18

Lu, Mingji. "Metagenomic approaches to discover lipolytic enzymes." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2021. http://d-nb.info/1233481355/34.

19

Laver, Thomas William. "Evaluating metagenomic quantifications from next-generation sequencing data." Thesis, University of Exeter, 2014. http://hdl.handle.net/10871/17439.

Abstract:
Molecular profiling is exploiting the unprecedented power of next generation DNA sequencing to illuminate the microbial diversity of the natural world. The composition of microbiomes has been implicated as an important factor in human health and the function of ecosystems. It is thus of great importance that measurements of microbiomes are accurate and reliable, and moreover it is essential that the accuracy and reliability of such measurements are well understood. This project sought to provide assessments of the accuracy and precision of measurements made by 16S rDNA amplicon sequencing and whole genome shotgun sequencing, as well as to investigate the impact of different experimental and bioinformatics choices on quantitative measurements. To address these aims, next generation sequencing data from a well quantified metagenomic control material was utilized. Good precision and accuracy were recorded for 16S primer pairs which were perfectly complementary to the target organisms. Where primers were not perfectly complementary to an organism, its abundance was underestimated. Whole genome shotgun sequencing demonstrated very high levels of precision, with a mean coefficient of variation of 2%, and showed good agreement with the 16S rDNA amplicon sequencing using primer pairs optimized specifically for the target species. Small changes in relative species abundance (less than three fold) should be treated with caution, as this thesis demonstrated that sequencing results for species can vary by this amount from digital polymerase chain reaction results. Issues with publicly available 16S rDNA sequence databases contribute to a lack of taxonomic resolution; taxa measured at low abundance are also likely to be artifacts of the analysis. In addition to the established sequencing platforms, this thesis also investigated the performance of a promising new experimental DNA sequencing platform developed by Oxford Nanopore Technologies (ONT).
The ONT MinION has an error rate of greater than 40% and, while it produces exceptionally long reads, it is not yet suitable for quantitative metagenomics. This thesis also demonstrated that the use of control materials in molecular profiling is important to verify findings and to understand the impact different experimental and bioinformatics choices have on measurements of the microbiome.
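The precision figure quoted in this abstract is a coefficient of variation (CV), that is, the standard deviation of replicate measurements divided by their mean. The sketch below shows the calculation on made-up replicate values; the function name and the numbers are hypothetical and are not taken from the thesis.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical abundance estimates for one species across three replicate runs
replicates = [0.98, 1.00, 1.02]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1%}")
```

A CV is unitless, which is why it can be averaged across species whose abundances differ by orders of magnitude.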
20

Gupta, Suraj. "Metagenomic Data Analysis Using Extremely Randomized Tree Algorithm." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/96025.

Abstract:
Many antibiotic resistance genes (ARGs) conferring resistance to a broad range of antibiotics have often been detected in aquatic environments such as untreated and treated wastewater, river and surface water. ARG proliferation in the aquatic environment could depend upon various factors such as geospatial variations, the type of aquatic body, and the type of wastewater (untreated or treated) discharged into these aquatic environments. Likewise, the strong interconnectivity of aquatic systems may accelerate the spread of ARGs through them. Hence, a comparative and holistic study of different aquatic environments is required to appropriately comprehend the problem of antibiotic resistance. Many studies approach this issue using molecular techniques such as metagenomic sequencing and metagenomic data analysis. Such analyses compare the broad spectrum of ARGs in water and wastewater samples, but these studies use comparisons which are limited to similarity/dissimilarity analyses. However, in such analyses, the discriminatory ARGs (associated ARGs driving such similarity/dissimilarity measures) may not be identified. Consequently, the factors driving the dissimilarities among the samples would not be identified, and the reason for antibiotic resistance proliferation may not be clearly understood. In this study, an effective methodology, using the Extremely Randomized Trees (ET) algorithm, was formulated and demonstrated to capture such ARG variations and identify discriminatory ARGs among environmentally derived metagenomes. In this study, data were grouped by: geographic location (to understand the spread of ARGs globally), untreated vs. treated wastewater (to see the effectiveness of WWTPs in removing ARGs), and different aquatic habitats (to understand the impact and spread within aquatic habitats).
It was observed that there were certain ARGs which were specific to wastewater samples from certain locations suggesting that site-specific factors can have a certain effect in shaping ARG profiles. Comparing untreated and treated wastewater samples from different WWTPs revealed that biological treatments have a definite impact on shaping the ARG profile. While there were several ARGs which got removed after the treatment, there were some ARGs which showed an increase in relative abundance irrespective of location and treatment plant specific variables. On comparing different aquatic environments, the algorithm identified ARGs which were specific to certain environments. The algorithm captured certain ARGs which were specific to hospital discharges when compared with other aquatic environments. It was determined that the proposed method was efficient in identifying the discriminatory ARGs which could classify the samples according to their groups. Further, it was also effective in capturing low-level variations which generally get over-shadowed in the analysis due to highly abundant genes. The results of this study suggest that the proposed method is an effective method for comprehensive analyses and can provide valuable information to better understand antibiotic resistance.
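How a tree ensemble can point to discriminatory features may be easier to see in code. The following is a from-scratch, simplified sketch of extremely randomized trees, not the thesis's implementation: at each node every feature gets one cut point drawn uniformly at random, the candidate with the largest Gini decrease is kept, and the decreases are summed per feature as an importance score. The abundance values, the three unnamed features and the group labels are all hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def grow(X, y, rng, importance, n_total):
    """Grow one randomized tree, adding impurity decreases to `importance`."""
    if len(set(y)) == 1:
        return  # pure node, nothing left to split
    best = None
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature in this node
        t = rng.uniform(lo, hi)  # cut point drawn at random, not optimised
        left = [i for i, v in enumerate(column) if v < t]
        right = [i for i, v in enumerate(column) if v >= t]
        if not left or not right:
            continue
        child = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        gain = gini(y) - child
        if best is None or gain > best[0]:
            best = (gain, f, left, right)
    if best is None or best[0] <= 0.0:
        return  # no random split improved purity
    gain, f, left, right = best
    importance[f] += gain * len(y) / n_total  # weight by node size
    for idx in (left, right):
        grow([X[i] for i in idx], [y[i] for i in idx], rng, importance, n_total)


def feature_importances(X, y, n_trees=50, seed=0):
    """Normalised impurity-decrease importance summed over all trees."""
    rng = random.Random(seed)
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        grow(X, y, rng, importance, len(y))
    total = sum(importance)
    return [v / total for v in importance] if total else importance


# Hypothetical relative abundances of three genes in eight samples.
# Only the first gene differs between the two groups.
X = [[0.90, 0.1, 0.5], [0.80, 0.2, 0.4], [0.85, 0.3, 0.6], [0.95, 0.4, 0.5],
     [0.10, 0.1, 0.5], [0.20, 0.2, 0.4], [0.15, 0.3, 0.6], [0.05, 0.4, 0.5]]
y = ["untreated"] * 4 + ["treated"] * 4
importances = feature_importances(X, y)
top_feature = max(range(len(importances)), key=lambda f: importances[f])
print(top_feature, [round(v, 2) for v in importances])
```

In practice a maintained library implementation would normally be preferred; the point of the sketch is only that the importance ranking, rather than a similarity score, is what singles out the discriminatory feature.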
MS
21

Robitaille, Nicolas. "METAGENOMIC ANALYSIS OF THE DEVELOPING PERI-IMPLANT SULCUS." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1434667746.

22

Kelly, Jennifer. "Metagenomic and genomic analysis of the skin microbiota." Thesis, University of Liverpool, 2013. http://livrepository.liverpool.ac.uk/15893/.

Abstract:
Following birth the skin is rapidly colonised by microorganisms that, over time, delineate into niche-specific microbial communities that often exhibit specific host-associated functions. Due to local physiological conditions, the axilla boasts a unique microbial community that has been implicated in malodour generation via the biotransformation of odourless host-secreted substrates. To more comprehensively understand the role of the axillary microbiome in malodour generation, axillary samples of subjects exhibiting high and low malodour profiles were subject to metagenomic sequencing. Metagenomics is a relatively novel whole-genome shotgun technique that utilises high-throughput sequencing to taxonomically and functionally characterise microbial communities. Prior to the axillary analysis, an in vitro synthetic microbial community of known composition was created and subject to metagenomic sequencing and analysis to determine which methods most accurately represent the taxonomic and functional composition of a microbial community. Additionally, to allow a more thorough understanding of the intraspecies diversity of the most abundant skin genus Staphylococcus, the commensal resident Staphylococcus epidermidis and the closely related pathogen Staphylococcus aureus were both subject to comparative pan-genome analysis. Utilising a direct whole-genome sequencing approach revealed that Corynebacterium might not dominate the axillary microbiota as predominantly as previously thought. A wide range of microbial clades were associated with high levels of axillary malodour, however only the four following species-level groups were enriched: Corynebacterium amycolatum, Corynebacterium kroppenstedtii, Finegoldia magna and Kocuria rhizophila. The characterised ability of certain corynebacterial species to generate malodorous compounds indicates that C. amycolatum and C. kroppenstedtii may play a major role towards the generation of axillary malodour. 
Pan-genome analysis of the most abundant skin isolate S. epidermidis and its relative S. aureus resulted in the complete description of the core genome of both species, and revealed that S. epidermidis exhibits a much higher degree of intra-species variability than S. aureus. Also, although both species occupy distinctly divergent life-styles, a large proportion of the conserved function was present in the core-genomes of both species, indicating a high degree of shared conservation. Utilisation of high-throughput sequencing technologies allowed a more in-depth analysis of the axillary microbiota and the intraspecies variability of S. epidermidis and S. aureus.
23

Ohlhoff, Colin Walter. "Biopolymer gene discovery and characterization using metagenomic libraries." Thesis, Link to the online version, 2008. http://hdl.handle.net/10019/1801.

24

Goode, Ann Marie Liles Mark Russell. "Polyketide synthase pathway discovery from soil metagenomic libraries." Auburn, Ala., 2009. http://hdl.handle.net/10415/1805.

25

Tyler, Heather Lee. "Plant-associated bacteria: biological, genomic, and metagenomic studies." [Gainesville, Fla.] : University of Florida, 2009. http://purl.fcla.edu/fcla/etd/UFE0041068.

26

Sohn, Michael B. "Novel Computational and Statistical Approaches in Metagenomic Studies." Diss., The University of Arizona, 2015. http://hdl.handle.net/10150/556866.

Abstract:
Metagenomics has great potential to discover previously unattainable information about microbial communities. The simplest, but extremely powerful, approach for studying the characteristics of a microbial community is the analysis of differential abundance, which tries to identify differentially abundant features (e.g. species or genes) across different communities. For instance, detection of differentially abundant microbes across healthy and diseased groups can enable us to identify potential pathogens or probiotics. However, the analysis of differential abundance could mislead us about the characteristics of microbial communities if the counts or abundance of features on different scales are not properly normalized within and between communities. An important prerequisite for the analysis of differential abundance is to accurately estimate the composition of microbial communities, which is commonly known as the analysis of taxonomic composition. Most prevalent approaches rely solely on the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree. In this study, two novel methods are developed: one for the analysis of taxonomic composition, called Taxonomic Analysis by Elimination and Correction (TAEC), and the other for the analysis of differential abundance, called Ratio Approach for Identifying Differential Abundance (RAIDA). TAEC utilizes the alignment similarity between known genomes in addition to the similarity between query sequences and sequences of known genomes. It is comprehensively tested on various simulated datasets of diverse complexity of bacterial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in estimating the abundance of bacteria in a given microbial sample.
RAIDA utilizes an invariant property of the ratio between the abundance of features, that is, a ratio between the relative abundance of two features is the same as a ratio between the absolute abundance of two features. Through comprehensive simulation studies the performance of RAIDA is consistently powerful and under some situations it greatly surpasses other existing methods for the analysis of differential abundance in metagenomic studies.
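The invariant property mentioned in this abstract follows from the fact that relative abundances are absolute abundances divided by a common total, so the total cancels in any ratio of two features. The short check below illustrates only that property, not the RAIDA method itself; the abundance values are hypothetical.

```python
import math


def relative_abundance(counts):
    """Scale counts so that they sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical absolute abundances of three features in one sample
absolute = [120.0, 30.0, 850.0]
relative = relative_abundance(absolute)

ratio_abs = absolute[0] / absolute[1]
ratio_rel = relative[0] / relative[1]
print(math.isclose(ratio_abs, ratio_rel))  # → True
```

Because the ratio does not depend on the sample total, it can be compared between samples sequenced to different depths, where raw counts or proportions on their own cannot.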
27

Lebó, Marko. "Přímá klasifikace metagenomických signálů ze sekvenace nanopórem." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400964.

Abstract:
This diploma thesis deals with taxonomy-independent methods for the classification of metagenomic signals acquired by a MinION sequencer. It describes the formation and character of metagenomic data, as well as existing methods of metagenomic data classification and their development. The thesis also evaluates the impact of third-generation sequencing techniques on metagenomics and then focuses on the function of the Oxford Nanopore MinION sequencing device. Lastly, a custom method for metagenomic data classification, based on data obtained from a MinION sequencer, is proposed and compared with an existing classification method.
28

Wang, Yi, and 王毅. "Binning and annotation for metagenomic next-generation sequencing reads." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/208040.

Abstract:
The development of next-generation sequencing technology enables us to obtain a vast number of short reads from metagenomic samples. In metagenomic samples, the reads from different species are mixed together. So, metagenomic binning has been introduced to cluster reads from the same or closely related species and metagenomic annotation is introduced to predict the taxonomic information of each read. Both metagenomic binning and annotation are critical steps in downstream analysis. This thesis discusses the difficulties of these two computational problems and proposes two algorithmic methods, MetaCluster 5.0 and MetaAnnotator, as solutions. There are six major challenges in metagenomic binning: (1) the lack of reference genomes; (2) uneven abundance ratios; (3) short read lengths; (4) a large number of species; (5) the existence of species with extremely-low-abundance; and (6) recovering low-abundance species. To solve these problems, I propose a two-round binning method, MetaCluster 5.0. The improvement achieved by MetaCluster 5.0 is based on three major observations. First, the short q-mer (length-q substring of the sequence with q = 4, 5) frequency distributions of individual sufficiently long fragments sampled from the same genome are more similar than those sampled from different genomes. Second, sufficiently long w-mers (length-w substring of the sequence with w ≈ 30) are usually unique in each individual genome. Third, the k-mer (length-k substring of the sequence with k ≈ 16) frequencies from reads of a species are usually linearly proportional to that of the species’ abundance. 
The metagenomic annotation methods in the literature often suffer from five major drawbacks: (1) inability to annotate many reads; (2) less precise annotation for reads and more incorrect annotation for contigs; (3) poor handling of novel clades with limited reference genomes; (4) performance affected by variable genome sequence similarities between different clades; and (5) high time complexity. In this thesis, a novel tool, MetaAnnotator, is proposed to tackle these problems. There are four major contributions of MetaAnnotator. Firstly, instead of annotating reads/contigs independently, a cluster of reads/contigs is annotated as a whole. Secondly, multiple reference databases are integrated. Thirdly, for each individual clade, quadratic discriminant analysis is applied to capture the similarities between reference sequences in the clade. Fourthly, instead of using alignment tools, MetaAnnotator performs annotation using exact k-mer matching, which is more efficient. Experiments on both simulated datasets and real datasets show that MetaCluster 5.0 and MetaAnnotator outperform existing tools with higher accuracy as well as lower time and space cost.
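The first observation behind MetaCluster 5.0, that short q-mer frequency profiles are more alike within a genome than between genomes, can be tried out on toy sequences. The sketch below is an illustration under stated assumptions rather than the thesis's method: the sequences are artificial repeats standing in for fragments of two genomes, and Manhattan distance is an arbitrary choice of dissimilarity.

```python
from itertools import product


def qmer_frequencies(seq, q=4):
    """Normalised frequency vector over all 4**q DNA q-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=q)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - q + 1):
        kmer = seq[i:i + q]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in kmers]


def manhattan(u, v):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Artificial fragments: two from an AT-rich "genome A", one from a GC-rich "genome B"
frag_a1 = "AATAT" * 30
frag_a2 = "ATAAT" * 30
frag_b = "GGCGC" * 30

d_same = manhattan(qmer_frequencies(frag_a1), qmer_frequencies(frag_a2))
d_diff = manhattan(qmer_frequencies(frag_a1), qmer_frequencies(frag_b))
print(d_same < d_diff)  # → True
```

Real fragments are far noisier than these repeats, which is why the abstract restricts the observation to sufficiently long fragments.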
Computer Science
Doctoral
Doctor of Philosophy
29

Newgas, Sophie Alice. "Biocatalysis using plant and metagenomic enzymes for organic synthesis." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10052003/.

Abstract:
Biocatalysts provide an excellent alternative to traditional organic chemistry strategies, with advantages such as mild reaction conditions and high enantio- and stereoselectivities. The use of metagenomics has enabled new enzymes to be sourced with high sequence diversity. At UCL a metagenomics strategy has been developed for enzyme discovery, in which the library generated is annotated and searched for desired enzyme sequences. In this PhD, a metagenomic approach was used to retrieve 37 short chain reductase/dehydrogenases (SDRs) from an oral environment metagenome. Eight enzymes displayed activity towards cyclohexanone and their substrate selectivities were investigated. Four of the SDRs displayed activity towards the Wieland-Miescher ketone (WMK), a motif found in several pharmaceutically relevant compounds. SDR-17 displayed high conversions and stereoselectivities and was co-expressed with the co-factor recycling enzyme glucose-6-phosphate dehydrogenase. This system was then successfully used to reduce (R)-WMK in a preparative scale reaction in 89% isolated yield and > 99% e.e. In further studies using reductases, the substrate specificities of two ketoreductases known as tropinone reductase I and II (TRI and TRII respectively) from the plant D. stramonium and MecgoR from E. coca were investigated. These studies expanded on the substrate activities reported for these enzymes in the literature. A selection of symmetric and asymmetric tropinone analogues was synthesised, towards which MecgoR and TRI showed high activities, providing a strategy to access novel alcohols. Furthermore, sixteen ketoreductases were selected from a drain metagenome based on their sequence similarity of over 24% to MecgoR. They were annotated as aldo/keto reductases (AKRs) and five were successfully expressed in E. coli. Interestingly, the novel enzyme AKR-3 displayed activities towards aromatic ketones and aldehydes such as 2-indanone, phenylacetaldehyde and benzaldehyde.
Transaminases (TAms) from the enzyme library toolbox at UCL were also tested with tropinone analogues and related cyclic compounds, several of which showed good activities.
30

Shah, Shivani. "Graph sparsification and unsupervised machine learning for metagenomic binning." Thesis, Tours, 2019. http://theses.scd.univ-tours.fr/index.php?fichier=2019/shivani.shah_18225.pdf.

Abstract:
Metagenomics is the field of biology concerned with the study of the genomic content of microbial communities directly in their environment. The metagenomic data used in this thesis come from sequencing technologies that produce short DNA fragments (reads). One of the key steps in metagenomic data analysis, and the one developed in this study, is the grouping of reads, also called binning. In this binning task, groups (bins) must be formed so that each group is composed of reads originating from the same species or genus. The traditional methodology performs this step on longer sequences (contigs), but this step potentially generates so-called chimeric sequences. One of the problems with binning applied to reads is the large size of the datasets. The traditional methodology, applied to reads, overwhelms computational resources. Consequently, binning approaches that scale to massive data need to be developed. In this thesis, we address this problem by proposing a scalable method for binning. We position our work among composition-based binning approaches and in a fully unsupervised setting. To reduce the complexity of the binning task, methods are proposed to filter the associations between data points beforehand. The approach was developed in two stages. First, the methodology was evaluated on smaller metagenomic datasets (composed of a few thousand points). Second, we propose to adapt this approach to larger datasets (composed of millions of points) using similarity-sensitive indexing methods (LSH). 
The thesis makes three major contributions. First, we propose a varied set of algorithms for filtering associations between the data (reads) by means of proximity graphs. These proximity graphs are built to capture the relationships between reads that are most relevant to the binning task. We then apply community detection algorithms to these graphs to identify the groups of reads of interest. An exploratory study was carried out with several proximity graphs and community detection algorithms on three metagenomic datasets. Following this study, we propose a pipeline approach named ProxiClust that couples the construction of a kNN graph with the Louvain community detection algorithm. Second, to address scalability and handle larger datasets, the similarity matrix used in the pipeline is replaced by similarity-sensitive hash tables built with the Sim-Hash LSH approach. We introduce two strategies for building proximity graphs from the hash tables: 1) the microcluster graph and 2) the approximate kNN graph. The performance and limitations of these graphs were evaluated on large MC datasets and discussed. On the basis of this study, we retain the mutual kNN graph as the most appropriate proximity graph for large datasets. This proposal was also evaluated and confirmed on reference metagenomic sequence data from the international CAMI challenge. Finally, we examine alternative hashing approaches for building better-quality hash tables. The data-dependent hashing approach ITQ is introduced and exploited, and we then propose two variants of it: orthogonal (ITQ-OrthSH) and non-orthogonal (ITQ-SH). 
These hashing approaches were evaluated and discussed on the massive read data available.
Metagenomics is the field of biology that relates to the study of the genomic content of microbial communities directly in their natural environments. Metagenomic data is generated by sequencing technologies that take environmental samples as input. The generated data is composed of short fragments of DNA (called reads), which originate from the genomes of all species present in the sample. Dataset sizes range from thousands to millions of reads. One of the steps of metagenomic data analysis is binning of the reads. In binning, groups (called bins) are formed such that each group is composed of reads that are likely to originate from the same species or family of species. It has essentially been treated as a clustering task in the metagenomic literature. One of the challenges in binning arises from the large size of the datasets: the method overwhelms the computational resources available for the task. Hence the development of binning approaches that scale to large datasets is required. In this thesis, we address this issue by proposing a scalable method to perform binning. We position our work among the composition-based binning approaches (use of short k-mers) and in a completely unsupervised context. In order to decrease the complexity of the binning task, methods are proposed to perform sparsification of the data prior to clustering. The approach has been developed in two steps. First, the idea was evaluated on smaller metagenomic datasets (composed of a few thousand points). In the second step, we propose to scale this approach to larger datasets (composed of millions of points) with similarity-based indexing methods (LSH approaches). There are three major contributions of the thesis. First, we propose the idea of performing sparsification of the data with proximity graphs, prior to clustering. 
The proximity graphs are built on the data to capture pairwise relationships between data points that are relevant for clustering. Then we leverage community detection algorithms on these graphs to identify clusters in the data. An exploratory study has been performed with several proximity graphs and community detection algorithms on three metagenomic datasets. Based on this study, we propose an approach named ProxiClust, which uses a KNN graph and Louvain community detection to perform binning. Second, to scale this approach to larger datasets, the distance matrix in the pipeline is replaced with hash tables built with the Sim-hash LSH approach. We introduce two strategies to build proximity graphs from the hash tables: 1) the microclusters graph and 2) the approximate k-nearest-neighbour graph. The performance of these graphs has been evaluated on large MC datasets, and their performance and limitations are discussed. A baseline evaluation of these datasets has also been performed to determine their clustering difficulty. Based on this study, we propose the Mutual-KNN graph as the appropriate proximity graph for large datasets. This proposal has also been evaluated and confirmed on the CAMI benchmark metagenomic datasets. Lastly, we examine alternative hashing approaches to build better-quality hash tables. A data-dependent hashing approach, ITQ, and an orthogonal version of Sim-hash have been included. Two new data-dependent hashing approaches named ITQ-SH and ITQ-OrthSH are introduced. All the hashing approaches have been evaluated with respect to their ability to hash the MC datasets with high precision and recall. The introduction of Mutual-KNN as the appropriate proximity graph has led to new challenges in the pipeline. First, a large number of clusters is generated due to the high number of components in the Mutual-KNN graph. So, in order to obtain an appropriate number of clusters, a strategy needs to be devised to merge similar clusters. 
Also, an approach to build the Mutual-KNN graph from hash tables needs to be designed. This would complete the ProxiClust pipeline for large datasets.
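As a rough illustration of the composition-based, hash-table idea described in this abstract, the following Python sketch turns each read into a normalized k-mer frequency vector and buckets reads by a random-hyperplane (SimHash-style) signature. It is a minimal sketch under stated assumptions, not the ProxiClust implementation: the function names (`kmer_profile`, `simhash`, `make_hyperplanes`, `bucket_reads`), the choice of k, the number of bits, and the seed are all hypothetical and are not taken from the thesis.

```python
import itertools
import random


def kmer_profile(read, k=2):
    """Return the normalized k-mer frequency vector of a read over ACGT."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:  # skip k-mers containing ambiguous bases such as N
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        int(sum(v * h for v, h in zip(vector, plane)) >= 0)
        for plane in hyperplanes
    )


def bucket_reads(reads, k=2, n_bits=8, seed=0):
    """Group read names by signature; `reads` maps read name -> sequence."""
    planes = make_hyperplanes(4 ** k, n_bits, seed)
    table = {}
    for name, seq in reads.items():
        signature = simhash(kmer_profile(seq, k), planes)
        table.setdefault(signature, []).append(name)
    return table
```

Calling `bucket_reads({"r1": "ACGTACGTAC", "r2": "ACGTACGTAC", "r3": "GGGGGGGGCC"})` returns a dictionary from bit signatures to read names. Reads with identical composition always share a bucket, while reads with similar composition only tend to, which is why such buckets are treated as candidate neighbourhoods for graph construction rather than as final bins.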
31

Arango, Argoty Gustavo Alonso. "Computational Tools for Annotating Antibiotic Resistance in Metagenomic Data." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/88987.

Abstract:
Metagenomics has become a reliable tool for the analysis of the microbial diversity and the molecular mechanisms carried out by microbial communities. By the use of next generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, with the rapid adoption of metagenomics a large amount of data has been generated. This situation requires the development of computational tools and pipelines to manage the data scalability, accessibility, and performance. In this thesis, several strategies varying from command line, web-based platforms to machine learning have been developed to address these computational challenges. Interpretation of specific information from metagenomic data is especially a challenge for environmental samples as current annotation systems only offer broad classification of microbial diversity and function. Therefore, I developed MetaStorm, a public web-service that facilitates customization of computational analysis for metagenomic data. The identification of antibiotic resistance genes (ARGs) from metagenomic data is carried out by searches against curated databases producing a high rate of false negatives. Thus, I developed DeepARG, a deep learning approach that uses the distribution of sequence alignments to predict over 30 antibiotic resistance categories with a high accuracy. Curation of ARGs is a labor intensive process where errors can be easily propagated. Thus, I developed ARGminer, a web platform dedicated to the annotation and inspection of ARGs by using crowdsourcing. Effective environmental monitoring tools should ideally capture not only ARGs, but also mobile genetic elements and indicators of co-selective forces, such as metal resistance genes. Here, I introduce NanoARG, an online computational resource that takes advantage of the long reads produced by nanopore sequencing technology to provide insights into mobility, co-selection, and pathogenicity. 
Sequence alignment has been one of the preferred methods for analyzing metagenomic data. However, it is slow and requires high computing resources. Therefore, I developed MetaMLP, a machine learning approach that uses a novel representation of protein sequences to perform classifications over protein functions. The method is accurate, is able to identify a larger number of hits compared to sequence alignments, and is >50 times faster than sequence alignment techniques.
Doctor of Philosophy
Antimicrobial resistance (AMR) is one of the biggest threats to human public health. It has been estimated that the number of deaths caused by AMR will surpass those caused by cancer by 2050. The seriousness of these projections requires urgent action to understand and control the spread of AMR. In the last few years, metagenomics has stood out as a reliable tool for the analysis of microbial diversity and AMR. Through next generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, with the rapid adoption of metagenomics, a large amount of data has been generated. This situation requires the development of computational tools and pipelines to manage data scalability, accessibility, and performance. In this thesis, several strategies, ranging from command line tools and web-based platforms to machine learning, have been developed to address these computational challenges. In particular, this work comprises the development of computational pipelines to process metagenomics data in the cloud and on distributed systems, the development of machine learning and deep learning tools to ease the computational cost of detecting antibiotic resistance genes in metagenomic data, and the integration of crowdsourcing as a way to curate and validate antibiotic resistance genes.
32

Plis, Kevin A. "The Effects of Novel Feature Vectors on Metagenomic Classification." Ohio University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1399578867.

33

Shtarkman, Yury M. "Metagenomic And Metatranscriptomic Analyses Of Lake Vostok Accretion Ice." Bowling Green State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1438867879.

34

Proal, Amy. "Autoimmune disease re-examined in light of metagenomic concepts." PhD thesis, Murdoch University, 2012. https://researchrepository.murdoch.edu.au/id/eprint/8484/.

Abstract:
The concept of autoantibodies was developed at a time when, due to the limitations of culture-based techniques, the human body was considered to be largely sterile. However, over the past few years, researchers in the emerging field of metagenomics have developed molecular tools that instead allow microbes to be identified by their genomic fingerprints. These tools have opened a door to an era of tremendous discovery. Homo sapiens has been shown to harbor thousands of species of microbes in tissue and blood that were previously undetectable. Today it is estimated that around 90% of the cells in the human body are microbial, and that the genes of these microbes outnumber our own by a factor of at least 10:1. The genomes of intracellular microbes can directly interact with our own genomes, meaning that humans may be best described as superorganisms. When populations of these microbes interfere too much with the metabolism of Homo sapiens, the resulting changes in the proteome can lead to disease. This suggests that the inflammation observed in "autoimmune" disease may instead result from an effort by the innate immune system to target pathogens and restore microbial homeostasis. Many intracellular microbes survive by dysregulating the expression of genes and antimicrobials via key nuclear receptors. The VDR nuclear receptor plays a critical role by expressing cathelicidin and TLR2, the primary intracellular defenses. It appears that the pathogens that cause autoimmune disease accumulate during a lifetime, with individuals increasingly accumulating microbes as the innate immune response becomes incrementally compromised. One reason that autoimmune disease is more common in women may be that they have an additional site of VDR expression, in the cycling endometrium. Thus, they may more easily acquire microbial loads than their male counterparts. 
The interaction of many different microbes acting in concert is more likely to cause a particular autoimmune condition than, as Koch suggested, a single organism. This helps account for the high levels of comorbidity observed amongst patients with autoimmune conditions. Autoantibodies are increasingly being identified as the body's response to specific pathogens, with collateral damage from these antibodies exacerbating the disease process. The possibility that microbes drive the autoimmune disease state calls for a re-evaluation of how these diseases are routinely treated. While the standard of care for autoimmune disease remains the use of medications that slow the immune response, treatments aimed at eradicating pathogens would attempt instead to stimulate the body's antimicrobial defenses. We have collaborated with American and international clinicians to research a therapy designed to reactivate the innate immune response in patients with autoimmune disease. Our case series demonstrate that patients generally report symptomatic improvement, but only after experiencing temporary increases in inflammation and disease symptoms. This is likely due to immunopathology - a reaction in which the release of cytokines and cellular debris accompany microbial death. Thus we must reconsider the long-term consequences of using immunosuppressive substances. For example, the secosteroid vitamin D reduces inflammation, but may do so at the expense of slowing the innate immune response and its ability to target underlying pathogens. Furthermore, the concept of vitamin D "deficiency" may itself be flawed. The low levels of 25-D in many patients with inflammatory conditions may be a result rather than a cause of the disease process. Conventional interpretation of other out-of-range metabolites must be similarly re-examined. This work offers a novel framework with which to understand and treat inflammatory disease, with broad implications across many disciplines. 
Efforts to further validate this model are needed, taking researchers down entirely new avenues of exploration.
35

Rampelli, Simone <1985>. "Metagenomic trajectory of gut microbiome in the human lifespan." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amsdottorato.unibo.it/6333/1/Rampelli_thesis_2014.pdf.

Abstract:
Co-evolving with the human host, the gut microbiota establishes configurations that vary under the pressure of inflammation, disease, ageing, diet and lifestyle. In order to describe the multi-stability of the microbiome-host relationship, we studied specific tracts of the bacterial trajectory during the human lifespan and characterized peculiar deviations from the hypothetical development, caused by disease, using molecular techniques such as phylogenetic microarray and next-generation sequencing. Firstly, we characterized the enterocyte-associated microbiota in breast-fed infants and adults, describing remarkable differences between the two groups of subjects. Subsequently, we investigated the impact of atopy on the development of the microbiome in Italian children, highlighting conspicuous deviations from the child-type microbiota of the Italian controls. To explore variation in the gut microbiota depending on geographical origins, which reflect different lifestyles, we compared the phylogenetic diversity of the intestinal microbiota of the Hadza hunter-gatherers of Tanzania and Italian adults. Additionally, we characterized the aged-type microbiome, describing the changes that occurred in the metabolic potential of the gut microbiota of centenarians with respect to younger individuals, as a part of the pathophysiology of the ageing process. Finally, we evaluated the impact of a probiotics intervention on the intestinal microbiota of elderly people, showing the repair of some age-related dysbioses. These studies help elucidate several aspects of intestinal microbiome development during the human lifespan, depicting the microbiota as an extremely plastic entity, capable of being reconfigured in response to different environmental factors and/or stressors of endogenous origin.
36

Rampelli, Simone <1985>. "Metagenomic trajectory of gut microbiome in the human lifespan." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amsdottorato.unibo.it/6333/.

37

Beghini, Francesco. "Integrative computational microbial genomics for large-scale metagenomic analyses." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/296396.

Abstract:
Advances in DNA sequencing technologies and improvements in analytic methods have changed the way we analyze complex microbial communities (metagenomics). In only a few years, these methods have evolved far enough to enable more precise community profiling and to allow high-level strain resolution. A typical computational metagenomic analysis relies on mapping raw DNA sequencing reads against sets of “reference” microbial genomes usually obtained through single-isolate sequencing. With an almost exponential increase in the number of reference genomes deposited daily in public data sets, current computational methods are incapable of managing and exploiting such a rich reference set, limiting the potential of metagenomic investigations. In my doctoral thesis, I will present my contribution towards fully exploiting the available reference data for metagenomic analysis. I developed ChocoPhlAn, an integrated pipeline for automatic retrieval, organization, and annotation of reference genomes and gene families, as the foundation for bioBakery 3, an improved set of computational methods for the analysis of shotgun metagenomics data. Using the latest set of microbial genomic reference data available and processed through ChocoPhlAn, the six bioBakery 3 tools that I updated resulted in more comprehensive and higher-resolution taxonomic and functional profiling of microbiomes and allowed strain-level characterization of their constituent strains. After extensive benchmarks against previous versions and competitors, we applied those methods to more than 10,000 real metagenomes and showed how metagenomics can be a more powerful tool for identifying novel links between the gut microbiome and disease conditions such as colorectal cancer and Inflammatory Bowel Disease. 
Accurate strain-level phylogeny reconstruction and pangenomic analysis of 7,783 metagenomes revealed novel functional, phylogenetic, and geographic diversity of Ruminococcus bromii, a common and highly prevalent gut inhabitant. We then focused on the influence of the eukaryotic fraction of the human microbiome and its potential impact on human gut health, which is a frequently overlooked aspect of microbial communities. To this end, we assessed the presence of the eukaryotic parasite Blastocystis spp. in more than 2,000 metagenomes from 5 continents to understand associations with disease status and geographic conditions. We showed that Blastocystis is the most common eukaryotic colonizer of the human gut, and that it is particularly prevalent in healthy subjects and non-westernized populations. We further explored intra-subtype diversity by reconstructing and functionally profiling new metagenome-assembled Blastocystis genomes, showing how metagenomics can be valuable for unravelling protist genomics and providing a genomic resource for additional integration of non-bacterial taxa in metagenomic pipelines. By developing and implementing ChocoPhlAn and the new bioBakery tools, we provided the community with improved and efficient microbiome profiling tools and started identifying novel patterns of association between host and niche-associated microbiomes and discovering previously uncharacterized species from human and non-human hosts.
39

Demozzi, Michele. "Identification of novel active Cas9 orthologs from metagenomic data." Doctoral thesis, Università degli studi di Trento, 2022. http://hdl.handle.net/11572/337709.

Abstract:
CRISPR-Cas is the state-of-the-art biological tool that allows precise and fast manipulation of the genetic information of cellular genomes. The translation of CRISPR-Cas technology from in vitro studies into clinical applications has highlighted a variety of limitations: the currently available systems are limited by their off-target activity, the need for a Cas-specific PAM sequence next to the target, and the size of the Cas protein. In particular, despite high levels of activity, the size of the CRISPR-SpCas9 editing machinery is not compatible with an all-in-one AAV delivery system, and the genomic sequences that can be targeted are limited by the 3'-NGG PAM dependency of the SpCas9 protein. To further expand the CRISPR tools repertoire, we turned to metagenomic data of the human microbiome to search for uncharacterized CRISPR-Cas9 systems and identified a set of novel small Cas9 orthologs derived from the analysis of reconstructed bacterial metagenomes. In this thesis study, ten candidates were chosen according to their size (fewer than 1,100 aa). The PAM preference of all ten orthologs was established using a bacterial-based and an in vitro platform. We demonstrated that three of them are active nucleases in human cells, and two of the three showed robust editing levels at endogenous loci, outperforming SpCas9 at particular targets. We expect these new variants to be very useful in expanding the available genome editing tools both in vitro and in vivo. Knock-out-based Cas9 applications are very efficient, but precise control of the repair outcome through HDR-mediated gene targeting is often required. To address this issue, we also developed an MS2-based reporter platform to measure the frequency of HDR events and evaluate novel HDR-modulating factors. The platform was validated and could allow the screening of libraries of proteins to assess their influence on the HDR pathway.
40

Lysholm, Fredrik. "Bioinformatic methods for characterization of viral pathogens in metagenomic samples." Doctoral thesis, Linköpings universitet, Bioinformatik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-86194.

Abstract:
Virus infections impose a huge disease burden on humanity and new viruses are continuously found. As most studies of viral disease are limited to the investigation of known viruses, it is important to characterize all circulating viruses. Thus, a broad and unselective exploration of the virus flora would be the most productive development of modern virology. Fueled by the reduction in sequencing costs and the unbiased nature of shotgun sequencing, viral metagenomics has rapidly become the strategy of choice for this exploration. This thesis mainly focuses on improving key methods used in viral metagenomics as well as on the complete viral characterization of two sets of samples using these methods. The major methods developed are an efficient automated analysis pipeline for metagenomics data and two novel, more accurate alignment algorithms for 454 sequencing data. The automated pipeline facilitates rapid, complete and effortless analysis of metagenomics samples, which in turn enables detection of potential pathogens, for instance in patient samples. The two new alignment algorithms cover comparisons against both nucleotide and protein databases, while retaining the underlying 454 data representation. Furthermore, a simulator for 454 data was developed in order to evaluate these methods. This simulator is currently the fastest and most complete simulator of 454 data, which enables further development of algorithms and methods. Finally, we have successfully used these methods to fully characterize a multitude of samples, including samples collected from children suffering from severe lower respiratory tract infections as well as from patients diagnosed with chronic fatigue syndrome, both of which are presented in this thesis. In these studies, a complete viral characterization has revealed the presence of both expected and unexpected viral pathogens as well as many potential novel viruses.
41

Al-Absi, Thabit. "Efficient Characterization of Short Anelloviruses Fragments Found in Metagenomic Samples." Thesis, Linköpings universitet, Bioinformatik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-85813.

Abstract:
Some viral metagenomic serum samples contain a huge amount of Anellovirus, a genetically diverse family with only a few conserved regions, which makes it hard to characterize efficiently. A multiple sequence alignment (MSA) of the Anelloviruses found in the sample must be constructed to get a clear picture of Anellovirus diversity and to identify stable regions. Using available multiple sequence alignment software directly on these fragments results in an MSA of very poor quality, due to their diversity and to the misaligned and low-quality regions present in the sequences. An efficient MSA must therefore be constructed in order to characterize the Anelloviruses present in the samples. Pairwise alignment is used to align one fragment at a time to the database sequences. The fragments are then placed against the database sequences using the start and end positions from the pairwise alignment results. The algorithm also excludes non-aligned portions of the fragments, as these are very hard to handle properly and are often products of misassembly or chimeric sequenced fragments. Other tools to aid further analysis were developed, such as finding a non-overlapping window that contains the most fragments, finding the consensus of the alignment, and extracting any region from the MSA for further analysis. The resulting MSA had a high percentage of correctly aligned bases compared to an MSA constructed using MSA software. The minimal number of genomes present in the sampled sequences was found, as well as the distribution of the fragments along the database sequence. Moreover, a highly conserved region and the window containing the most fragments were extracted from the MSA, and phylogenetic trees were constructed for these regions.
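One of the helper tools mentioned in this abstract, finding the consensus of an alignment, can be sketched in a few lines of Python. This is a minimal sketch, assuming a simple per-column majority vote that ignores gap characters; the abstract does not state which consensus rule the thesis uses, and the function name `consensus` and the gap symbol `-` are hypothetical choices for illustration.

```python
from collections import Counter


def consensus(alignment, gap="-"):
    """Majority-vote consensus of equal-length aligned sequences.

    Gap characters are ignored when voting; a column made only of gaps
    yields a gap in the consensus. Ties go to the base seen first.
    """
    if not alignment:
        return ""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        votes = Counter(base for base in column if base != gap)
        result.append(votes.most_common(1)[0][0] if votes else gap)
    return "".join(result)
```

For example, `consensus(["ACG-", "ACGT", "A-GT", "TCGT"])` votes column by column over the four aligned fragments. A production tool would likely also weight votes by base quality or apply a minimum-coverage threshold, which this sketch omits.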
42

Durno, W. Evan. "Precise correlation and metagenomic binning uncovers fine microbial community structure." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/62360.

Abstract:
Bacteria and Archaea represent the invisible majority of living things on Earth with an estimated numerical abundance exceeding 10^30 cells. This estimate surpasses the number of grains of sand on Earth and stars in the known universe. Interdependent microbial communities drive fluxes of matter and energy underlying biogeochemical processes, and provide essential ecosystem functions and services that help create the operating conditions for life. Despite their abundance and functional imperative, the vast majority of microorganisms remain uncultivated in laboratory settings, and therefore remain extremely difficult to study. Recent advances in high-throughput sequencing are opening a multi-omic (DNA and RNA) window to the structure and function of microbial communities, providing new insights into coupled biogeochemical cycling and the metabolic problem-solving power of otherwise uncultivated microbial dark matter (MDM). These technological advances have created bottlenecks with respect to information processing, and innovative bioinformatics solutions are required to analyze immense biological data sets. This is particularly apparent when dealing with metagenome assembly, population genome binning, and network analysis. This work investigates combined use of single-cell amplified genomes (SAGs) and metagenomes to more precisely construct population genome bins and evaluates the use of covariance matrix regularization methods to identify putative metabolic interdependencies at the population and community levels of organization. Applying dimensional reduction with principal components and a Gaussian mixture model to k-mer statistics from SAGs and metagenomes is shown to bin more precisely, and has been implemented as a novel pipeline, SAG Extrapolator (SAGEX). Also, correlation networks derived from small subunit ribosomal RNA gene sequences are shown to be more precisely inferred through regularization with factor analysis models applied via Gaussian copula.
SAGEX and regularized correlation are applied toward 368 SAGs and 91 metagenomes, postulating populations’ metabolic capabilities via binning, and constraining interpretations via correlation. The application describes coupled biogeochemical cycling in low-oxygen waters. Use of SAGEX leverages SAGs’ deep taxonomic descriptions and metagenomes’ breadth, produces precise population genome bins, and enables metabolic reconstruction and analysis of population dynamics over time. Regularizing correlation networks overcomes a known analytic bottleneck based in precision limitations.
Science, Faculty of
Graduate
43

Booyse, Dean. "Characterisation of a DNA ligase from an Antarctic metagenomic library." Thesis, University of the Western Cape, 2011. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_4236_1366182940.

Abstract:

A metagenomic gene library prepared from soil found beneath a mummified seal carcass in the Miers Valley, Antarctica, suggests an environment rich in uncharacterised biodiversity including enzymes with possible application to industrial processes. A sequence based gene mining investigation was performed on a clone, which archives a metagenomic sequence from this environment. The sequence was annotated using de novo bioinformatics and molecular biology techniques. A predicted NAD+-dependent DNA ligase, ligDB1, was selected for further characterisation. LigDB1 encodes a gene product that contains all the sequence features of a functional ligase. The protein was overexpressed in a heterologous E. coli host and purified to homogeneity. LigDB1 did not exhibit nick sealing activity, but was able to perform AMP-dependent DNA relaxation in the presence of high concentrations of enzyme. DNA modifying enzymes from cold environments perform optimally at low temperatures and may be of use as molecular tools in biotechnology. Complete characterisation of this enzyme is subject to further investigations.

44

Spiegelman, Dan. "Exploring the fusion of metagenomic library and DNA microarray technologies." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98805.

Abstract:
We explored the combination of metagenomic library and DNA microarray technologies into a single platform as a novel way to rapidly screen metagenomic libraries for genetic targets. In the "metagenomic microarray" system, metagenomic library clone DNA is printed on a microarray surface, and clones of interest are detected by hybridization to single-gene probes. This study represents the initial steps in the development of this technology. We constructed two 5,000-clone large-insert metagenomic libraries from two diesel-contaminated Arctic soil samples. We developed and optimized an automated fosmid purification protocol to rapidly extract clone DNA in a high-throughput 96-well format. We then created a series of small prototype arrays to optimize various parameters of microarray printing and hybridization, to identify and resolve technical challenges, and to provide proof-of-principle of this novel application. Our results suggest that this method shows promise, but more experimentation must be done to establish the feasibility of this approach.
45

Meakin, Nicholas G. "Metagenomic analyses of marine new production under elevated CO2 conditions." Thesis, University of Stirling, 2009. http://hdl.handle.net/1893/1555.

Abstract:
A mesocosm experiment was carried out in a Norwegian fjord near Bergen in May 2006, with the main objective being the study of the effects of increasing concentrations of atmospheric CO2 (and associated effects such as increased acidification) on blooms of natural marine coastal plankton. Three mesocosms were bubbled with CO2(g) to achieve a high (~700 ppm) CO2 concentration (pH ~7.8) to simulate predicted future conditions as a result of rising atmospheric CO2 concentrations. Another three mesocosms were treated as controls and bubbled with ambient air to represent a near pre-industrial scenario (atmospheric CO2 concentration ~300 ppm, surface seawater pH ~8.15). Blooms in the mesocosms were stimulated by the addition of nutrients at a near-Redfield ratio ([N:P] ≈ [16:1]), and scientific measurements and analyses were carried out over the course of the blooms for approximately one month. Of particular interest in this study were the autotrophic plankton. The diversity and activities of these microorganisms under the two treatments were therefore investigated. By designing and using new degenerate primers specifically targeting ‘Green-type’ (Form IA and IB), ‘Red-type’ (Form IC and ID) and Form II RuBisCO, analysis of primary producers was carried out using PCR and either gDNA or cDNA (mRNA) templates from key time points spanning the complete duration of the blooms throughout the mesocosm experiment. Over 1250 novel RuBisCO large subunit sequences have been fully annotated and deposited in the NCBI GenBank® database. These sequences revealed distinct changes in the diversity of primary producers both over the courses of the blooms and between treatments. Particularly striking was the effect of acidification on the community structure of the eukaryotic picoplankton, Prasinophytes.
A clade of prasinophytes closely related to Micromonas pusilla showed a distinct preference for the high CO2 conditions; a laboratory-based experiment confirmed the high tolerance of Micromonas pusilla to lower pH. Conversely, a clade related to Bathycoccus prasinos was almost entirely excluded from the high CO2 treatments. Clades of form II RuBisCO-containing dinoflagellates were also abundant throughout the experiment in both treatments. The high similarity of some of these clades to the toxin-producing species Heterocapsa triquetra and Gonyaulax polyedra, and apparent high tolerance of some clades to high CO2 conditions, is perhaps cause for concern in a high CO2 world and demands further research. In parallel with the RuBisCO work, new primers were designed that target the gene encoding the Fe protein of nitrogenase (NifH). 82 Bergen genomic nifH sequences have been annotated and submitted to GenBank®. These sequences include those from organisms related to Alpha, Beta, and Gammaproteobacteria, and Cluster II and Cluster III sequences that align most closely with anaerobic Bacteria, Gram positive, and/or sulphur-reducing Bacteria. The biggest surprise, however, was the apparent abundance and significance of a Rhodobacter sphaeroides-like microorganism throughout the duration of the experiment in both treatments. Whilst this clade was unsurprisingly absent in the RuBisCO cDNA libraries, all but two of 128 nifH cDNA clones analysed were identical to the gene from Rhodobacter sphaeroides. This shows that this clade was potentially fixing N2 throughout the entire experiment, even in the presence of combined N added to both sets of mesocosms at the start of the experiment. A group of Rhodobacter sphaeroides-like microorganisms present at Bergen may therefore have been an unexpected source of new N during the experiment and contributed to the maintenance of the mesocosm communities as nutrients became depleted.
One organism dominated the autotrophic communities after the blooms in both treatments. Synechococcus spp. Form IA rbcL clones most closely related to the coastal strain Synechococcus sp. strain CC9902 were recovered throughout the experiment but were particularly numerous toward the end of the experiment and dominated the “Green-type” libraries at this time. Initially, rbcL clones from these cyanobacteria were mostly derived from the ambient CO2 mesocosms but were equally distributed between treatments by the end of the experiment. This suggests that cyanobacteria related to strain CC9902 may be less tolerant of elevated CO2 (which was greatest at the beginning rather than the end of the experiment). However, despite the mesocosms being Pi-limited at the end of the experiment, several Synechococcus species (including those related to strain CC9902 and another coastal strain, CC9311) thrived. Following on from this observation, Pi uptake and assimilation mechanisms in a Synechococcus species were investigated in the laboratory. This led to the sequencing and characterisation of a pstS gene from the marine cyanobacterium Synechococcus sp. WH 8103. Unlike conventional pstS, it was discovered that the pstS II gene in this organism is constitutively expressed and unresponsive to or only weakly regulated by Pi supply. The use of PstS/pstS as a marker for P-limitation in natural samples, therefore, should be interpreted with caution.
46

Booysen, Dean. "Characterisation of a DNA ligase from an Antarctic metagenomic library." Thesis, University of the Western Cape, 2011. http://hdl.handle.net/11394/3637.

Abstract:
A metagenomic gene library prepared from soil found beneath a mummified seal carcass in the Miers Valley, Antarctica, suggests an environment rich in uncharacterised biodiversity including enzymes with possible application to industrial processes. A sequence based gene mining investigation was performed on a clone, which archives a metagenomic sequence from this environment. The sequence was annotated using de novo bioinformatics and molecular biology techniques. A predicted NAD+-dependent DNA ligase, ligDB1, was selected for further characterisation. LigDB1 encodes a gene product that contains all the sequence features of a functional ligase. The protein was overexpressed in a heterologous E. coli host and purified to homogeneity. LigDB1 did not exhibit nick sealing activity, but was able to perform AMP-dependent DNA relaxation in the presence of high concentrations of enzyme. DNA modifying enzymes from cold environments perform optimally at low temperatures and may be of use as molecular tools in biotechnology. Complete characterisation of this enzyme is subject to further investigations.
Magister Scientiae - MSc
47

Ainsworth, David. "Computational approaches for metagenomic analysis of high-throughput sequencing data." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/44070.

Abstract:
High-throughput DNA sequencing has revolutionised microbiology and is the foundation on which the nascent field of metagenomics has been built. This ability to cheaply sample billions of DNA reads directly from environments has democratised sequencing and allowed researchers to gain unprecedented insights into diverse microbial communities. These technologies, however, are not without their limitations: the short length of the reads requires the production of vast amounts of data to ensure all information is captured. This 'data deluge' has been a major bottleneck and has necessitated the development of new algorithms for analysis. Sequence alignment methods provide the most information about the composition of a sample, as they allow both taxonomic and functional classification, but the algorithms are prohibitively slow. This inefficiency has led to the reliance on faster algorithms which only produce simple taxonomic classification or abundance estimation, losing the valuable information given by full alignments against annotated genomes. This thesis will describe k-SLAM, a novel ultra-fast method for the alignment and taxonomic classification of metagenomic data. Using a k-mer based method, k-SLAM achieves speeds three orders of magnitude faster than current alignment based approaches, allowing a full taxonomic classification and gene identification to be tractable on modern large datasets. The alignments found by k-SLAM can also be used to find variants and identify genes, along with their nearest taxonomic origins. A novel pseudo-assembly method produces more specific taxonomic classifications on species which have high sequence identity within their genus. This provides a significant (up to 40%) increase in accuracy on these species. Also described is a re-analysis of a Shiga-toxin producing E. coli O104:H4 isolate via alignment against bacterial and viral species to find antibiotic resistance and toxin producing genes.
k-SLAM has been used by a range of research projects including FLORINASH and is currently being used by a number of groups.
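To make the general idea of k-mer based matching concrete, here is a minimal sketch that indexes the k-mers of a few reference sequences and assigns a read to the reference sharing the most k-mers with it. This is not k-SLAM's algorithm, which the abstract does not spell out; the function names, the voting rule, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it.

    references: dict of name -> sequence (hypothetical input format).
    """
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def classify(read, index, k):
    """Return the reference sharing the most k-mers with the read, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        # .get avoids creating empty entries in the defaultdict
        for name in index.get(kmer, ()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy usage with made-up sequences
refs = {"A": "ACGTACGTGG", "B": "TTTTCCCCAA"}
idx = build_index(refs, 4)
label = classify("TACGTG", idx, 4)
```

A real classifier would also have to handle reverse complements, sequencing errors, and ties between references, all of which this sketch ignores.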
48

Ricks, Nathan Joseph. "A Metagenomic Approach to Understand Stand Failure in Bromus tectorum." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/8549.

Abstract:
Bromus tectorum (cheatgrass) is an invasive annual grass that has colonized large portions of the Intermountain west. Cheatgrass stand failures have been observed throughout the invaded region, the cause of which may be related to the presence of several species of pathogenic fungi in the soil or surface litter. In this study, metagenomics was used to better understand and compare the fungal communities between sites that have and have not experienced stand failure. Samples were taken from the soil and surface litter in Winnemucca, Nevada and Skull Valley, Utah. Results show distinct fungal communities between Winnemucca and Skull Valley, as well as between soil and surface litter. In both the Winnemucca and Skull Valley surface litter, there was an elevated abundance of the endophyte Ramimonilia apicalis in samples that had experienced a stand failure. Winnemucca surface litter stand failure samples had increased abundance of the potential pathogen in the genus Comoclathris while the soils had increased abundance of the known cheatgrass pathogen Epicoccum nigrum. Skull Valley surface litter stand failure samples had increased abundance of the known cheatgrass pathogen Clarireedia capillus-albis while the soils had increased abundance of potential pathogens in the genera Olpidium and Monosporascus.
49

Altabtbaei, Khaled. "METAGENOMIC ANALYSIS OF PERIODONTAL BACTERIA ASSOCIATED WITH GENERALIZED AGGRESSIVE PERIODONTITIS." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1466590877.

50

Nevondo, Walter. "Development of a high throughput cell-free metagenomic screening platform." University of the Western Cape, 2016. http://hdl.handle.net/11394/5451.

Abstract:
Philosophiae Doctor - PhD
The estimated 5 × 10³⁰ prokaryotic cells inhabiting our planet sequester some 350–550 Petagrams (1 Pg = 10¹⁵ g) of carbon, 85–130 Pg of nitrogen, and 9–14 Pg of phosphorus, making them the largest reservoir of those nutrients on Earth (Whitman et al. 1998). However, reports suggest that less than 1% of these microscopic organisms are cultivable (Torsvik et al. 1990; Sleator et al. 2008). Until the recent development of metagenomic techniques, the knowledge of microbial diversity and their metabolic capabilities was limited to this small fraction of cultivable organisms (Handelsman et al. 1998). While metagenomics has undoubtedly revolutionised the field of microbiology and biotechnology, it has been generally acknowledged that the current approaches for metagenomic bio-prospecting / screening have limitations which hinder full access to the metabolic potentials and genetic variations contained in microbial genomes (Beloqui et al. 2008). In particular, the construction of metagenomic libraries and heterologous expression are amongst the major obstacles. The aim of this study was to develop an ultra-high throughput approach for screening enzyme activities using uncloned metagenomic DNA, thereby eliminating cloning steps, and employing in vitro heterologous expression. To achieve this, three widely used techniques: cell-free transcription-translation, in vitro compartmentalisation (IVC) and Fluorescence Activated Cell Sorting (FACS) were combined to develop this robust technique called metagenomic in vitro compartmentalisation (mIVC-FACS). Moreover, the E. coli commercial cell-free system was used in parallel to a novel, in-house Rhodococcus erythropolis based cell-free system. The versatility of this technique was tested by identifying novel beta-xylosidase encoding genes derived from a thermophilic compost metagenome.
In addition, the efficiency of mIVC-FACS was compared to the traditional metagenomic approaches; function-based (clone library screening) and sequence-based (shotgun sequencing and PCR screening). The results obtained here show that the R. erythropolis cell-free system was over thirty-fold more effective than the E. coli based system based on the number of hits obtained per million double emulsion (dE) droplets screened. Six beta-xylosidase encoding genes were isolated and confirmed from twenty-eight positive dE droplets. Most of the droplets that were isolated from the same gate encoded the same enzyme, indicating that this technique is highly selective. A comparison of the hit rate of this screening approach with the traditional E. coli based fosmid library method shows that mIVC-FACS is at least 2.5 times more sensitive. Although only a few hits from the mIVC-FACS screening were selected for confirmation of beta-xylosidase activity, the proposed hit rate suggests that a significant number of positive hits are left un-accessed through the traditional clone library screening system. In addition, these results also suggest that the E. coli expression system might be intrinsically sub-optimal for screening for hemicellulases from environmental genomes compared to the R. erythropolis system. The workflow required for screening one million clones in a fosmid library was estimated to be about 320 hours compared to 144 hours required via the mIVC-FACS screening platform. Some of the gene products obtained in both screening platforms show multiple substrate activities, suggesting that the microbial consortia of composting material consist of microorganisms that produce enzymes with multiple lignocellulolytic activities. While this platform still requires optimisation, we have demonstrated that this technique can be used to isolate genes encoding enzymes from mixed microbial genomes.
mIVC-FACS is a promising technology with the potential to take metagenomic studies to the second generation of novel natural products bio-prospecting. The astonishing sensitivity and ultra-high throughput capacity of this technology offer numerous advantages in metagenomic bio-prospecting.
National Research Foundation (NRF)