Academic literature on the topic 'High-throughput sequencing data'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'High-throughput sequencing data.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "High-throughput sequencing data"

1. Campagne, Fabien, Kevin C. Dorff, Nyasha Chambwe, James T. Robinson, and Jill P. Mesirov. "Compression of Structured High-Throughput Sequencing Data." PLoS ONE 8, no. 11 (November 18, 2013): e79871. http://dx.doi.org/10.1371/journal.pone.0079871.

2. Parrish, Nathaniel, Benjamin Sudakov, and Eleazar Eskin. "Genome reassembly with high-throughput sequencing data." BMC Genomics 14, Suppl 1 (2013): S8. http://dx.doi.org/10.1186/1471-2164-14-s1-s8.

3. Fonseca, Nuno A., Johan Rung, Alvis Brazma, and John C. Marioni. "Tools for mapping high-throughput sequencing data." Bioinformatics 28, no. 24 (October 11, 2012): 3169–77. http://dx.doi.org/10.1093/bioinformatics/bts605.

4. Maruki, Takahiro, and Michael Lynch. "Genotype-Frequency Estimation from High-Throughput Sequencing Data." Genetics 201, no. 2 (July 29, 2015): 473–86. http://dx.doi.org/10.1534/genetics.115.179077.

5. Dalca, A. V., and M. Brudno. "Genome variation discovery with high-throughput sequencing data." Briefings in Bioinformatics 11, no. 1 (January 1, 2010): 3–14. http://dx.doi.org/10.1093/bib/bbp058.

6. Ares, Manuel. "Methods for Processing High-Throughput RNA Sequencing Data." Cold Spring Harbor Protocols 2014, no. 11 (November 2014): pdb.top083352. http://dx.doi.org/10.1101/pdb.top083352.

7. David, Matei, Harun Mustafa, and Michael Brudno. "Detecting Alu insertions from high-throughput sequencing data." Nucleic Acids Research 41, no. 17 (August 5, 2013): e169–e169. http://dx.doi.org/10.1093/nar/gkt612.

8. Numanagić, Ibrahim, James K. Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, and S. Cenk Sahinalp. "Comparison of high-throughput sequencing data compression tools." Nature Methods 13, no. 12 (October 24, 2016): 1005–8. http://dx.doi.org/10.1038/nmeth.4037.

9. Fiume, M., V. Williams, A. Brook, and M. Brudno. "Savant: genome browser for high-throughput sequencing data." Bioinformatics 26, no. 16 (June 20, 2010): 1938–44. http://dx.doi.org/10.1093/bioinformatics/btq332.

10. Numanagić, Ibrahim, Salem Malikić, Victoria M. Pratt, Todd C. Skaar, David A. Flockhart, and S. Cenk Sahinalp. "Cypiripi: exact genotyping of CYP2D6 using high-throughput sequencing data." Bioinformatics 31, no. 12 (June 13, 2015): i27–i34. http://dx.doi.org/10.1093/bioinformatics/btv232.


Dissertations / Theses on the topic "High-throughput sequencing data"

1. Roguski, Łukasz. "High-throughput sequencing data compression." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/565775.

Abstract:
Thanks to advances in sequencing technologies, biomedical research has experienced a revolution over recent years, resulting in an explosion in the amount of genomic data being generated worldwide. The typical space requirement for storing sequencing data produced by a medium-scale experiment lies in the range of tens to hundreds of gigabytes, with multiple files in different formats being produced by each experiment. The current de facto standard file formats used to represent genomic data are text-based. For practical reasons, these are stored in compressed form. In most cases, such storage methods rely on general-purpose text compressors, such as gzip. Unfortunately, however, these methods are unable to exploit the information models specific to sequencing data, and as a result they usually provide limited functionality and insufficient savings in storage space. This explains why relatively basic operations such as processing, storage, and transfer of genomic data have become a typical bottleneck of current analysis setups. Therefore, this thesis focuses on methods to efficiently store and compress the data generated from sequencing experiments. First, we propose a novel general purpose FASTQ files compressor. Compared to gzip, it achieves a significant reduction in the size of the resulting archive, while also offering high data processing speed. Next, we present compression methods that exploit the high sequence redundancy present in sequencing data. These methods achieve the best compression ratio among current state-of-the-art FASTQ compressors, without using any external reference sequence. We also demonstrate different lossy compression approaches to store auxiliary sequencing data, which allow for further reductions in size. Finally, we propose a flexible framework and data format, which allows one to semi-automatically generate compression solutions which are not tied to any specific genomic file format. To facilitate data management needed by complex pipelines, multiple genomic datasets having heterogeneous formats can be stored together in configurable containers, with an option to perform custom queries over the stored data. Moreover, we show that simple solutions based on our framework can achieve results comparable to those of state-of-the-art format-specific compressors. Overall, the solutions developed and described in this thesis can easily be incorporated into current pipelines for the analysis of genomic data. Taken together, they provide grounds for the development of integrated approaches towards efficient storage and management of such data.
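The abstract's key observation, that general-purpose text compressors such as gzip cannot exploit the record structure of FASTQ files, can be illustrated with a short Python sketch. The example below is only a toy demonstration of compressing the header, base, and quality streams separately; the fake_fastq generator and the plain zlib calls are assumptions made for the illustration, not Roguski's actual method:

    import random
    import zlib

    def fake_fastq(n_reads=2000, length=60):
        # Build a small synthetic FASTQ file: header, bases, '+', qualities.
        random.seed(0)
        records = []
        for i in range(n_reads):
            bases = "".join(random.choice("ACGT") for _ in range(length))
            quals = "".join(chr(33 + random.randint(20, 40)) for _ in range(length))
            records.append(f"@read{i}\n{bases}\n+\n{quals}\n")
        return "".join(records)

    def compressed_size_naive(fastq_text):
        # Baseline: compress the interleaved FASTQ text as-is, as gzip would.
        return len(zlib.compress(fastq_text.encode(), 9))

    def compressed_size_split(fastq_text):
        # Structure-aware variant: compress headers, bases, and qualities
        # as three separate, more homogeneous streams.
        lines = fastq_text.strip().split("\n")
        streams = ("\n".join(lines[0::4]), "\n".join(lines[1::4]), "\n".join(lines[3::4]))
        return sum(len(zlib.compress(s.encode(), 9)) for s in streams)

    fastq = fake_fastq()
    print("interleaved  :", compressed_size_naive(fastq), "bytes")
    print("split streams:", compressed_size_split(fastq), "bytes")

On synthetic data like this, and more markedly on real FASTQ files, the homogeneous streams tend to compress better than the interleaved text; dedicated FASTQ compressors widen that gap further with models tailored to each stream.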
2. Durif, Ghislain. "Multivariate analysis of high-throughput sequencing data." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1334/document.

Abstract:
The statistical analysis of Next-Generation Sequencing data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension reduction methods that rely on both compression (representation of the data into a lower dimensional space) and variable selection. Developments are made concerning: the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose will be to focus on the reconstruction and visualization of the data. First, we will present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. For instance, such a method will be used for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in such framework is to account for the response to discard irrelevant variables. We will highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices, that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization and clustering will be illustrated by simulation experiments and by preliminary results on single-cell data analysis. All proposed methods were implemented into two R-packages "plsgenomics" and "CMF" based on high performance computing
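As a rough illustration of the zero-inflation and over-dispersion mentioned above, the sketch below evaluates a zero-inflated Poisson log-likelihood for a handful of counts. It is a simplified stand-in for the count models discussed in the thesis, not the variational factorization procedure itself, and the parameter values are invented for the example:

    import math

    def zip_log_pmf(k, lam, pi):
        # Log-probability of count k under a zero-inflated Poisson:
        # with probability pi the count is a structural zero ("dropout"),
        # otherwise it is Poisson(lam).
        pois = math.exp(-lam) * lam ** k / math.factorial(k)
        if k == 0:
            return math.log(pi + (1.0 - pi) * pois)
        return math.log((1.0 - pi) * pois)

    # Single-cell-like expression counts with an excess of zeros.
    counts = [0, 0, 3, 0, 7, 1, 0, 0, 2, 0]
    loglik = sum(zip_log_pmf(k, lam=2.5, pi=0.4) for k in counts)
    print(f"log-likelihood under ZIP(lambda=2.5, pi=0.4): {loglik:.3f}")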
3. Zhang, Xuekui. "Mixture models for analysing high throughput sequencing data." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/35982.

Abstract:
The goal of my thesis is to develop methods and software for analysing high-throughput sequencing data, emphasizing sonicated ChIP-seq. For this goal, we developed a few variants of mixture models for genome-wide profiling of transcription factor binding sites and nucleosome positions. Our methods have been implemented into Bioconductor packages, which are freely available to other researchers. For profiling transcription factor binding sites, we developed a method, PICS, and implemented it into a Bioconductor package. We used a simulation study to confirm that PICS compares favourably to rival methods, such as MACS, QuEST, CisGenome, and USeq. Using published GABP and FOXA1 data from human cell lines, we then show that PICS predicted binding sites were more consistent with computationally predicted binding motifs than the alternative methods. For motif discovery using transcription binding sites, we combined PICS with two other existing packages to create the first complete set of Bioconductor tools for peak-calling and binding motif analysis of ChIP-Seq and ChIP-chip data. We demonstrate the effectiveness of our pipeline on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, detecting co-occurring motifs that were consistent with the literature but not detected by other methods. For nucleosome positioning, we modified PICS into a method called PING. PING can handle MNase-Seq and MNase- or sonicated-ChIP-Seq data. It compares favourably to NPS and TemplateFilter in scalability, accuracy and robustness to low read density. To demonstrate that PING predictions from sonicated data can have sufficient spatial resolution to be biologically meaningful, we use H3K4me1 data to detect nucleosome shifts, discriminate functional and non-functional transcription factor binding sites, and confirm that Foxa2 associates with the accessible major groove of nucleosomal DNA. All of the above uses single-end sequencing data. At the end of the thesis, we briefly discuss the issue of processing paired-end data, which we are currently investigating.
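The mixture-model idea behind PICS and PING can be conveyed with a deliberately simplified sketch: a two-component Poisson mixture, fitted by EM, that separates background windows from read-enriched windows. This is not the actual PICS model, which works on read positions and fragment lengths rather than window counts; the toy window counts and starting values below are invented for the illustration:

    import math

    def poisson_pmf(k, lam):
        return math.exp(-lam) * lam ** k / math.factorial(k)

    def em_two_poisson(counts, lam_bg=1.0, lam_fg=10.0, w_fg=0.1, n_iter=100):
        # EM for the mixture w_fg * Pois(lam_fg) + (1 - w_fg) * Pois(lam_bg).
        for _ in range(n_iter):
            # E-step: posterior probability that each window is enriched.
            resp = []
            for k in counts:
                fg = w_fg * poisson_pmf(k, lam_fg)
                bg = (1.0 - w_fg) * poisson_pmf(k, lam_bg)
                resp.append(fg / (fg + bg))
            # M-step: re-estimate the mixing weight and the two Poisson rates.
            w_fg = sum(resp) / len(counts)
            lam_fg = sum(r * k for r, k in zip(resp, counts)) / max(sum(resp), 1e-12)
            lam_bg = sum((1 - r) * k for r, k in zip(resp, counts)) / max(len(counts) - sum(resp), 1e-12)
        return lam_bg, lam_fg, w_fg

    windows = [0, 1, 2, 0, 1, 14, 12, 0, 2, 1, 18, 0, 1, 3, 0]
    lam_bg, lam_fg, w_fg = em_two_poisson(windows)
    print(f"background rate={lam_bg:.2f}  enriched rate={lam_fg:.2f}  enriched fraction={w_fg:.2f}")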
4. Hoffmann, Steve. "Genome Informatics for High-Throughput Sequencing Data Analysis." Doctoral thesis, Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-152643.

Abstract:
This thesis introduces three different algorithmic and statistical strategies for the analysis of high-throughput sequencing data. First, we introduce a heuristic method based on enhanced suffix arrays to map short sequences to larger reference genomes. The algorithm builds on the idea of an error-tolerant traversal of the suffix array for the reference genome in conjunction with the concept of matching statistics introduced by Chang and a bitvector-based alignment algorithm proposed by Myers. The algorithm supports paired-end and mate-pair alignments and the implementation offers methods for primer detection, primer and poly-A trimming. In our own benchmarks as well as independent benchmarks this tool outcompetes other currently available tools with respect to sensitivity and specificity in simulated and real data sets for a large number of sequencing protocols. Second, we introduce a novel dynamic programming algorithm for the spliced alignment problem. The advantage of this algorithm is its capability to not only detect co-linear splice events, i.e. local splice events on the same genomic strand, but also circular and other non-collinear splice events. This succinct and simple algorithm handles all these cases at the same time with a high accuracy. While it is on par with other state-of-the-art methods for collinear splice events, it outcompetes other tools for many non-collinear splice events. The application of this method to publicly available sequencing data led to the identification of a novel isoform of the tumor suppressor gene p53. Since this gene is one of the best studied genes in the human genome, this finding is quite remarkable and suggests that the application of our algorithm could help to identify a plethora of novel isoforms and genes. Third, we present a data-adaptive method to call single nucleotide variations (SNVs) from aligned high-throughput sequencing reads. We demonstrate that our method based on empirical log-likelihoods automatically adjusts to the quality of a sequencing experiment and thus renders a "decision" on when to call an SNV. In our simulations this method is on par with current state-of-the-art tools. Finally, we present biological results that have been obtained using the special features of the presented alignment algorithm.
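The thesis's third contribution, likelihood-based SNV calling, can be sketched generically in a few lines. The example below compares plain binomial genotype likelihoods under a fixed, assumed error rate; it illustrates the general principle only and is not the empirical log-likelihood model developed in the thesis:

    import math

    def genotype_loglik(n_ref, n_alt, alt_fraction, err=0.01):
        # Log-likelihood of seeing n_ref reference and n_alt alternative bases
        # when the true alternative-allele fraction is alt_fraction (0, 0.5, 1)
        # and each base is miscalled with probability err.
        p_alt = alt_fraction * (1.0 - err) + (1.0 - alt_fraction) * err
        return n_alt * math.log(p_alt) + n_ref * math.log(1.0 - p_alt)

    def call_genotype(n_ref, n_alt, err=0.01):
        models = {"ref/ref": 0.0, "ref/alt": 0.5, "alt/alt": 1.0}
        scores = {g: genotype_loglik(n_ref, n_alt, f, err) for g, f in models.items()}
        return max(scores, key=scores.get)

    print(call_genotype(n_ref=28, n_alt=1))    # lone mismatch: likely ref/ref
    print(call_genotype(n_ref=15, n_alt=13))   # balanced counts: likely ref/alt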
5. Stromberg, Michael Peter. "Enabling high-throughput sequencing data analysis with MOSAIK." Thesis, Boston College, 2010. http://hdl.handle.net/2345/1332.

Abstract:
Thesis advisor: Gabor T. Marth
During the last few years, numerous new sequencing technologies have emerged that require tools that can process large amounts of read data quickly and accurately. Regardless of the downstream methods used, reference-guided aligners are at the heart of all next-generation analysis studies. I have developed a general reference-guided aligner, MOSAIK, to support all current sequencing technologies (Roche 454, Illumina, Applied Biosystems SOLiD, Helicos, and Sanger capillary). The calibrated alignment qualities calculated by MOSAIK allow the user to fine-tune the alignment accuracy for a given study. MOSAIK is a highly configurable and easy-to-use suite of alignment tools that is used in hundreds of labs worldwide. MOSAIK is an integral part of our genetic variant discovery pipeline. From SNP and short-INDEL discovery to structural variation discovery, alignment accuracy is an essential requirement and enables our downstream analyses to provide accurate calls. In this thesis, I present three major studies that were formative during the development of MOSAIK and our analysis pipeline. In addition, I present a novel algorithm that identifies mobile element insertions (non-LTR retrotransposons) in the human genome using split-read alignments in MOSAIK. This algorithm has a low false discovery rate (4.4 %) and enabled our group to be the first to determine the number of mobile elements that differentially occur between any two individuals
Thesis (PhD) — Boston College, 2010
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
6. Xing, Zhengrong. "Poisson multiscale methods for high-throughput sequencing data." Thesis, The University of Chicago, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10195268.

Abstract:

In this dissertation, we focus on the problem of analyzing data from high-throughput sequencing experiments. With the emergence of more capable hardware and more efficient software, these sequencing data provide information at an unprecedented resolution. However, statistical methods developed for such data rarely tackle the data at such high resolutions, and often make approximations that only hold under certain conditions.

We propose a model-based approach to dealing with such data, starting from a single sample. By taking into account the inherent structure present in such data, our model can accurately capture important genomic regions. We also present the model in such a way that makes it easily extensible to more complicated and biologically interesting scenarios.

Building upon the single-sample model, we then turn to the statistical question of detecting differences between multiple samples. Such questions often arise in the context of expression data, where much emphasis has been put on the problem of detecting differential expression between two groups. By extending the framework for a single sample to incorporate additional group covariates, our model provides a systematic approach to estimating and testing for such differences. We then apply our method to several empirical datasets, and discuss the potential for further applications to other biological tasks.

We also seek to address a different statistical question, where the goal here is to perform exploratory analysis to uncover hidden structure within the data. We incorporate the single-sample framework into a commonly used clustering scheme, and show that our enhanced clustering approach is superior to the original clustering approach in many ways. We then apply our clustering method to a few empirical datasets and discuss our findings.

Finally, we apply the shrinkage procedure used within the single-sample model to tackle a completely different statistical issue: nonparametric regression with heteroskedastic Gaussian noise. We propose an algorithm that accurately recovers both the mean and variance functions given a single set of observations, and demonstrate its advantages over state-of-the-art methods through extensive simulation studies.
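To give a concrete sense of what a multiscale representation of counts looks like, the sketch below recursively splits a vector of bin counts into parent totals and left-child fractions; under a Poisson model each such split is binomial given its parent total, which is the quantity multiscale methods model and shrink. The input counts are invented, and the code illustrates only the decomposition, not the dissertation's estimators:

    def multiscale_split(counts):
        # Recursively pair up adjacent bins. At every scale, record each
        # parent total and the fraction of it falling in the left child.
        splits = []
        level = list(counts)
        while len(level) > 1:
            parents = []
            for i in range(0, len(level), 2):
                left, right = level[i], level[i + 1]
                total = left + right
                parents.append(total)
                splits.append((total, left / total if total else 0.5))
            level = parents
        return level[0], splits

    grand_total, splits = multiscale_split([3, 5, 0, 1, 9, 12, 2, 0])
    print("total reads:", grand_total)
    for parent_total, left_fraction in splits:
        print(f"parent total={parent_total:3d}  left fraction={left_fraction:.2f}")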

7. Fritz, Markus Hsi-Yang. "Exploiting high throughput DNA sequencing data for genomic analysis." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610819.

8. Woolford, Julie Ruth. "Statistical analysis of small RNA high-throughput sequencing data." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610375.

9. Kircher, Martin. "Understanding and improving high-throughput sequencing data production and analysis." Doctoral thesis, Universitätsbibliothek Leipzig, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-71102.

Abstract:
Advances in DNA sequencing revolutionized the field of genomics over the last 5 years. New sequencing instruments make it possible to rapidly generate large amounts of sequence data at substantially lower cost. These high-throughput sequencing technologies (e.g. Roche 454 FLX, Life Technology SOLiD, Dover Polonator, Helicos HeliScope and Illumina Genome Analyzer) make whole genome sequencing and resequencing, transcript sequencing as well as quantification of gene expression, DNA-protein interactions and DNA methylation feasible at an unanticipated scale. In the field of evolutionary genomics, high-throughput sequencing permitted studies of whole genomes from ancient specimens of different hominin groups. Further, it allowed large-scale population genetics studies of present-day humans as well as different types of sequence-based comparative genomics studies in primates. Such comparisons of humans with closely related apes and hominins are important not only to better understand human origins and the biological background of what sets humans apart from other organisms, but also for understanding the molecular basis for diseases and disorders, particularly those that affect uniquely human traits, such as speech disorders, autism or schizophrenia. However, while the cost and time required to create comparative data sets have been greatly reduced, the error profiles and limitations of the new platforms differ significantly from those of previous approaches. This requires a specific experimental design in order to circumvent these issues, or to handle them during data analysis. During the course of my PhD, I analyzed and improved current protocols and algorithms for next generation sequencing data, taking into account the specific characteristics of these new sequencing technologies. The presented approaches and algorithms were applied in different projects and are widely used within the department of Evolutionary Genetics at the Max Planck Institute of Evolutionary Anthropology. In this thesis, I will present selected analyses from the whole genome shotgun sequencing of two ancient hominins and the quantification of gene expression from short-sequence tags in five tissues from three primates.
10. Ainsworth, David. "Computational approaches for metagenomic analysis of high-throughput sequencing data." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/44070.

Abstract:
High-throughput DNA sequencing has revolutionised microbiology and is the foundation on which the nascent field of metagenomics has been built. This ability to cheaply sample billions of DNA reads directly from environments has democratised sequencing and allowed researchers to gain unprecedented insights into diverse microbial communities. These technologies, however, are not without their limitations: the short length of the reads requires the production of vast amounts of data to ensure all information is captured. This 'data deluge' has been a major bottleneck and has necessitated the development of new algorithms for analysis. Sequence alignment methods provide the most information about the composition of a sample as they allow both taxonomic and functional classification, but such algorithms are prohibitively slow. This inefficiency has led to the reliance on faster algorithms which only produce simple taxonomic classification or abundance estimation, losing the valuable information given by full alignments against annotated genomes. This thesis will describe k-SLAM, a novel ultra-fast method for the alignment and taxonomic classification of metagenomic data. Using a k-mer based method, k-SLAM achieves speeds three orders of magnitude faster than current alignment-based approaches, allowing a full taxonomic classification and gene identification to be tractable on modern large datasets. The alignments found by k-SLAM can also be used to find variants and identify genes, along with their nearest taxonomic origins. A novel pseudo-assembly method produces more specific taxonomic classifications on species which have high sequence identity within their genus. This provides a significant (up to 40%) increase in accuracy on these species. Also described is a re-analysis of a Shiga-toxin-producing E. coli O104:H4 isolate via alignment against bacterial and viral species to find antibiotic resistance and toxin-producing genes. k-SLAM has been used by a range of research projects, including FLORINASH, and is currently being used by a number of groups.
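The k-mer strategy described in this abstract can be illustrated with a deliberately tiny example: index every k-mer of each reference sequence under its taxon, then classify a read by a majority vote over its own k-mers. The reference strings, the k value, and the voting rule below are invented for the illustration and are not k-SLAM's actual index or alignment step:

    from collections import Counter

    K = 8  # toy k-mer length

    def kmers(seq, k=K):
        return (seq[i:i + k] for i in range(len(seq) - k + 1))

    def build_index(references):
        # Map every k-mer of every reference sequence to a taxon label.
        index = {}
        for taxon, genome in references.items():
            for km in kmers(genome):
                index.setdefault(km, taxon)
        return index

    def classify(read, index):
        # Assign the read to the taxon whose k-mers it shares most often.
        votes = Counter(index[km] for km in kmers(read) if km in index)
        return votes.most_common(1)[0][0] if votes else "unclassified"

    references = {
        "taxon_A": "ATGGCTAGCTAGGCTTACGATCGATCGGATCCGGCTAAGCTT",
        "taxon_B": "TTGACCGGTTAACCGGATATATCCGGAACCTTGGAACCGGTT",
    }
    index = build_index(references)
    print(classify("GCTAGGCTTACGATCGATCG", index))  # substring of taxon_A: prints taxon_A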

Books on the topic "High-throughput sequencing data"

1. Rodríguez-Ezpeleta, Naiara, Michael Hackenberg, and Ana M. Aransay. Bioinformatics for High Throughput Sequencing. New York, NY: Springer, 2012.

2. Rodríguez-Ezpeleta, Naiara, Ana M. Aransay, and Michael Hackenberg. Bioinformatics for High Throughput Sequencing. Springer, 2014.

3. Deep Sequencing Data Analysis. Humana Press Inc., 2013.

4. Shomron, Noam. Deep Sequencing Data Analysis. Springer, 2020.

5. Taberlet, Pierre, Aurélie Bonin, Lucie Zinger, and Eric Coissac. Environmental DNA. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198767220.001.0001.

Abstract:
Environmental DNA (eDNA), i.e. DNA released in the environment by any living form, represents a formidable opportunity to gather high-throughput and standard information on the distribution or feeding habits of species. It has therefore great potential for applications in ecology and biodiversity management. However, this research field is fast-moving, involves different areas of expertise and currently lacks standard approaches, which calls for an up-to-date and comprehensive synthesis. Environmental DNA for biodiversity research and monitoring covers current methods based on eDNA, with a particular focus on “eDNA metabarcoding”. Intended for scientists and managers, it provides the background information to allow the design of sound experiments. It revisits all steps necessary to produce high-quality metabarcoding data such as sampling, metabarcode design, optimization of PCR and sequencing protocols, as well as analysis of large sequencing datasets. All these different steps are presented by discussing the potential and current challenges of eDNA-based approaches to infer parameters on biodiversity or ecological processes. The last chapters of this book review how DNA metabarcoding has been used so far to unravel novel patterns of diversity in space and time, to detect particular species, and to answer new ecological questions in various ecosystems and for various organisms. Environmental DNA for biodiversity research and monitoring constitutes an essential reading for all graduate students, researchers and practitioners who do not have a strong background in molecular genetics and who are willing to use eDNA approaches in ecology and biomonitoring.
6. Pezzella, Francesco, Mahvash Tavassoli, and David J. Kerr, eds. Oxford Textbook of Cancer Biology. Oxford University Press, 2019. http://dx.doi.org/10.1093/med/9780198779452.001.0001.

Abstract:
The study of the biology of tumours has grown to become markedly interdisciplinary, involving chemists, statisticians, epidemiologists, mathematicians, bioinformaticians, and computer scientists alongside medical scientists. Oxford Textbook of Cancer Biology brings together the developments from different branches of research into one volume. Structured in seven sections, the book starts with a review of the development and biology of multicellular organisms, how they maintain a healthy homeostasis in an individual, and a description of the molecular basis of cancer development. The book then illustrates how, once cells become neoplastic, their signalling network is altered and pathological behaviour follows. Changes that cancer cells can induce in nearby normal tissue are explored, and the new relationship established between them and the stroma is explicated. Finally, the authors illustrate the contribution provided by high throughput techniques to map cancer at different levels, from genomic sequencing to cellular metabolic functions, and how information technology with its vast amounts of data are integrated with traditional cell biology to provide a global view of the disease. The book concludes by summarizing what we know to date about cancer, and in what direction our understanding of cancer is moving.

Book chapters on the topic "High-throughput sequencing data"

1. Glass, Elizabeth M., and Folker Meyer. "Analysis of Metagenomics Data." In Bioinformatics for High Throughput Sequencing, 219–29. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0782-9_13.

2. Sexton, David. "Computational Infrastructure and Basic Data Analysis for High-Throughput Sequencing." In Bioinformatics for High Throughput Sequencing, 55–65. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0782-9_4.

3. Zhang, Michael Q. "Dissecting Splicing Regulatory Network by Integrative Analysis of CLIP-Seq Data." In Bioinformatics for High Throughput Sequencing, 209–18. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0782-9_12.

4. Paszkiewicz, Konrad, and David J. Studholme. "High-Throughput Sequencing Data Analysis Software: Current State and Future Developments." In Bioinformatics for High Throughput Sequencing, 231–48. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0782-9_14.

5. Mane, Shrinivasrao P., Thero Modise, and Bruno W. Sobral. "Analysis of High-Throughput Sequencing Data." In Methods in Molecular Biology, 1–11. Totowa, NJ: Humana Press, 2010. http://dx.doi.org/10.1007/978-1-60761-682-5_1.

6. Young, Matthew D., Davis J. McCarthy, Matthew J. Wakefield, Gordon K. Smyth, Alicia Oshlack, and Mark D. Robinson. "Differential Expression for RNA Sequencing (RNA-Seq) Data: Mapping, Summarization, Statistical Analysis, and Experimental Design." In Bioinformatics for High Throughput Sequencing, 169–90. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0782-9_10.

7. Weese, David, and Enrico Siragusa. "Full-Text Indexes for High-Throughput Sequencing." In Algorithms for Next-Generation Sequencing Data, 41–75. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-59826-0_2.

8. Hoffmann, Steve. "Computational Analysis of High Throughput Sequencing Data." In Methods in Molecular Biology, 199–217. Totowa, NJ: Humana Press, 2011. http://dx.doi.org/10.1007/978-1-61779-027-0_9.

9. Välimäki, Niko, and Simon J. Puglisi. "Distributed String Mining for High-Throughput Sequencing Data." In Lecture Notes in Computer Science, 441–52. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-33122-0_35.

10. Rieder, Dietmar, and Francesca Finotello. "Analysis of High-Throughput RNA Bisulfite Sequencing Data." In Methods in Molecular Biology, 143–54. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4939-6807-7_10.

Conference papers on the topic "High-throughput sequencing data"

1. Mangul, Serghei, and Alex Zelikovsky. "Poster: Haplotype discovery from high-throughput sequencing data." In 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS). IEEE, 2011. http://dx.doi.org/10.1109/iccabs.2011.5729908.

2. Holt, James, Shunping Huang, Leonard McMillan, and Wei Wang. "Read Annotation Pipeline for High-Throughput Sequencing Data." In BCB'13: ACM-BCB2013. New York, NY, USA: ACM, 2013. http://dx.doi.org/10.1145/2506583.2506645.

3. Chung, Wei-Chun, Yu-Jung Chang, Chien-Chih Chen, Der-Tsai Lee, and Jan-Ming Ho. "Optimizing a MapReduce module of preprocessing high-throughput DNA sequencing data." In 2013 IEEE International Conference on Big Data. IEEE, 2013. http://dx.doi.org/10.1109/bigdata.2013.6691694.

4. Jiangyu, Li, Wang Xiaolei, Zhao Dongsheng, Mao Yiqing, and Qian Cheng. "A fast microbial detection algorithm based on high-throughput sequencing data." In ICBCB '17: 2017 5th International Conference on Bioinformatics and Computational Biology. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3035012.3035014.

5. Wang, Xin, Mingxiang Teng, Guohua Wang, Yuming Zhao, Xu Han, Weixing Feng, Lang Li, Jeremy Sanford, and Yunlong Liu. "xIP-seq Platform: An Integrative Framework for High-Throughput Sequencing Data Analysis." In 2009 Ohio Collaborative Conference on Bioinformatics (OCCBIO). IEEE, 2009. http://dx.doi.org/10.1109/occbio.2009.20.

6. Puljiz, Zrinka, and Haris Vikalo. "Iterative learning of single individual haplotypes from high-throughput DNA sequencing data." In 2014 8th International Symposium on Turbo Codes and Iterative Information Processing (ISTC). IEEE, 2014. http://dx.doi.org/10.1109/istc.2014.6955103.

7. Milicchio, Franco, Iain E. Buchan, and Mattia C. F. Prosperi. "A* fast and scalable high-throughput sequencing data error correction via oligomers." In 2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE, 2016. http://dx.doi.org/10.1109/cibcb.2016.7758117.

8. Chen, Chien-Chih, Yu-Jung Chang, Wei-Chun Chung, Der-Tsai Lee, and Jan-Ming Ho. "CloudRS: An error correction algorithm of high-throughput sequencing data based on scalable framework." In 2013 IEEE International Conference on Big Data. IEEE, 2013. http://dx.doi.org/10.1109/bigdata.2013.6691642.

9. Chung, Wei-Chun, Yu-Jung Chang, D. T. Lee, and Jan-Ming Ho. "Using geometric structures to improve the error correction algorithm of high-throughput sequencing data on MapReduce framework." In 2014 IEEE International Conference on Big Data (Big Data). IEEE, 2014. http://dx.doi.org/10.1109/bigdata.2014.7004306.

10. Zhang, Xiaodong, Chong Chu, Yao Zhang, Yufeng Wu, and Jingyang Gao. "Concod: Accurate consensus-based approach of calling deletions from high-throughput sequencing data." In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2016. http://dx.doi.org/10.1109/bibm.2016.7822495.
