Journal articles on the topic 'Genome sequence assembly'

1

Taylor, D. Leland, A. Malcolm Campbell, and Laurie J. Heyer. "Illuminating the Black Box of Genome Sequence Assembly." American Biology Teacher 75, no. 8 (October 1, 2013): 572–77. http://dx.doi.org/10.1525/abt.2013.75.8.9.

Abstract:
Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of “reads.” A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules (http://gcat.davidson.edu/phast) designed to teach advanced high school and college students the genome assembly process. PHAST allows users to assemble phage genomes in real time and includes tutorials detailing the complexities of genome assembly. With PHAST, students learn concepts behind genome assembly and understand how mathematics solves biological problems such as genome assembly.
2

Udall, Joshua A., Evan Long, Chris Hanson, Daojun Yuan, Thiruvarangan Ramaraj, Justin L. Conover, Lei Gong, et al. "De Novo Genome Sequence Assemblies of Gossypium raimondii and Gossypium turneri." G3: Genes|Genomes|Genetics 9, no. 10 (August 28, 2019): 3079–85. http://dx.doi.org/10.1534/g3.119.400392.

Abstract:
Cotton is an agriculturally important crop. Because of its importance, a genome sequence of a diploid cotton species (Gossypium raimondii, D-genome) was first assembled using Sanger sequencing data in 2012. Improvements to DNA sequencing technology have improved accuracy and correctness of assembled genome sequences. Here we report a new de novo genome assembly of G. raimondii and its close relative G. turneri. The two genomes were assembled to a chromosome level using PacBio long-read technology, HiC, and Bionano optical mapping. This report corrects some minor assembly errors found in the Sanger assembly of G. raimondii. We also compare the genome sequences of these two species for gene composition, repetitive element composition, and collinearity. Most of the identified structural rearrangements between these two species are due to intra-chromosomal inversions. More inversions were found in the G. turneri genome sequence than the G. raimondii genome sequence. These findings and updates to the D-genome sequence will improve accuracy and translation of genomics to cotton breeding and genetics.
3

Ghosh, Tarini Shankar, Varun Mehra, and Sharmila S. Mande. "Grid-Assembly: An oligonucleotide composition-based partitioning strategy to aid metagenomic sequence assembly." Journal of Bioinformatics and Computational Biology 13, no. 03 (May 15, 2015): 1541004. http://dx.doi.org/10.1142/s0219720015410048.

Abstract:
Metagenomics approach involves extraction, sequencing and characterization of the genomic content of entire community of microbes present in a given environment. In contrast to genomic data, accurate assembly of metagenomic sequences is a challenging task. Given the huge volume and the diverse taxonomic origin of metagenomic sequences, direct application of single genome assembly methods on metagenomes are likely to not only lead to an immense increase in requirements of computational infrastructure, but also result in the formation of chimeric contigs. A strategy to address the above challenge would be to partition metagenomic sequence datasets into clusters and assemble separately the sequences in individual clusters using any single-genome assembly method. The current study presents such an approach that uses tetranucleotide usage patterns to first represent sequences as points in a three dimensional (3D) space. The 3D space is subsequently partitioned into "Grids". Sequences within overlapping grids are then progressively assembled using any available assembler. We demonstrate the applicability of the current Grid-Assembly method using various categories of assemblers as well as different simulated metagenomic datasets. Validation results indicate that the Grid-Assembly approach helps in improving the overall quality of assembly, in terms of the purity and volume of the assembled contigs.
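The first step described in this abstract, representing each sequence by its tetranucleotide usage, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `tetranucleotide_frequencies` and `grid_cell` are hypothetical, and the abstract does not say how the 256 frequencies are reduced to three dimensions, so `grid_cell` simply assumes the 3D coordinates are already available and that grid cells are cubes of a fixed, assumed size.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequency of each 4-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made of A, C, G, T are counted; windows with N are ignored.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def grid_cell(point, cell_size):
    """Map a 3D point to the integer index of the cubic cell containing it.

    Assumption: the 3D coordinates come from some projection of the
    frequency vector that is not specified in the abstract.
    """
    return tuple(int(c // cell_size) for c in point)
```

Sequences whose points fall in the same or neighbouring cells would then be passed together to an assembler of choice.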
4

Collins, Andrew. "The Challenge of Genome Sequence Assembly." Open Bioinformatics Journal 11, no. 1 (October 17, 2018): 231–39. http://dx.doi.org/10.2174/1875036201811010231.

Abstract:
Background: Although whole genome sequencing is enabling numerous advances in many fields, achieving complete chromosome-level sequence assemblies for diverse species presents difficulties. The problems in part reflect the limitations of current sequencing technologies. Chromosome assembly from ‘short read’ sequence data is confounded by the presence of repetitive genome regions with numerous similar sequence tracts which cannot be accurately positioned in the assembled sequence. Longer sequence reads often have higher error rates and may still be too short to span the larger gaps between contigs. Objective: Given the emergence of exciting new applications using sequencing technology, such as the Earth BioGenome Project, it is necessary to further develop and apply a range of strategies to achieve robust chromosome-level sequence assembly. Reviewed here are a range of methods to enhance assembly which include the use of cross-species synteny to understand relationships between sequence contigs, the development of independent genetic and/or physical scaffold maps as frameworks for assembly (for example, radiation hybrid, optical motif and chromatin interaction maps) and the use of patterns of linkage disequilibrium to help position, orient and locate contigs. Results and Conclusion: A range of methods exist which might be further developed to facilitate cost-effective large-scale sequence assembly for diverse species. A combination of strategies is required to best assemble sequence data into chromosome-level assemblies. There are a number of routes towards the development of maps which span chromosomes (including physical, genetic and linkage disequilibrium maps) and construction of these whole chromosome maps greatly facilitates the ordering and orientation of sequence contigs.
5

Sharma, Priyanka, Othman Al-Dossary, Bader Alsubaie, Ibrahim Al-Mssallem, Onkar Nath, Neena Mitter, Gabriel Rodrigues Alves Margarido, et al. "Improvements in the sequencing and assembly of plant genomes." Gigabyte 2021 (June 4, 2021): 1–10. http://dx.doi.org/10.46471/gigabyte.24.

Abstract:
Advances in DNA sequencing have made it easier to sequence and assemble plant genomes. Here, we extend an earlier study, and compare recent methods for long read sequencing and assembly. Updated Oxford Nanopore Technology software improved assemblies. Using more accurate sequences produced by repeated sequencing of the same molecule (Pacific Biosciences HiFi) resulted in less fragmented assembly of sequencing reads. Using data for increased genome coverage resulted in longer contigs, but reduced total assembly length and improved genome completeness. The original model species, Macadamia jansenii, was also compared with three other Macadamia species, as well as avocado (Persea americana) and jojoba (Simmondsia chinensis). In these angiosperms, increasing sequence data volumes caused a linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity influenced the success of assembly. Advances in long read sequencing technology continue to improve plant genome sequencing and assembly. However, results were improved by greater genome coverage, with the amount needed to achieve a particular level of assembly being species dependent.
6

Jackman, Shaun D., Lauren Coombe, René L. Warren, Heather Kirk, Eva Trinh, Tina MacLeod, Stephen Pleasance, et al. "Complete Mitochondrial Genome of a Gymnosperm, Sitka Spruce (Picea sitchensis), Indicates a Complex Physical Structure." Genome Biology and Evolution 12, no. 7 (May 25, 2020): 1174–79. http://dx.doi.org/10.1093/gbe/evaa108.

Abstract:
Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are of angiosperms, and few are of gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the complete 5.5-Mb mitochondrial genome of Sitka spruce (Picea sitchensis), to date, one of the largest mitochondrial genomes of a gymnosperm. We sequenced the whole genome using Oxford Nanopore MinION, and then identified contigs of mitochondrial origin assembled from these long reads based on sequence homology to the white spruce mitochondrial genome. The assembly graph shows a multipartite genome structure, composed of one smaller 168-kb circular segment of DNA, and a larger 5.4-Mb single component with a branching structure. The assembly graph gives insight into a putative complex physical genome structure, and its branching points may represent active sites of recombination.
7

Rihtman, Branko, Sean Meaden, Martha R. J. Clokie, Britt Koskella, and Andrew D. Millard. "Assessing Illumina technology for the high-throughput sequencing of bacteriophage genomes." PeerJ 4 (June 1, 2016): e2055. http://dx.doi.org/10.7717/peerj.2055.

Abstract:
Bacteriophages are the most abundant biological entities on the planet, playing crucial roles in the shaping of bacterial populations. Phages have smaller genomes than their bacterial hosts, yet there are currently fewer fully sequenced phage than bacterial genomes. We assessed the suitability of Illumina technology for high-throughput sequencing and subsequent assembly of phage genomes. In silico datasets reveal that 30× coverage is sufficient to correctly assemble the complete genome of ~98.5% of known phages, with experimental data confirming that the majority of phage genomes can be assembled at 30× coverage. Furthermore, in silico data demonstrate it is possible to co-sequence multiple phages from different hosts, without introducing assembly errors.
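As a rough guide to what the 30× figure means in practice, average coverage is commonly estimated as (number of reads × read length) / genome size. The sketch below applies that standard relation; the function names are hypothetical, and the 50 kb genome and 250 bp read length in the usage note are illustrative assumptions, not values taken from the paper.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Estimate average sequencing depth as total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest whole number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)
```

Under these assumptions, `reads_needed(30, 250, 50_000)` gives the number of 250 bp reads required to cover a hypothetical 50 kb phage genome at 30×.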
8

Tanaka, Mami, Sayaka Mino, Yoshitoshi Ogura, Tetsuya Hayashi, and Tomoo Sawabe. "Availability of Nanopore sequences in the genome taxonomy for Vibrionaceae systematics: Rumoiensis clade species as a test case." PeerJ 6 (June 18, 2018): e5018. http://dx.doi.org/10.7717/peerj.5018.

Abstract:
Whole genome sequence comparisons have become essential for establishing a robust scheme in bacterial taxonomy. To generalize this genome-based taxonomy, fast, reliable, and cost-effective genome sequencing methodologies are required. MinION, the palm-sized sequencer from Oxford Nanopore Technologies, enables rapid sequencing of bacterial genomes using minimal laboratory resources. Here we tested the ability of Nanopore sequences for the genome-based taxonomy of Vibrionaceae and compared Nanopore-only assemblies to complete genomes of five Rumoiensis clade species: Vibrio aphrogenes, V. algivorus, V. casei, V. litoralis, and V. rumoiensis. Comparison of overall genome relatedness indices (OGRI) and multilocus sequence analysis (MLSA) based on Nanopore-only assembly and Illumina or hybrid assemblies revealed that errors in Nanopore-only assembly do not influence average nucleotide identity (ANI), in silico DNA-DNA hybridization (DDH), G+C content, or MLSA tree topology in Vibrionaceae. Our results show that the genome sequences from Nanopore-based approach can be used for rapid species identification based on the OGRI and MLSA.
9

Mascher, Martin, Thomas Wicker, Jerry Jenkins, Christopher Plott, Thomas Lux, Chu Shin Koh, Jennifer Ens, et al. "Long-read sequence assembly: a technical evaluation in barley." Plant Cell 33, no. 6 (March 12, 2021): 1888–906. http://dx.doi.org/10.1093/plcell/koab077.

Abstract:
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
10

Buza, Krisztian, Bartek Wilczynski, and Norbert Dojer. "RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes." International Journal of Genomics 2015 (2015): 1–10. http://dx.doi.org/10.1155/2015/563482.

Abstract:
Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate new, modified reference sequence that is closer to the actual sequenced genome and has a full coverage. In this paper, we describe our approach and test its implementation called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on sourceforge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.
11

Braich, Shivraj, Rebecca C. Baillie, German C. Spangenberg, and Noel O. I. Cogan. "A new and improved genome sequence of Cannabis sativa." Gigabyte 2020 (December 23, 2020): 1–13. http://dx.doi.org/10.46471/gigabyte.10.

Abstract:
Cannabis is a diploid species (2n = 20), the estimated haploid genome sizes of the female and male plants using flow cytometry are 818 and 843 Mb respectively. Although the genome of Cannabis has been sequenced (from hemp, wild and high-THC strains), all assemblies have significant gaps. In addition, there are inconsistencies in the chromosome numbering which limits their use. A new comprehensive draft genome sequence assembly (∼900 Mb) has been generated from the medicinal cannabis strain Cannbio-2, that produces a balanced ratio of cannabidiol and delta-9-tetrahydrocannabinol using long-read sequencing. The assembly was subsequently analysed for completeness by ordering the contigs into chromosome-scale pseudomolecules using a reference genome assembly approach, annotated and compared to other existing reference genome assemblies. The Cannbio-2 genome sequence assembly was found to be the most complete genome sequence available based on nucleotides assembled and BUSCO evaluation in Cannabis sativa with a comprehensive genome annotation. The new draft genome sequence is an advancement in Cannabis genomics permitting pan-genome analysis, genomic selection as well as genome editing.
12

Sutton, John M., Joshua D. Millwood, A. Case McCormack, and Janna L. Fierst. "Optimizing experimental design for genome sequencing and assembly with Oxford Nanopore Technologies." Gigabyte 2021 (July 13, 2021): 1–26. http://dx.doi.org/10.46471/gigabyte.27.

Abstract:
High quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) produces inexpensive DNA sequences, but has high error rates, which make sequence assembly and analysis difficult as genome size and complexity increases. Robust experimental design is necessary for ONT genome sequencing and assembly, but few studies have addressed eukaryotic organisms. Here, we present novel results using simulated and empirical ONT and DNA libraries to identify best practices for sequencing and assembly for several model species. We find that the unique error structure of ONT libraries causes errors to accumulate and assembly statistics plateau as sequence depth increases. High-quality assembled eukaryotic sequences require high-molecular-weight DNA extractions that increase sequence read length, and computational protocols that reduce error through pre-assembly correction and read selection. Our quantitative results will be helpful for researchers seeking guidance for de novo assembly projects.
13

Perkin, Lindsey C., Timothy P. L. Smith, and Brenda Oppert. "Variants in the Mitochondrial Genome Sequence of Rhyzopertha dominica (Fabricius) (Coleoptera: Bostrycidae)." Insects 12, no. 5 (April 27, 2021): 387. http://dx.doi.org/10.3390/insects12050387.

Abstract:
The lesser grain borer, Rhyzopertha dominica, is a coleopteran pest of stored grains and is mainly controlled by phosphine fumigation, but the increase in phosphine-resistant populations threatens efficacy. Some phosphine-resistant insects have reduced respiration, and thus studying the mitochondrial genome may provide additional information regarding resistance. Genomic DNA from an inbred laboratory strain of R. dominica was extracted and sequenced with both short (Illumina) and long (Pacific Biosciences) read technologies for whole genome sequence assembly and annotation. Short read sequences were assembled and annotated by open software to identify mitochondrial sequences, and the assembled sequence was manually annotated and verified by long read sequences. The mitochondrial genome sequence for R. dominica had a total length of 15,724 bp and encoded 22 tRNA genes, 2 rRNA genes, 13 protein coding genes (7 nad subunits, 3 cox, 2 atp, and 1 cytB), flanked by a long control region. We compared our predicted mitochondrial genome to that of another from an R. dominica strain from Jingziguan (China). While there was mostly agreement between the two assemblies, key differences will be further examined to determine if mutations in populations are related to insecticide control pressure, mainly that of phosphine. Differences in sequence data, assembly, and annotation also may result in different genome interpretations.
14

Wang, Xiaozhu, Xiao Xiong, Wenqi Cao, Chao Zhang, John H. Werren, and Xu Wang. "Genome Assembly of the A-Group Wolbachia in Nasonia oneida Using Linked-Reads Technology." Genome Biology and Evolution 11, no. 10 (October 1, 2019): 3008–13. http://dx.doi.org/10.1093/gbe/evz223.

Abstract:
Wolbachia are obligate intracellular bacteria which commonly infect various nematode and arthropod species. Genome sequences have been generated from arthropod samples following enrichment for the intracellular bacteria, and genomes have also been assembled from arthropod whole-genome sequencing projects. However, these methods remain challenging for infections that occur at low titers in hosts. Here we report the first Wolbachia genome assembled from host sequences using 10× Genomics linked-reads technology. The high read depth attainable by this method allows for recovery of intracellular bacteria that are at low concentrations. Based on the depth differences (714× for the insect and 59× for the bacterium), we assembled the genome of a Wolbachia in the parasitoid jewel wasp species Nasonia oneida. The final draft assembly consists of 1,293, 06 bp in 47 scaffolds with 1,114 coding genes and 97.01% genome completeness assessed by checkM. Comparisons of the five Multi Locus Sequence Typing genes revealed that the sequenced Wolbachia genome is the A1 strain (henceforth wOneA1) previously reported in N. oneida. Pyrosequencing confirms that the wasp strain lacks A2 and B types previously detected in this insect, which were likely lost during laboratory culturing. Assembling bacterial genomes from host genome projects can provide an effective method for sequencing bacterial genomes, even when the infections occur at low density in sampled tissues.
15

Taylor, Gregory A., Heather Kirk, Lauren Coombe, Shaun D. Jackman, Justin Chu, Kane Tse, Dean Cheng, et al. "The Genome of the North American Brown Bear or Grizzly: Ursus arctos ssp. horribilis." Genes 9, no. 12 (November 30, 2018): 598. http://dx.doi.org/10.3390/genes9120598.

Abstract:
The grizzly bear (Ursus arctos ssp. horribilis) represents the largest population of brown bears in North America. Its genome was sequenced using a microfluidic partitioning library construction technique, and these data were supplemented with sequencing from a nanopore-based long read platform. The final assembly was 2.33 Gb with a scaffold N50 of 36.7 Mb, and the genome is of comparable size to that of its close relative the polar bear (2.30 Gb). An analysis using 4104 highly conserved mammalian genes indicated that 96.1% were found to be complete within the assembly. An automated annotation of the genome identified 19,848 protein coding genes. Our study shows that the combination of the two sequencing modalities that we used is sufficient for the construction of highly contiguous reference quality mammalian genomes. The assembled genome sequence and the supporting raw sequence reads are available from the NCBI (National Center for Biotechnology Information) under the bioproject identifier PRJNA493656, and the assembly described in this paper is version QXTK01000000.
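The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of that definition follows; the function name `n50` is hypothetical, and the lengths in the usage note are made-up toy values, not data from the grizzly assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

For the toy lengths `[10, 8, 6, 4, 2]` (total 30), the two longest sequences already cover 18 bases, so `n50` returns 8.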
16

Rahman, Tasnim, Hasnain Heickal, Shamira Tabrejee, Md Miraj Kobad Chowdhury, Sheikh Muhammad Sarwar, and Mohammad Shoyaib. "SeqDev: An Algorithm for Constructing Genetic Elements Using Comparative Assembly." Plant Tissue Culture and Biotechnology 26, no. 1 (September 27, 2016): 105–21. http://dx.doi.org/10.3329/ptcb.v26i1.29772.

Abstract:
With the availability of recent next generation sequencing technologies and their low cost, genomes of different organisms are being sequenced frequently. Therefore, quick assembly of genome, transcriptome, and target contigs from the raw data generated through the sequencing technologies has become necessary for better understanding of different biological systems. This article proposes an algorithm, namely SeqDev (Sequence Developer) for constructing contigs from raw reads using reference sequences. For this, we considered a weighted frequency-based consensus mechanism named BlastAssemb for primary construction of a sequence with gaps. Then, we adopted suffix array and proposed a gap filling search (GFS) algorithm for searching the missing sequences in the primary construct. For evaluating our algorithm, we have chosen Pokkali (rice) raw genome and Japonica (rice) as our reference data. Experimental results demonstrated that our proposed algorithm accurately constructs promoter sequences of Pokkali from its raw genome data. These constructed promoter sequences were 93–100% identical with the reference and also aligned with 96–100% of corresponding reference sequences with eValue ranging from 0.0–2e-14. All these results indicated that our proposed method could be a potential algorithm to construct target contigs from raw sequences with the help of reference sequences. Further wet lab validation with specific Pokkali promoter sequence will boost this method as a robust algorithm for target contig assembly. Plant Tissue Cult. & Biotech. 26(1): 105-121, 2016 (June)
17

Kleffe, Jürgen, Robert Weißmann, and Florian F. Schmitzberger. "Single Nucleotide Polymorphisms Caused by Assembly Errors." Genomics Insights 3 (January 2010): GEI.S3653. http://dx.doi.org/10.4137/gei.s3653.

Abstract:
We compare the results of three different assembler programs, Celera, Phrap and Mira2, for the same set of about a hundred thousand Sanger reads derived from an unknown bacterial genome. In contrast to previous assembly comparisons, we do not focus on speed of computation and numbers of assembled contigs but on how the different sequence assemblies agree by content. Threefold consistently assembled genome regions are identified in order to estimate a lower bound of erroneously identified single nucleotide polymorphisms (SNP) caused by nothing but the process of mathematical sequence assembly. We identified 509 sequence triplets common to all three de-novo assemblies spanning only 34% (3.3 Mb) of the bacterial genome with 175 of these regions (~1.5 Mb) including erroneous SNPs and insertion/deletions. Within these triplets this on average leads to one error per 7,155 base pairs. Replacing the assembler Mira2 by the most recent version Mira3, the latter number even drops to 5,923. Our results therefore suggest that a considerably high number of erroneous SNPs may be present in current sequence data and mathematicians should urgently take up research on numerical stability of sequence assembly algorithms. Furthermore, even the latest versions of currently used assemblers produce erroneous SNPs that depend on the order reads are used as input. Such errors will severely hamper molecular diagnostics as well as relating genome variation and disease. This issue needs to be addressed urgently as the field is moving fast into clinical applications.
18

Howe, Kerstin, Melinda Dwinell, Mary Shimoyama, Craig Corton, Emma Betteridge, Alexander Dove, Michael A. Quail, et al. "The genome sequence of the Norway rat, Rattus norvegicus Berkenhout 1769." Wellcome Open Research 6 (May 18, 2021): 118. http://dx.doi.org/10.12688/wellcomeopenres.16854.1.

Abstract:
We present a genome assembly from an individual male Rattus norvegicus (the Norway rat; Chordata; Mammalia; Rodentia; Muridae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled. This genome assembly, mRatBN7.2, represents the new reference genome for R. norvegicus and has been adopted by the Genome Reference Consortium.
19

Jansen, Hans J., Ron P. Dirks, Michael Liem, Christiaan V. Henkel, G. Paul H. van Heusden, Richard J. L. F. Lemmers, Trifa Omer, Shuai Shao, Peter J. Punt, and Herman P. Spaink. "De novo whole-genome assembly of a wild type yeast isolate using nanopore sequencing." F1000Research 6 (May 3, 2017): 618. http://dx.doi.org/10.12688/f1000research.11146.1.

Abstract:
Background: The introduction of the MinION™ sequencing device by Oxford Nanopore Technologies may greatly accelerate whole genome sequencing. It has been shown that the nanopore sequence data, in combination with other sequencing technologies, is highly useful for accurate annotation of all genes in the genome. However, it also offers great potential for de novo assembly of complex genomes without using other technologies. In this manuscript we used nanopore sequencing as a tool to classify yeast strains. Methods: We compared various technical and software developments for the nanopore sequencing protocol, showing that the R9 chemistry is, as predicted, higher in quality than R7.3 chemistry. The R9 chemistry is an essential improvement for assembly of the extremely AT-rich mitochondrial genome. Results: In this study, we used this new technology to sequence and de novo assemble the genome of a recently isolated ethanologenic yeast strain, and compared the results with those obtained by classical Illumina short read sequencing. This strain was originally named Candida vartiovaarae (Torulopsis vartiovaarae) based on ribosomal RNA sequencing. We show that the assembly using nanopore data is much more contiguous than the assembly using short read data. Conclusions: The mitochondrial and chromosomal genome sequences showed that our strain is clearly distinct from other yeast taxons and most closely related to published Cyberlindnera species. In conclusion, MinION-mediated long read sequencing can be used for high quality de novo assembly of new eukaryotic microbial genomes.
20

Araki, Kazuo, Jun-ya Aokic, Junya Kawase, Kazuhisa Hamada, Akiyuki Ozaki, Hiroshi Fujimoto, Ikki Yamamoto, and Hironori Usuki. "Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing." International Journal of Genomics 2018 (2018): 1–12. http://dx.doi.org/10.1155/2018/7984292.

Abstract:
Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence.
21

Fiedler, Gregor, Anna-Delia Herbstmann, Etienne Doll, Mareike Wenning, Erik Brinks, Jan Kabisch, Franziska Breitenwieser, Martin Lappann, Christina Böhnlein, and Charles M. A. P. Franz. "Taxonomic Evaluation of the Heyndrickxia (Basonym Bacillus) sporothermodurans Group (H. sporothermodurans, H. vini, H. oleronia) Based on Whole Genome Sequences." Microorganisms 9, no. 2 (January 26, 2021): 246. http://dx.doi.org/10.3390/microorganisms9020246.

Abstract:
The genetic heterogeneity of Heyndrickxia sporothermodurans (formerly Bacillus sporothermodurans) was evaluated using whole genome sequencing. The genomes of 29 previously identified Heyndrickxia sporothermodurans and two Heyndrickxia vini strains isolated from ultra-high-temperature (UHT)-treated milk were sequenced by short-read (Illumina) sequencing. After sequence analysis, the two H. vini strains could be reclassified as H. sporothermodurans. In addition, the genomes of the H. sporothermodurans type strain (DSM 10599T) and the closest phylogenetic neighbors Heyndrickxia oleronia (DSM 9356T) and Heyndrickxia vini (JCM 19841T) were also sequenced using both long (MinION) and short-read (Illumina) sequencing. By hybrid sequence assembly, the genome of the H. sporothermodurans type strain was enlarged by 15% relative to the short-read assembly. This noticeable increase was probably due to numerous mobile elements in the genome that are presumptively related to spore heat tolerance. Phylogenetic studies based on 16S rDNA gene sequence, core genome, single-nucleotide polymorphisms and ANI/dDDH, showed that H. vini is highly related to H. sporothermodurans. When examining the genome sequences of all H. sporothermodurans strains from this study, together with 4 H. sporothermodurans genomes available in the GenBank database, the majority of the 36 strains examined occurred in a clonal lineage with less than 100 SNPs. These data substantiate previous reports on the existence and spread of a genetically highly homogenous and heat resistant spore clone, i.e., the HRS-clone.
22

Liem, Michael, Hans J. Jansen, Ron P. Dirks, Christiaan V. Henkel, G. Paul H. van Heusden, Richard J. L. F. Lemmers, Trifa Omer, Shuai Shao, Peter J. Punt, and Herman P. Spaink. "De novo whole-genome assembly of a wild type yeast isolate using nanopore sequencing." F1000Research 6 (August 3, 2018): 618. http://dx.doi.org/10.12688/f1000research.11146.2.

Abstract:
Background: The introduction of the MinION sequencing device by Oxford Nanopore Technologies may greatly accelerate whole genome sequencing. Nanopore sequence data offers great potential for de novo assembly of complex genomes without using other technologies. Furthermore, Nanopore data combined with other sequencing technologies is highly useful for accurate annotation of all genes in the genome. In this manuscript we used nanopore sequencing as a tool to classify yeast strains. Methods: We compared various technical and software developments for the nanopore sequencing protocol, showing that the R9 chemistry is, as predicted, higher in quality than R7.3 chemistry. The R9 chemistry is an essential improvement for assembly of the extremely AT-rich mitochondrial genome. We double corrected assemblies from four different assemblers with PILON and assessed sequence correctness before and after PILON correction with a set of 290 Fungi genes using BUSCO. Results: In this study, we used this new technology to sequence and de novo assemble the genome of a recently isolated ethanologenic yeast strain, and compared the results with those obtained by classical Illumina short read sequencing. This strain was originally named Candida vartiovaarae (Torulopsis vartiovaarae) based on ribosomal RNA sequencing. We show that the assembly using nanopore data is much more contiguous than the assembly using short read data. We also compared various technical and software developments for the nanopore sequencing protocol, showing that nanopore-derived assemblies provide the highest contiguity. Conclusions: The mitochondrial and chromosomal genome sequences showed that our strain is clearly distinct from other yeast taxons and most closely related to published Cyberlindnera species. In conclusion, MinION-mediated long read sequencing can be used for high quality de novo assembly of new eukaryotic microbial genomes.
23

Li, Haoxing, Fan Jiang, Ping Wu, Ke Wang, and Yangrong Cao. "A High-Quality Genome Sequence of Model Legume Lotus japonicus (MG-20) Provides Insights into the Evolution of Root Nodule Symbiosis." Genes 11, no. 5 (April 29, 2020): 483. http://dx.doi.org/10.3390/genes11050483.

Abstract:
Lotus japonicus is an important model legume for studying symbiotic nitrogen fixation as well as plant development. A genomic sequence of L. japonicus (MG20) has been available for more than ten years. However, the low quality of the genome limits its application in functional genomic studies. Therefore, it is necessary to assemble high-quality chromosome sequences of L. japonicus using new sequencing technology to facilitate the study of functional genomics. In this report, we used third-generation sequencing combined with the Illumina HiSeq platform to sequence the genome of L. japonicus (MG20). We obtained 544 Mb of genomic sequence using third-generation assembly. Based on sequence analysis, 357 Mb of repeats, 28,251 genes, 626 tRNAs, 1409 rRNAs, and 1233 pseudogenes were predicted in the genome. A total of 27,991 genes were annotated into databases. Compared to the previously published data, the new genome database contains complete L. japonicus sequences in the proper order and orientation, with a contig N50 of 2.81 Mb and excellent genome coverage, which provides more accurate genome information and a more precise assembly for functional genomic studies.
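The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the example contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least half"
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the length distribution, it says nothing about whether contigs are correctly assembled, which is why abstracts such as this one report it alongside coverage and annotation statistics.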
24

Jaffe, D. B. "Whole-Genome Sequence Assembly for Mammalian Genomes: Arachne 2." Genome Research 13, no. 1 (January 1, 2003): 91–96. http://dx.doi.org/10.1101/gr.828403.

25

Hanna, Zachary R., James B. Henderson, Anna B. Sellas, Jérôme Fuchs, Rauri C. K. Bowie, and John P. Dumbacher. "Complete mitochondrial genome sequences of the northern spotted owl (Strix occidentalis caurina) and the barred owl (Strix varia; Aves: Strigiformes: Strigidae) confirm the presence of a duplicated control region." PeerJ 5 (October 10, 2017): e3901. http://dx.doi.org/10.7717/peerj.3901.

Abstract:
We report here the successful assembly of the complete mitochondrial genomes of the northern spotted owl (Strix occidentalis caurina) and the barred owl (S. varia). We utilized sequence data from two sequencing methodologies, Illumina paired-end sequence data with insert lengths ranging from approximately 250 nucleotides (nt) to 9,600 nt and read lengths from 100–375 nt and Sanger-derived sequences. We employed multiple assemblers and alignment methods to generate the final assemblies. The circular genomes of S. o. caurina and S. varia are comprised of 19,948 nt and 18,975 nt, respectively. Both code for two rRNAs, twenty-two tRNAs, and thirteen polypeptides. They both have duplicated control region sequences with complex repeat structures. We were not able to assemble the control regions solely using Illumina paired-end sequence data. By fully spanning the control regions, Sanger-derived sequences enabled accurate and complete assembly of these mitochondrial genomes. These are the first complete mitochondrial genome sequences of owls (Aves: Strigiformes) possessing duplicated control regions. We searched the nuclear genome of S. o. caurina for copies of mitochondrial genes and found at least nine separate stretches of nuclear copies of gene sequences originating in the mitochondrial genome (Numts). The Numts ranged from 226–19,522 nt in length and included copies of all mitochondrial genes except tRNAPro, ND6, and tRNAGlu. Strix occidentalis caurina and S. varia exhibited an average of 10.74% (8.68% uncorrected p-distance) divergence across the non-tRNA mitochondrial genes.
26

Seemann, Stefan E., Christian Anthon, Oana Palasca, and Jan Gorodkin. "Quality Assessment of Domesticated Animal Genome Assemblies." Bioinformatics and Biology Insights 9S4 (January 2015): BBI.S29333. http://dx.doi.org/10.4137/bbi.s29333.

Abstract:
The era of high-throughput sequencing has made it relatively simple to sequence genomes and transcriptomes of individuals from many species. In order to analyze the resulting sequencing data, high-quality reference genome assemblies are required. However, this is still a major challenge, and many domesticated animal genomes still need to be sequenced deeper in order to produce high-quality assemblies. In the meanwhile, ironically, the extent to which RNA seq and other next-generation data is produced frequently far exceeds that of the genomic sequence. Furthermore, basic comparative analysis is often affected by the lack of genomic sequence. Herein, we quantify the quality of the genome assemblies of 20 domesticated animals and related species by assessing a range of measurable parameters, and we show that there is a positive correlation between the fraction of mappable reads from RNAseq data and genome assembly quality. We rank the genomes by their assembly quality and discuss the implications for genotype analyses.
27

Pop, M., S. L. Salzberg, and M. Shumway. "Genome sequence assembly: algorithms and issues." Computer 35, no. 7 (July 2002): 47–54. http://dx.doi.org/10.1109/mc.2002.1016901.

28

Li, Guangwei, Lijian Wang, Jianping Yang, Hang He, Huaibing Jin, Xuming Li, Tianheng Ren, et al. "A high-quality genome assembly highlights rye genomic characteristics and agronomically important genes." Nature Genetics 53, no. 4 (March 18, 2021): 574–84. http://dx.doi.org/10.1038/s41588-021-00808-z.

Abstract:
Rye is a valuable food and forage crop, an important genetic resource for wheat and triticale improvement and an indispensable material for efficient comparative genomic studies in grasses. Here, we sequenced the genome of Weining rye, an elite Chinese rye variety. The assembled contigs (7.74 Gb) accounted for 98.47% of the estimated genome size (7.86 Gb), with 93.67% of the contigs (7.25 Gb) assigned to seven chromosomes. Repetitive elements constituted 90.31% of the assembled genome. Compared to previously sequenced Triticeae genomes, Daniela, Sumaya and Sumana retrotransposons showed strong expansion in rye. Further analyses of the Weining assembly shed new light on genome-wide gene duplications and their impact on starch biosynthesis genes, physical organization of complex prolamin loci, gene expression features underlying early heading trait and putative domestication-associated chromosomal regions and loci in rye. This genome sequence promises to accelerate genomic and breeding studies in rye and related cereal crops.
29

Biederstedt, Evan, Jeffrey C. Oliver, Nancy F. Hansen, Aarti Jajoo, Nathan Dunn, Andrew Olson, Ben Busby, and Alexander T. Dilthey. "NovoGraph: Human genome graph construction from multiple long-read de novo assemblies." F1000Research 7 (December 10, 2018): 1391. http://dx.doi.org/10.12688/f1000research.15895.2.

Abstract:
Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped.
30

Boyes, Douglas, and Peter W. H. Holland. "The genome sequence of the snout, Hypena proboscidalis (Linnaeus, 1758)." Wellcome Open Research 6 (September 15, 2021): 236. http://dx.doi.org/10.12688/wellcomeopenres.17189.1.

Abstract:
We present a genome assembly from an individual female Hypena proboscidalis (the snout; Arthropoda; Insecta; Lepidoptera; Erebidae). The genome sequence is 637 megabases in span. The majority of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled.
31

Threlfall, Jonathan, and Mark Blaxter. "Launching the Tree of Life Gateway." Wellcome Open Research 6 (May 21, 2021): 125. http://dx.doi.org/10.12688/wellcomeopenres.16913.1.

Abstract:
The Tree of Life Gateway uses Genome Note publications to announce the completion of genomes assembled by the Tree of Life programme, based at the Wellcome Sanger Institute and involving numerous partner organisations and institutes. Tree of Life participates in the Darwin Tree of Life Project, which aims to sequence the genomes of all 70,000+ eukaryotic species in the Atlantic archipelago of Britain and Ireland, the Aquatic Symbiosis Genomics Project, which will sequence 1000 species involved in 500 symbioses between eukaryotic hosts and their microbial 'cobionts', and other initiatives, such as the Vertebrate Genome Project. These Genome Notes report the origins of ethically sourced samples used for sequencing, give the methods used to generate the sequence and use statistics and interactive figures to demonstrate the quality of the genome sequences. In addition to describing the production of these sequences, each Genome Note gives citeable credit to those who participated in producing the genome assembly and announces the availability of the data for reuse by all. It is through the use and reuse of this openly and publicly released data that we hope effective and lasting solutions to the ongoing biodiversity crisis can be found.
32

Marra, Marco A., Martin Krzywinski, Readman Chiu, Matthew Field, Inanc Birol, Brian D’Souza, Ian Bosdet, et al. "Towards the Human Cancer Genome Project: A Sequence-Ready Physical Map of a Follicular Lymphoma Genome." Blood 106, no. 11 (November 16, 2005): 605. http://dx.doi.org/10.1182/blood.v106.11.605.605.

Abstract:
With the aim of identifying and sequencing mutations in follicular lymphoma genomes, we have begun a project to generate at least 24 deeply redundant sequence-ready Bacterial Artificial Chromosome (BAC)-based whole genome maps, each from a different individual’s lymphoma. BAC-array CGH and Affymetrix whole-genome sampling assays (WGSA) will be used along with the mapping data to identify genomic amplifications and losses in the lymphomas. Results from the mapping and array studies will be used to prioritize BAC clones for sequence analysis. Because each map will span essentially the entire genome of the corresponding lymphoma, we anticipate that essentially all regions of each tumor genome will be represented in easily sequenced BAC clones. This approach facilitates targeted sequencing of genomic regions of interest, including those containing genes relevant to cancer or harboring amplifications or deletions. Our mapping strategy hinges on the successful creation of deeply redundant high quality BAC libraries from primary lymphomas and large scale high throughput restriction enzyme fingerprinting of individual BACs with a version of the technology we used to map the human, mouse, rat and other genomes. The effort is large-scale, and will result in the generation of at least 2.5 million fingerprinted BAC clones over the next three years. Using the fingerprints, we will align the BACs to the reference human genome to assess genome coverage and to identify candidate genome rearrangements. In parallel, we will assemble the fingerprints into genome maps, looking for larger-scale genome variations between the lymphoma maps and the reference genome sequence. To test the feasibility of our approach, we obtained two restriction digest fingerprints from each of 140,000 individual BAC clones. BACs were sampled from a 7-fold redundant BAC library that had been created from genomic DNA purified from a primary follicular lymphoma sample.
The fingerprints are being assembled into a clone map with the intent of reconstructing the entire tumor genome. 90,377 fingerprinted clones with unambiguous single alignments to the reference sequence were automatically assembled into 15,538 contigs. Subsequent rounds of semi-automatic contig merging further reduced the number of contigs to 5,433. Only 1,241 clones remained unassembled. We anchored the tumor genome map to the reference human genome sequence by aligning the clone fingerprints to the restriction map computed from the reference sequence assembly. As a result of this, we identified a BAC that captured the canonical t(14;18) translocation characteristic of follicular lymphomas. We sequenced this BAC and confirmed that it contains the expected translocation. Almost 2.6 gigabases (~91%) of the reference genome are represented in the evolving map, with an additional 50,000 clone fingerprints awaiting incorporation into the map assembly. Among these are repeat-rich and other clones that may well harbor genome rearrangements. Additional prioritization of sequencing targets will be undertaken when map construction and analysis of genome copy number alterations are complete.
33

Vine, Christopher, Emma C. Teeling, Michelle Smith, Craig Corton, Karen Oliver, Jason Skelton, Emma Betteridge, et al. "The genome sequence of the common pipistrelle, Pipistrellus pipistrellus Schreber 1774." Wellcome Open Research 6 (May 17, 2021): 117. http://dx.doi.org/10.12688/wellcomeopenres.16895.1.

Abstract:
We present a genome assembly from an individual female Pipistrellus pipistrellus (the common pipistrelle; Chordata; Mammalia; Chiroptera; Vespertilionidae). The genome sequence is 1.76 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal pseudomolecules, with the X sex chromosome assembled.
34

Challis, Richard, Edward Richards, Jeena Rajan, Guy Cochrane, and Mark Blaxter. "BlobToolKit – Interactive Quality Assessment of Genome Assemblies." G3: Genes|Genomes|Genetics 10, no. 4 (February 18, 2020): 1361–74. http://dx.doi.org/10.1534/g3.119.400908.

Abstract:
Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Data Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.
35

Pasini, Erica M., Ulrike Böhme, Gavin G. Rutledge, Annemarie Voorberg-Van der Wel, Mandy Sanders, Matt Berriman, Clemens HM Kocken, and Thomas D. Otto. "An improved Plasmodium cynomolgi genome assembly reveals an unexpected methyltransferase gene expansion." Wellcome Open Research 2 (June 16, 2017): 42. http://dx.doi.org/10.12688/wellcomeopenres.11864.1.

Abstract:
Background: Plasmodium cynomolgi, a non-human primate malaria parasite species, has been an important model parasite since its discovery in 1907. Similarities in the biology of P. cynomolgi to the closely related, but less tractable, human malaria parasite P. vivax make it the model parasite of choice for liver biology and vaccine studies pertinent to P. vivax malaria. Molecular and genome-scale studies of P. cynomolgi have relied on the current reference genome sequence, which remains highly fragmented with 1,649 unassigned scaffolds and little representation of the subtelomeres. Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated a new reference genome sequence, PcyM, sourced from an Indian rhesus monkey. We compare the newly assembled genome sequence with those of several other Plasmodium species, including a re-annotated P. coatneyi assembly. Results: The new PcyM genome assembly is of significantly higher quality than the existing reference, comprising only 56 pieces, no gaps and an improved average gene length. Detailed manual curation has ensured a comprehensive annotation of the genome with 6,632 genes, nearly 1,000 more than previously attributed to P. cynomolgi. The new assembly also has an improved representation of the subtelomeric regions, which account for nearly 40% of the sequence. Within the subtelomeres, we identified more than 1300 Plasmodium interspersed repeat (pir) genes, as well as a striking expansion of 36 methyltransferase pseudogenes that originated from a single copy on chromosome 9. Conclusions: The manually curated PcyM reference genome sequence is an important new resource for the malaria research community. The high quality and contiguity of the data have enabled the discovery of a novel expansion of methyltransferase in the subtelomeres, and illustrates the new comparative genomics capabilities that are being unlocked by complete reference genomes.
36

Birol, Inanç, Justin Chu, Hamid Mohamadi, Shaun D. Jackman, Karthika Raghavan, Benjamin P. Vandervalk, Anthony Raymond, and René L. Warren. "Spaced Seed Data Structures for De Novo Assembly." International Journal of Genomics 2015 (2015): 1–8. http://dx.doi.org/10.1155/2015/196591.

Abstract:
De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Despite longer subsequences unlocking longer genomic features for assembly, associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
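The idea of a spaced seed as a pair of subsequences separated by a fixed distance can be illustrated with a short sketch. This is not the data structure proposed in the article: it assumes a seed is two k-mers of equal length `k` separated by `gap` ignored bases, and it swaps in a plain `collections.Counter` where the abstract suggests Bloom-filter-based storage. The function names and parameters are hypothetical.

```python
from collections import Counter


def spaced_seed_pairs(read, k, gap):
    """Yield (left, right) k-mer pairs separated by `gap` ignored bases."""
    span = 2 * k + gap  # total read length covered by one spaced seed
    for i in range(len(read) - span + 1):
        left = read[i:i + k]
        right = read[i + k + gap:i + span]
        yield left, right


def count_spaced_seeds(reads, k, gap):
    """Count how often each spaced seed occurs across all reads.

    A Counter stores every seed explicitly; a probabilistic structure
    would trade exactness for memory.
    """
    counts = Counter()
    for read in reads:
        counts.update(spaced_seed_pairs(read, k, gap))
    return counts


# Hypothetical reads, for illustration only.
seed_counts = count_spaced_seeds(["ACGTACGTAC", "ACGTACGT"], k=3, gap=2)
```

Because the bases inside the gap are ignored, a sequencing error that falls there does not change the seed, which is one way to read the abstract's remark about tolerance to sequencing errors.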
37

Dunn, Jenny C., Miriam Liedvogel, Michelle Smith, Craig Corton, Karen Oliver, Jason Skelton, Emma Betteridge, et al. "The genome sequence of the European robin, Erithacus rubecula Linnaeus 1758." Wellcome Open Research 6 (July 2, 2021): 172. http://dx.doi.org/10.12688/wellcomeopenres.16988.1.

Abstract:
We present a genome assembly from an individual female Erithacus rubecula (the European robin; Chordata; Aves; Passeriformes; Turdidae). The genome sequence is 1.09 gigabases in span. The majority of the assembly is scaffolded into 36 chromosomal pseudomolecules, with both W and Z sex chromosomes assembled.
38

Lohse, Konrad, Dominik Laetsch, and Roger Vila. "The genome sequence of the large tortoiseshell, Nymphalis polychloros (Linnaeus, 1758)." Wellcome Open Research 6 (September 16, 2021): 238. http://dx.doi.org/10.12688/wellcomeopenres.17196.1.

Abstract:
We present a genome assembly from an individual female Nymphalis polychloros (the large tortoiseshell; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 398 megabases in span. The majority of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.
39

Halo, Julia V., Amanda L. Pendleton, Feichen Shen, Aurélien J. Doucet, Thomas Derrien, Christophe Hitte, Laura E. Kirby, et al. "Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes." Proceedings of the National Academy of Sciences 118, no. 11 (March 8, 2021): e2016274118. http://dx.doi.org/10.1073/pnas.2016274118.

Abstract:
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
40

Miller, Jason R., Sergey Koren, Kari A. Dilley, Derek M. Harkins, Timothy B. Stockwell, Reed S. Shabman, and Granger G. Sutton. "A draft genome sequence for the Ixodes scapularis cell line, ISE6." F1000Research 7 (March 8, 2018): 297. http://dx.doi.org/10.12688/f1000research.13635.1.

Abstract:
Background: The tick cell line ISE6, derived from Ixodes scapularis, is commonly used for amplification and detection of arboviruses in environmental or clinical samples. Methods: To assist with sequence-based assays, we sequenced the ISE6 genome with single-molecule, long-read technology. Results: The draft assembly appears near complete based on gene content analysis, though it appears to lack some instances of repeats in this highly repetitive genome. The assembly appears to have separated the haplotypes at many loci. DNA short read pairs, used for validation only, mapped to the cell line assembly at a higher rate than they mapped to the Ixodes scapularis reference genome sequence. Conclusions: The assembly could be useful for filtering host genome sequence from sequence data obtained from cells infected with pathogens.
41

Boyes, Douglas H., and Peter W. H. Holland. "The genome sequence of the yellow-tail moth, Euproctis similis (Fuessly, 1775)." Wellcome Open Research 6 (September 13, 2021): 227. http://dx.doi.org/10.12688/wellcomeopenres.17188.1.

Abstract:
We present a genome assembly from an individual male Euproctis similis (the yellow-tail; Arthropoda; Insecta; Lepidoptera; Lymantriidae). The genome sequence is 508 megabases in span. The majority of the assembly is scaffolded into 22 chromosomal pseudomolecules, with the Z sex chromosome assembled.
42

Gültekin, Visam, and Jens Allmer. "Novel perspectives for SARS-CoV-2 genome browsing." Journal of Integrative Bioinformatics 18, no. 1 (March 1, 2021): 19–26. http://dx.doi.org/10.1515/jib-2021-0001.

Abstract:
SARS-CoV-2 has spread worldwide and caused social, economic, and health turmoil. The first genome assembly of SARS-CoV-2 was produced in Wuhan, and it is widely used as a reference. Subsequently, more than a hundred additional SARS-CoV-2 genomes have been sequenced. While the genomes appear to be mostly identical, there are variations. Therefore, an alignment of all available genomes and the derived consensus sequence could be used as a reference, better serving the science community. Variations are significant, but representing them in a genome browser can become difficult, especially if their sequences are largely identical. Here we summarize the variation in one track. Other information not currently found in genome browsers for SARS-CoV-2, such as predicted miRNAs and predicted TRS as well as secondary structure information, was also added as tracks to the consensus genome. We believe that a genome browser based on the consensus sequence is better suited when considering worldwide effects and can become a valuable resource in combating COVID-19. The genome browser is available at http://cov.iaba.online.
43

Hormozdiari, Farhad, and Eleazar Eskin. "Memory efficient assembly of human genome." Journal of Bioinformatics and Computational Biology 13, no. 02 (April 2015): 1550008. http://dx.doi.org/10.1142/s0219720015500080.

Abstract:
The ability to detect the genetic variations between two individuals is an essential component of genetic studies. In these studies, obtaining the genome sequence of both individuals is the first step toward solving the variation detection problem. The emergence of high-throughput sequencing (HTS) technology has made DNA sequencing practical, and it is widely used by diagnosticians to increase their knowledge about the causal factor in genetic-related diseases. As HTS advances, more data are generated every day than scientists can process. Genome assembly is one of the existing methods to tackle the variation detection problem. The de Bruijn graph formulation of the assembly problem is widely used in the field. Furthermore, it is the only method which can assemble any genome in linear time. However, it requires an enormous amount of memory in order to assemble any mammalian-size genome. The high demand for sequencing more individuals and the urge to assemble them are the driving forces for a memory-efficient assembler. In this work, we propose a novel method which builds the de Bruijn graph while consuming less memory. Moreover, our proposed method can reduce the memory usage by 37% compared to the existing methods. In addition, we used a real data set (chromosome 17 of A/J strain) to illustrate the performance of our method.
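For readers unfamiliar with the de Bruijn graph formulation mentioned in this abstract, the following is a minimal illustrative sketch in Python. It is not the authors' memory-efficient method; the function name `build_de_bruijn`, the toy reads, and the choice of k are assumptions made for illustration only. Each distinct k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a simple de Bruijn graph from reads (hypothetical helper).

    Nodes are (k-1)-mers. Each distinct k-mer found in a read adds an
    edge from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "GTACGA"]
graph = build_de_bruijn(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Construction makes a single pass over the reads, but this naive version keeps every distinct k-mer in memory, which is the kind of cost the abstract identifies as the bottleneck for mammalian-size genomes.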
44

Dunn, Jenny C., Keith C. Hamer, Antony J. Morris, Philip V. Grice, Michelle Smith, Craig Corton, Karen Oliver, et al. "The genome sequence of the European turtle dove, Streptopelia turtur Linnaeus 1758." Wellcome Open Research 6 (July 27, 2021): 191. http://dx.doi.org/10.12688/wellcomeopenres.17060.1.

Abstract:
We present a genome assembly from an individual female Streptopelia turtur (the European turtle dove; Chordata; Aves; Columbidae). The genome sequence is 1.18 gigabases in span. The majority of the assembly is scaffolded into 35 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.
45

Yu, Menghao, Jugpreet Singh, Awais Khan, George W. Sundin, and Youfu Zhao. "Complete Genome Sequence of the Fire Blight Pathogen Strain Erwinia amylovora Ea1189." Molecular Plant-Microbe Interactions® 33, no. 11 (November 2020): 1277–79. http://dx.doi.org/10.1094/mpmi-06-20-0158-a.

Abstract:
Erwinia amylovora causes fire blight, the most devastating bacterial disease of apples and pears in the United States and worldwide. The model strain E. amylovora Ea1189 has been extensively used to understand bacterial pathogenesis and molecular mechanisms of bacterial-plant interactions. In this work, we sequenced and assembled the de novo genome of Ea1189, using a combination of long Oxford Nanopore Technologies and short Illumina sequence reads. A complete gapless genome assembly of Ea1189 consists of a 3,797,741-bp circular chromosome and a 28,259-bp plasmid with 3,472 predicted genes, including 78 transfer RNAs, 22 ribosomal RNAs, and 20 noncoding RNAs. A comparison of the Ea1189 genome to previously sequenced E. amylovora complete genomes showed 99.94 to 99.97% sequence similarity with 314 to 946 single nucleotide polymorphisms. We believe that the availability of the complete genome sequence of strain Ea1189 will further support studies to understand evolution, diversity and structural variations of Erwinia strains, as well as the molecular basis of E. amylovora pathogenesis and its interactions with host plants, thus facilitating the development of effective management strategies for this important disease.
46

Bowers, Robert M., Nikos C. Kyrpides, Ramunas Stepanauskas, Miranda Harmon-Smith, Devin Doud, T. B. K. Reddy, Frederik Schulz, et al. "Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea." Nature Biotechnology 35, no. 8 (August 2017): 725–31. http://dx.doi.org/10.1038/nbt.3893.

Abstract:
We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
47

Mead, Dan, Frank Hailer, Elisabeth Chadwick, Roberto Portela Miguez, Michelle Smith, Craig Corton, Karen Oliver, et al. "The genome sequence of the Eurasian river otter, Lutra lutra Linnaeus 1758." Wellcome Open Research 5 (February 19, 2020): 33. http://dx.doi.org/10.12688/wellcomeopenres.15722.1.

Abstract:
We present a genome assembly from an individual male Lutra lutra (the Eurasian river otter; Vertebrata; Mammalia; Eutheria; Carnivora; Mustelidae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled.
48

Boyes, Douglas, and Peter W. H. Holland. "The genome sequence of the poplar hawk-moth, Laothoe populi (Linnaeus, 1758)." Wellcome Open Research 6 (September 16, 2021): 237. http://dx.doi.org/10.12688/wellcomeopenres.17191.1.

Abstract:
We present a genome assembly from an individual female Laothoe populi (the poplar hawk-moth; Arthropoda; Insecta; Lepidoptera; Sphingidae). The genome sequence is 576 megabases in span. The majority of the assembly is scaffolded into 29 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.
49

Zapata, Luis, Jia Ding, Eva-Maria Willing, Benjamin Hartwig, Daniela Bezdan, Wen-Biao Jiao, Vipul Patel, et al. "Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms." Proceedings of the National Academy of Sciences 113, no. 28 (June 27, 2016): E4052–E4060. http://dx.doi.org/10.1073/pnas.1607532113.

Abstract:
Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation, which will be revealed once complete assemblies will replace resequencing or other reference-dependent methods.
50

Coughlan, Simone, Ali Shirley Taylor, Eoghan Feane, Mandy Sanders, Gabriele Schonian, James A. Cotton, and Tim Downing. "Leishmania naiffi and Leishmania guyanensis reference genomes highlight genome structure and gene evolution in the Viannia subgenus." Royal Society Open Science 5, no. 4 (April 2018): 172212. http://dx.doi.org/10.1098/rsos.172212.

Abstract:
The unicellular protozoan parasite Leishmania causes the neglected tropical disease leishmaniasis, affecting 12 million people in 98 countries. In South America, where the Viannia subgenus predominates, so far only L. (Viannia) braziliensis and L. (V.) panamensis have been sequenced, assembled and annotated as reference genomes. Addressing this deficit in molecular information can inform species typing, epidemiological monitoring and clinical treatment. Here, L. (V.) naiffi and L. (V.) guyanensis genomic DNA was sequenced to assemble these two genomes as draft references from short sequence reads. The methods used were tested using short sequence reads for L. braziliensis M2904 against its published reference as a comparison. This assembly and annotation pipeline identified 70 additional genes not annotated on the original M2904 reference. Phylogenetic and evolutionary comparisons of L. guyanensis and L. naiffi with 10 other Viannia genomes revealed four traits common to all Viannia: aneuploidy, 22 orthologous groups of genes absent in other Leishmania subgenera, elevated TATE transposon copies and a high NADH-dependent fumarate reductase gene copy number. Within the Viannia, there were limited structural changes in genome architecture specific to individual species: a 45 Kb amplification on chromosome 34 was present in all bar L. lainsoni, L. naiffi had a higher copy number of the virulence factor leishmanolysin, and laboratory isolate L. shawi M8408 had a possible minichromosome derived from the 3’ end of chromosome 34. This combination of genome assembly, phylogenetics and comparative analysis across an extended panel of diverse Viannia has uncovered new insights into the origin and evolution of this subgenus and can help improve diagnostics for leishmaniasis surveillance.