Dissertations / Theses on the topic 'Genome sequence assembly'

Consult the top 16 dissertations / theses for your research on the topic 'Genome sequence assembly.'

1

Nasser, Sara. "Fuzzy methods for meta-genome sequence classification and assembly." abstract and full text PDF (free order & download UNR users only), 2008. http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3307706.

2

Freeman, Alex J. "The Genome Sequence of Gossypium herbaceum (A1), a Domesticated Diploid Cotton." BYU ScholarsArchive, 2018. https://scholarsarchive.byu.edu/etd/7329.

Abstract:
Gossypium herbaceum is a species of cotton native to Africa and Asia. As part of a larger effort to investigate structural variation in assorted diploid and polyploid cotton genomes, we have sequenced and assembled the genome of G. herbaceum. Cultivated G. herbaceum is an A1-genome diploid from the Old World (Africa) with a genome size of approximately 1.7 Gb. Long-range information is essential for constructing a high-quality assembly, especially when the genome is expected to be highly repetitive. Here we present a quality draft genome of G. herbaceum (cv. Wagad) built with a multi-platform sequencing strategy (PacBio RS II, Dovetail Genomics, Phase Genomics, BioNano Genomics). PacBio RS II long reads (60X) were assembled de novo with the Canu assembler. Illumina reads generated with the PROXIMO library method from Phase Genomics, together with BioNano high-fidelity whole-genome maps, were then used for further scaffolding. Finally, the assembly was polished with Pilon. This multi-platform long-range sequencing strategy should greatly help in attaining high-quality de novo reconstructions of genomes. The assembly will be used for comparative analysis with G. arboreum, another domesticated diploid, which carries the A2 genome. Besides providing a quality reference genome for G. herbaceum, the project offers an opportunity to assess recent technologies such as Dovetail Genomics, Phase Genomics, and BioNano Genomics. The G. herbaceum genome sequence serves as an example for members of the plant genomics community interested in using multi-platform sequencing technologies for de novo genome sequencing.
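Pilon works from read alignments and also corrects indels and local misassemblies; the snippet below is only a minimal sketch of the underlying idea of consensus polishing. It assumes a hypothetical pre-computed pileup (a list of read bases observed at each draft position) and handles substitutions only. The function name and the thresholds `min_support` and `min_depth` are illustrative assumptions, not Pilon parameters.

```python
from collections import Counter


def polish_by_majority(draft, pileup, min_support=0.6, min_depth=3):
    """Return a copy of `draft` with bases replaced by well-supported read consensus.

    pileup[i] is a (hypothetical) list of read bases aligned to draft position i;
    it must not be longer than the draft. Only substitutions are corrected.
    """
    polished = list(draft)
    for i, bases in enumerate(pileup):
        if len(bases) < min_depth:
            continue  # too little evidence to override the draft base
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_support:
            polished[i] = base  # majority base is supported strongly enough
    return "".join(polished)
```

For example, a draft `"ACGTA"` whose third position is covered by reads reading `T, T, T, G` would have that `G` replaced by `T`, while positions with fewer than three reads are left untouched.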
3

Lee, Rebekah Ann. "Assembly, Annotation and Optical Mapping of the A Subgenome of Avena." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/7238.

Abstract:
Common oat (Avena) has held a significant place within the global crop community for centuries; although its cultivation has decreased over the past century, its nutritional benefits have recently garnered increased interest for human consumption. No published reference sequences are available for any of the three oat subgenomes. Here we report a quality sequence assembly, annotation and hybrid optical map of the A-genome diploid Avena atlantica Baum and Fedak. The assembly is composed of a total of 3,417 contigs with an N50 of 11.86 Mb and an estimated completeness of 97.6%. This genome sequence will be a valuable research tool within the oat community.
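N50, the contiguity statistic quoted above, is conventionally the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch of that definition (the function name is hypothetical):

```python
def n50(lengths):
    """Smallest length L such that contigs of length >= L hold at least half the assembly."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For contig lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest contigs already hold 11 bases, so the N50 is 5.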
4

Bodily, Paul Mark. "Inverted Sequence Identification in Diploid Genomic Scaffold Assembly via Weighted MAX-CUT Reduction." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/3793.

Abstract:
Virtually all genome assemblers to date are designed for data from haploid or homozygous diploid genomes. Applied to heterozygous genomic datasets, they generally produce highly fragmented, error-prone assemblies because their assumptions are violated during both the contigging and scaffolding phases. Of the two phases, scaffolding is the more severely affected, and algorithms that facilitate scaffolding of heterozygous data are lacking. We present a stand-alone scaffolding algorithm, ScaffoldScaffolder, designed specifically for scaffolding diploid genomes. A fundamental step in the scaffolding phase is assigning sequence orientations to the contigs within scaffolds; deciding such an assignment in the presence of ambiguous evidence is termed the contig orientation problem. We define this problem using bidirected graph theory and show that it is equivalent to the weighted MAX-CUT problem. We present a greedy heuristic solution and compare it with other solutions to the contig orientation problem, including an advanced MAX-CUT heuristic. We illustrate how a solution to this problem provides a simple means of simultaneously identifying inverted haplotypes, which are found only in diploid genomes and have been shown to be involved in the genetic mechanisms of several diseases. Ultimately, our findings show that, owing to the inherent biases in the underlying biological model, a greedy heuristic algorithm performs very well in practice, retaining a higher total percentage of edge weight than a branch-and-bound semidefinite programming heuristic. This application exemplifies how existing graph theory algorithms can be applied to develop new algorithms for more accurate assembly of heterozygous diploid genomes.
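The abstract reduces contig orientation to weighted MAX-CUT but does not spell out its greedy heuristic. The sketch below is a generic greedy MAX-CUT heuristic, not ScaffoldScaffolder's code: it assumes the orientation evidence has already been converted into a graph with nonnegative edge weights to be cut, visits vertices in order of decreasing incident weight, and places each one on the side that cuts more weight to already-placed neighbours. For a graph without self-loops this placement rule always cuts at least half of the total edge weight.

```python
from collections import defaultdict


def greedy_max_cut(edges):
    """Greedy weighted MAX-CUT.

    edges: dict mapping (u, v) -> nonnegative weight.
    Returns (side, cut_weight), where side maps each vertex to 0 or 1.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w
    # Heaviest vertices first; ties broken by name for reproducibility.
    order = sorted(adj, key=lambda n: (-sum(adj[n].values()), str(n)))
    side = {}
    for node in order:
        # gain[s] = weight cut if node goes to side s, counting placed neighbours only.
        gain = [0, 0]
        for nbr, w in adj[node].items():
            if nbr in side:
                gain[1 - side[nbr]] += w
        side[node] = 0 if gain[0] >= gain[1] else 1
    cut = sum(w for (u, v), w in edges.items() if side[u] != side[v])
    return side, cut
```

On a four-contig cycle with unit weights the heuristic alternates sides and cuts all four edges, while on a unit-weight triangle it cuts two of three, which is optimal there.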
5

Sharp, Aaron Robert. "Improving Cotton Agronomics with Diverse Genomic Technologies." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5845.

Abstract:
Agronomic outcomes are the product of a plant's genotype and its environment. Genomic technologies give farmers and researchers new avenues for exploring the genetic component of agriculture, and they can also enhance understanding of environmental effects. With a growing world population, a wide variety of tools will be needed to increase agronomic productivity. Here I use massively parallel, deep sequencing of RNA (RNA-Seq) to measure changes in cotton gene expression levels in response to a change in the plant's surroundings caused by conservation tillage. Conservation tillage is an environmentally friendly agricultural practice characterized by little or no inversion of the soil prior to planting. In addition to changes in cotton gene expression and biological pathway activity, I assay the transcriptional activity of microbial symbiotes living in and around the cotton roots. I found a large degree of similarity between cotton individuals in different treatments. However, under conventional disk tillage I did find significantly greater activity of cotton phosphatase and sulfate transport genes, as well as greater abundance of the microbes Candidatus Burkholderia brachynathoides and Arthrobacter species L77. This study also uses high-throughput physical mapping of DNA to examine the genomic structure of a wild cotton species, Gossypium raimondii, which is closely related to the economically significant crop species Gossypium hirsutum. This technology characterizes genomic regions by assembling large input DNA molecules labeled at restriction enzyme recognition sites. I created an efficient algorithm and generated 812 whole-genome assemblies from two datasets. The best of these assemblies allowed us to detect 3,806 potential misassemblies in the current release of the G. raimondii genome sequence assembly.
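Physical maps of this kind record where a labeling enzyme's recognition motif occurs along long DNA molecules; comparing those label spacings with the spacings predicted from a sequence assembly is one way to flag potential misassemblies. A minimal in silico labeling sketch, assuming exact motif matches searched on both strands. The example motif `GCTCTTC` is an assumption for illustration, not the enzyme used in the thesis.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def label_positions(seq, motif):
    """0-based start positions of `motif` or its reverse complement in `seq`."""
    seq, motif = seq.upper(), motif.upper()
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:  # also catches overlapping occurrences
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def interval_lengths(positions):
    """Distances between consecutive labels, the quantity compared against a map."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical example motif; substitute the enzyme actually used.
print(label_positions("AAGCTCTTCAAAAGAAGAGCAA", "GCTCTTC"))  # [2, 13]
```

The forward motif is found at position 2 and its reverse complement `GAAGAGC` at position 13, giving a single predicted label interval of 11 bp.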
6

Childers, Christopher P. "Sequence assembly and annotation of the bovine major histocompatibility complex (BoLA) class IIb region, and in silico detection of sequence polymorphisms in BoLA IIb." Texas A&M University, 2006. http://hdl.handle.net/1969.1/4821.

Abstract:
Cattle are vitally important to the American agricultural industry, generating over 24.6 billion pounds of beef (carcass weight) and 79.5 billion dollars in 2005, and over 27 billion dollars in milk sales in 2004. As of July 2006, the U.S. beef and dairy industry comprised 104.5 million head of cattle, 32.4 million of which were processed in 2005. The health of the animals has always been an important concern for breeders, as healthy animals grow faster and are more likely to reach market weight. Animals that exhibit natural resistance to disease do not require chemicals to stimulate normal weight gain and are less prone to disease-related wasting. The major histocompatibility complex (MHC) is a collection of genes, many of which function in antigen processing and presentation. The bovine MHC (BoLA) differs from typical mammalian MHCs in that its class II region was split by a chromosomal inversion into two subregions, designated BoLA IIa and BoLA IIb. BoLA IIb was transposed to a position near the centromere on bovine chromosome 23, while BoLA IIa retains its position in BoLA. Comparative sequence analysis of BoLA IIb with the human MHC revealed the location of the region containing the proximal inversion breakpoint. Compared with the corresponding region of the human class II MHC (HLA class II), the gene content, order and orientation of BoLA IIb are consistent with the single-inversion hypothesis. BoLA IIb spans approximately 450 kb. The genomic sequence of BoLA IIb was used to detect sequence variation through comparison with other bovine sequences, including data from the bovine genome project and two regions in the BAC scaffold used to develop the BoLA IIb sequence. Analysis of the bovine genome project sequence revealed a total of 10,408 mismatching bases, 30 polymorphic microsatellites out of 231, and 15 sequences corresponding to the validated SNP panel generated by the bovine genome sequencing project.
The two overlapping regions in the BoLA IIb BAC scaffold were found to have 888 polymorphisms, including 6 polymorphic microsatellites out of 42, indicating that each BAC was derived from a different chromosome.
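The abstract does not say how microsatellites were detected or typed. As a minimal sketch under assumed thresholds (repeat units of 1 to 6 bp, at least 3 perfect copies), the function below scans a sequence for perfect tandem repeats; a locus could then be called polymorphic when its copy number differs between two aligned sequences, such as the two overlapping BAC regions. Imperfect repeats, which real detectors tolerate, are not handled.

```python
def find_microsatellites(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`.

    At each position the repeat spanning the most bases wins; ties keep the
    shortest unit. Scanning resumes after the reported repeat.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for this unit size
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            span = copies * unit_len
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += best[2] * len(best[1])
        else:
            i += 1
    return hits
```

Applied to `"GGACACACACTT"`, the sketch reports one repeat, four copies of `AC` starting at position 2.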
7

Savel, Daniel M. "Towards a Human Genomic Coevolution Network." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1524241451267546.

8

Jäger, Sarah Christina. "Hybrid Assembly of Whole Genome Shotgun Sequences of Two Sugar Beet (Beta vulgaris L.) Translocation Lines Carrying the Beet Cyst Nematode Resistance Gene Hs1-2 and Functional Analysis of Candidate Genes." Kiel: Universitätsbibliothek Kiel, 2013. http://d-nb.info/1054661898/34.

9

Huang, Chih-Chang (黃至昶). "Establishing a Computational Pipeline of Genome Projects: Sequence Assembly, Gene Annotation and Metabolic Pathway Reconstruction." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/30090000954245955675.

Abstract:
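Master's thesis
National Chiao Tung University
Institute of Bioinformatics and Systems Biology
Academic year 98 (ROC calendar)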
The Human Genome Project was completed in 2003 and has provided enormous resources for biological research. In recent years, next-generation sequencing techniques have dramatically reduced sequencing cost and time. Complete sequencing of new organisms will therefore become common, and the genomes of these organisms will also offer huge research resources, making comprehensive genome annotation increasingly urgent. A computational pipeline is therefore needed. To assemble complete genome sequences, this pipeline uses several assembly tools designed for raw data from both traditional sequencing and next-generation sequencing. It also integrates ab initio and evidence-based approaches to predict genes. In addition, the pipeline can reconstruct metabolic pathways from the gene annotation results. It can thus assemble sequencing data from various platforms and provide genome annotation services, including gene annotation and metabolic pathway reconstruction. This computational pipeline can serve as a crucial component of high-throughput genome annotation.
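Real ab initio gene predictors rely on statistical models of coding sequence, splice sites and related signals; the sketch below is only a toy stand-in that finds open reading frames (an ATG followed by the first in-frame stop codon) on the forward strand, with an assumed minimum length of 30 codons. It is not part of the pipeline described in the thesis.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Half-open (start, end) coordinates of forward-strand ATG..stop ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end includes the stop codon
                start = None
    return sorted(orfs)
```

With `min_codons=2`, the sequence `"CCATGAAATTTTAGCC"` yields a single ORF `(2, 14)`, that is, `ATG AAA TTT TAG` in the third reading frame. The reverse strand would be scanned the same way after reverse-complementing.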
10

Andere, Anne A. "De novo genome assembly of the blow fly Phormia regina (Diptera: Calliphoridae)." Thesis, 2014. http://hdl.handle.net/1805/5630.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Phormia regina (Meigen), commonly known as the black blow fly, is a dipteran that belongs to the family Calliphoridae. Calliphorids play an important role in various research fields, including ecology, medical studies, and veterinary and forensic sciences. P. regina, a non-model organism, is one of the most common forensically relevant insects in North America and is typically used to assist in estimating postmortem intervals (PMI). To better understand the roles P. regina plays in these research fields, we reconstructed its genome using next-generation sequencing technologies. The focus was on generating a reference genome through de novo assembly of high-throughput short-read sequences. Following assembly, genetic markers were identified in the form of microsatellites and single nucleotide polymorphisms (SNPs) to aid future population genetic surveys of P. regina. A total of 530 million 100 bp paired-end reads were obtained from five pooled male and female P. regina flies using the Illumina HiSeq2000 sequencing platform. A 524 Mbp draft genome was assembled using both sexes, with 11,037 predicted genes. The draft reference genome assembled in this study provides an important resource for investigating the genetic diversity that exists between and among blow fly species, and it advances understanding of their genetic basis in terms of adaptations, population structure and evolution. These genomic tools will facilitate genome-wide studies using modern genomic techniques and refine our understanding of the evolutionary processes underlying genomic evolution in blow flies and other insect species.
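As a worked example of what these numbers imply, the expected mean depth is C = N * L / G for N reads of length L over a genome of size G. The sketch assumes that the 530 million figure counts individual reads rather than read pairs and uses the 524 Mbp draft size as a stand-in for the true genome size; if the figure counts pairs, the depth doubles to roughly 202X.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Expected mean sequencing depth C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


# Assumptions: 530 million individual 100 bp reads (not pairs), and the
# 524 Mbp draft assembly size used in place of the true genome size.
coverage = mean_coverage(530e6, 100, 524e6)
print(f"{coverage:.0f}X")  # 101X
```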
11

Hung, Yu-Kai (洪裕凱). "Genome Polishing of Nanopore-Only Assembly Using Coding Sequences." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/9r9vu4.

Abstract:
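Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Academic year 107 (ROC calendar)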
Third-generation sequencing (TGS) offers longer reads, unbiased coverage, and faster sequencing than next-generation sequencing (NGS). TGS-based assembly can usually produce complete genomes, but because of the high read error rate its accuracy is lower than that of NGS assembly. Although combining NGS and TGS can generate complete, high-quality genomes, the sequencing cost is too high to be practical. This thesis aims to polish Oxford Nanopore (ONT)-assembled genomes using TGS reads only. Exploiting the conservation of coding sequences (CDS) across bacterial species, we develop a suite of polishing methods that correct indel errors leading to frameshifts. The polished genomes achieve 97-100% completeness, significantly outperforming previous polishing methods (65-90%). The remaining uncorrected errors are mainly due to the incompleteness of CDS in the current database, which should improve as more sequencing data are collected in the future.
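The thesis corrects frameshifts by comparison with conserved coding sequences from a database; the sketch below covers only the simpler detection step, flagging the two most direct symptoms of an indel inside a predicted CDS: a length that is not a multiple of 3 and a premature in-frame stop codon. The function name and the reported messages are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frameshift_signals(cds):
    """List simple signs that a predicted CDS may carry an indel-induced frameshift."""
    cds = cds.upper()
    issues = []
    if len(cds) % 3 != 0:
        issues.append("length not a multiple of 3")
    # Split into complete codons only; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        issues.append(f"premature stop codon at codon {internal[0]}")
    return issues
```

An intact CDS such as `"ATGAAATTTTAG"` returns an empty list, whereas deleting one base (`"ATGAATTTTAG"`) triggers the length check.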
12

Sun, Yu-Ting (孫于婷). "A Semi-Assembly Approach for Genome Reconstruction Using Closely-Related Reference Sequences." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/53f7f4.

Abstract:
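Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Academic year 103 (ROC calendar)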
In recent years, as many genomes have been sequenced and assembled, newly sequenced genomes are often closely related to an existing genome. However, owing to complex repeat structures, genomes assembled by existing methods are often highly fragmented. In this thesis, we design a semi-assembly approach (called SemiAssembler) that integrates reference mapping and de novo assembly to reconstruct a newly sequenced genome using closely related genome sequences. A draft genome is first created by adding inter-species insertions to, and removing inter-species deletions from, the related genome. The draft genome sequence is then replaced with contig sequences assembled from short reads, so as to reflect inter-species SNPs and small indels. Simulation results indicated that our method achieves high precision and recall. The program was used to assemble two O. sativa genomes, and a substantial number of the large insertions/deletions and small indels found by our method were validated by PCR.
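The first step described above, turning a closely related genome into a draft by applying insertions and deletions, can be sketched as follows. This assumes the variants are already known as non-overlapping (position, reference allele, alternative allele) triples with 0-based positions; it is an illustration, not SemiAssembler's code. Edits are applied from right to left so that the coordinates of edits further left stay valid.

```python
def apply_edits(reference, edits):
    """Apply non-overlapping (pos, ref_allele, alt_allele) edits to `reference`.

    Positions are 0-based and refer to the original reference.
    """
    seq = reference
    for pos, ref, alt in sorted(edits, key=lambda e: e[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq
```

For instance, inserting `TTT` after the `G` at position 2 of `"ACGTACGT"` and deleting the `G` at position 6 gives `"ACGTTTTACT"`.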
13

Liang, Wei-Che (梁維哲). "Conversion of Mate-Pair Reads into Long Sequences for Improving Genome Assembly." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/8x8fz2.

Abstract:
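Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
Academic year 104 (ROC calendar)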
In recent years, high-throughput sequencing technologies have been widely used for assembling genomes of many species. The short reads are first assembled into consecutive sequences called contigs. Subsequently, these contigs are grouped into larger units called scaffolds on the basis of mate-pair reads. However, fragmentation during contig assembly and chimeric mate-pair reads produced during sequencing pose challenges in the scaffolding stage. This thesis presents a method that converts mate-pair reads into long reads in order to overcome the limitations of mate-pair reads. Each mate-pair read is first mapped to a contig graph, and the most likely path between the two ends of the read is found. We test our method on three data sets, validate the accuracy of the converted long reads, and compare the scaffolding results obtained with the converted long reads against those obtained with mate-pair reads.
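The abstract maps each mate pair to a contig graph and looks for the most likely path between its two ends, without giving the scoring. As a hypothetical stand-in, the sketch below enumerates simple paths with a depth-limited search and keeps the one whose intervening sequence length is closest to an expected gap, ignoring overlaps between adjacent contigs. The graph representation, `expected_gap`, and `tolerance` are all assumptions.

```python
def best_gap_path(graph, lengths, source, target, expected_gap, tolerance, max_depth=10):
    """Choose the source->target path whose intervening length best fits expected_gap.

    graph: dict contig -> list of successor contigs (hypothetical representation).
    lengths: dict contig -> length in bp, needed for intermediate contigs.
    Returns the path as a list of contigs, or None if nothing fits.
    """
    best = None  # (deviation, path)

    def dfs(node, path, gap):
        nonlocal best
        if gap > expected_gap + tolerance or len(path) > max_depth:
            return  # prune: path already too long
        if node == target:
            deviation = abs(gap - expected_gap)
            if deviation <= tolerance and (best is None or deviation < best[0]):
                best = (deviation, list(path))
            return
        for nxt in graph.get(node, ()):
            if nxt in path:
                continue  # keep paths simple (no repeated contigs)
            # Intermediate contigs contribute their length; the target does not.
            added = 0 if nxt == target else lengths[nxt]
            path.append(nxt)
            dfs(nxt, path, gap + added)
            path.pop()

    dfs(source, [source], 0)
    return None if best is None else best[1]
```

With two candidate bridges between contigs A and D (B of 100 bp and C of 300 bp), an expected gap of about 280 bp selects A, C, D, while an expected gap of about 100 bp selects A, B, D.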
14

Tsai, Cheng-Hung (蔡正宏). "Improving De Novo Genome Assembly by Using Longer Sequences Constructed from Short Paired-end Reads." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/11752902749752628647.

Abstract:
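Master's thesis
National Cheng Kung University
Department of Computer Science and Information Engineering (Master's and Doctoral Program)
Academic year 100 (ROC calendar)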
Genome sequencing and assembly are fundamental to understanding the secrets behind DNA. Sequencing techniques were pioneered by Sanger and coworkers more than 30 years ago. Only recently have a series of so-called next-generation sequencing (NGS) techniques, such as 454 and Illumina, emerged, providing much higher data throughput and thus much lower data cost than Sanger sequencing. However, the sequence data produced by NGS (often called reads) are shorter (~400 bp for 454) or much shorter (~125 bp for Illumina) than Sanger reads (800-1000 bp), which introduces new computational challenges for genome assembly. The purpose of this research is therefore to develop a new computational method that achieves better assemblies from NGS data. We propose a method that increases the length of Illumina reads by closing the gap between the two reads of an Illumina paired end (PE). The method can not only merge the two reads of an overlapping PE into a longer read, but also construct a longer read from the two reads of a PE that do not overlap. This significantly increases the chance of a much better assembly from Illumina data alone, which is cheaper than 454 data. We developed a computational program, called PE-Closer (Paired-End Closer), for this task and tested its performance on simulated and real Illumina data from several bacterial species (Rhodobacter sphaeroides, Spirochaeta smaragdinae, Planctomyces brasiliensis, Cyclobacterium marinum, Streptomyces violaceusniger and Escherichia coli). PE-Closer closed >90% of the gaps of Illumina PEs in all cases and increased the read length from 100 bp to 500 bp on average. It also corrects errors in the original reads, reducing the error rate from 1% to 0.01%. Using the longer reads obtained by PE-Closer, we improved de novo genome assembly in terms of both statistics and quality.
To conclude, our program PE-Closer is efficient in increasing the length of Illumina reads. Our experiments indicated that using the longer reads obtained by PE-Closer resulted in better de novo genome assemblies.
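For the overlapping case described above, merging a read pair amounts to reverse-complementing the second read and finding the longest suffix/prefix overlap with few mismatches. A minimal sketch, assuming an illustrative 10% mismatch threshold and keeping read 1's bases in the overlap, where real tools typically consult base qualities. It does not cover PE-Closer's gap closing for non-overlapping pairs.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge an overlapping read pair into one fragment, or return None."""
    r2 = reverse_complement(read2)  # bring read 2 onto read 1's strand
    # Try the longest possible overlap first.
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        mismatches = sum(x != y for x, y in zip(read1[-olen:], r2[:olen]))
        if mismatches <= max_mismatch_rate * olen:
            return read1 + r2[olen:]  # overlap bases taken from read 1
    return None
```

Simulating a 30 bp fragment read from both ends with 20 bp reads (a 10 bp overlap), the merge recovers the full fragment.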
15

"MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data." Thesis, 2014. http://hdl.handle.net/10388/ETD-2014-11-1878.

Abstract:
The idea of using a graphics processing unit (GPU) for more than graphics output has been around for quite some time in scientific communities. However, only recently have its benefits for a range of compute-intensive bioinformatics and life-science tasks been recognized. This thesis investigates whether the performance of the overlap determination stage of an Overlap-Layout-Consensus (OLC)-based assembler can be improved by a GPU-based implementation of the Smith-Waterman algorithm. An existing GPU-accelerated sequence alignment algorithm is adapted and extended to reduce its completion time, with a number of improvements and changes to the original software. The workload distribution, query profile construction, and thread scheduling techniques of the original program are replaced by custom methods designed specifically to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (e.g. Ion Torrent). Results show that the software reaches up to 82 GCUPS (giga cell updates per second) on a single-GPU graphics card in a commodity desktop, making it the fastest GPU-based implementation of the Smith-Waterman algorithm tailored to medium-length nucleotide reads. Although it was designed for performing the Smith-Waterman algorithm on medium-length nucleotide sequences, the program also shows great potential for heterogeneous computing with CUDA-enabled GPUs in general and is expected to contribute to other research problems that require sensitive pairwise alignment of a large number of reads.
Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware; these results are especially encouraging because GPU performance is growing faster than that of multi-core CPUs.
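For reference, the recurrence that such GPU kernels parallelize can be written as a short CPU sketch. It computes only the best local alignment score with a linear gap penalty; the scoring values are assumptions, and many production aligners use affine gap penalties instead. Each inner-loop iteration is one cell update, so aligning sequences of lengths m and n in t seconds corresponds to m * n / (t * 10^9) GCUPS.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty, two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed scores, `"TACGTT"` and `"GACGA"` share the local match `ACG`, giving a score of 6. Keeping only two rows mirrors the memory-saving layout that score-only kernels commonly use.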
16

Hefer, Charles Amadeus. "Assembly, annotation and polymorphism analysis of a draft transcriptome sequence for a fast-growing Eucalyptus plantation tree." Thesis, 2011. http://hdl.handle.net/2263/28833.

Abstract:
Ultra-high throughput DNA sequencing technologies have rapidly changed the face of genomic research projects. Technologies such as mRNA-Seq have the potential to rapidly profile the expressed gene catalog of non-model organisms, albeit with significant bioinformatics-related costs and support requirements. This study developed automated data analysis workflows focused on the quality evaluation of mRNA-Seq reads, de novo transcriptome assembly, transcriptome annotation and digital gene expression profiling, making use of data analysis tools available in the public domain and novel tools developed for this purpose. The developed workflows were made available in a private instance of the Galaxy workflow management system and were used to perform the de novo assembly of a gene catalog of a Eucalyptus plantation tree. The fast growth and good wood properties of Eucalyptus tree species and their hybrids make them excellent renewable resources of fiber for pulp and paper, and of woody biomass for bioenergy production. We produced an expressed gene catalog of 18 894 de novo assembled contigs from Illumina deep mRNA-Seq of six sampled plant tissues. Using a novel coverage-assisted re-assembly approach, we were able to assemble near full-length, biologically relevant transcripts. The assembly was evaluated in terms of contig quality and contiguity, and functional annotations were assigned. Digital expression profiles (FPKM values) of each contig across the tissues were calculated and used to identify tissue-specific sets of expressed genes. Polymorphism analysis of 13 806 high-confidence contigs revealed a combined exon and untranslated region SNP density of 0.534 SNPs/100 bp, which provides a good opportunity for designing high-density SNP assays in the expressed regions of the Eucalyptus genome.
The assembled and annotated gene catalog was made available for public use in a user-friendly, web-based interface as the Eucspresso database (http://eucspresso.bi.up.ac.za). This database acts as a prelude to a more comprehensive mRNA-Seq whole-transcriptome repository, the Eucalyptus Genome Integrative Explorer (EucGenIE), a resource that will focus on identifying transcriptional networks active during woody biomass development. The results of the study showed that current bioinformatics software tools and approaches can be used to successfully assemble and characterise a large proportion of the transcriptome of a complex eukaryotic organism. This approach can be used to characterise the gene catalog of a wide range of non-model organisms using only data derived from uHTS experiments.
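FPKM, the expression measure used above, normalizes a contig's fragment count by its length in kilobases and by the total number of mapped fragments in millions. A minimal sketch of that formula (the function and argument names are hypothetical):

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # count / (length / 1e3) / (total / 1e6) simplifies to count * 1e9 / (length * total)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)
```

For example, 500 fragments on a 2,000 bp contig in a library of 10 million mapped fragments give 500 / 2 kb / 10 M = 25 FPKM.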
Thesis (PhD)--University of Pretoria, 2011.
Biochemistry
unrestricted