Dissertations / Theses on the topic 'Data sequence processing'
Consult the top 50 dissertations / theses for your research on the topic 'Data sequence processing.'
Hansson, Andreas. "Sequence Processing from A Connectionist View." Thesis, University of Skövde, Department of Computer Science, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-481.
In this work we explore how close the artificial intelligence community has come to modelling the human mind with regard to the representation and processing of sequences. We analyse results produced by cognitive psychologists, who explore real minds, for features exhibited by human short- and long-term memory when representing and processing sequences. We compare these features with theories and models from the AI community, divided into two types: intrinsic and extrinsic theories. We conclude that the intrinsic theories have managed to explain most of the features, whereas the extrinsic theories still have a lot to do before exhibiting all features. We also present several suggestions for continued research to the AI community within the area of sequence representation and processing in the human mind.
Dameh, Mustafa. "Insights into gene interactions using computational methods for literature and sequence resources." University of Otago. Department of Anatomy & Structural Biology, 2008. http://adt.otago.ac.nz./public/adt-NZDU20090109.095349.
Hung, Rong-I. "Computational studies of protein sequence and structure." Thesis, University of Oxford, 1999. http://ora.ox.ac.uk/objects/uuid:9905c946-86dd-4bb3-8824-7c50df136913.
Li, Yaoman, and 李耀满. "Efficient methods for improving the sensitivity and accuracy of RNA alignments and structure prediction." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/195977.
Computer Science
Master
Master of Philosophy
Wang, Yi, and 王毅. "Binning and annotation for metagenomic next-generation sequencing reads." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/208040.
Computer Science
Doctoral
Doctor of Philosophy
Liu, Kai. "Detecting stochastic motifs in network and sequence data for human behavior analysis." HKBU Institutional Repository, 2014. https://repository.hkbu.edu.hk/etd_oa/60.
Peng, Yu, and 彭煜. "Iterative de Bruijn graph assemblers for second-generation sequencing reads." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B50534051.
Computer Science
Doctoral
Doctor of Philosophy
Kutlu, Mucahid. "Parallel Processing of Large Scale Genomic Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436355132.
Bao, Suying, and 鲍素莹. "Deciphering the mechanisms of genetic disorders by high throughput genomic data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/196471.
Biochemistry
Doctoral
Doctor of Philosophy
Chan, Pui-yee, and 陳沛儀. "A study on predicting gene relationship from a computational perspective." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30461352.
Ho, Ngai-lam, and 何毅林. "Algorithms on constrained sequence alignment." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30201949.
Powell, David Richard 1973. "Algorithms for sequence alignment." Monash University, School of Computer Science and Software Engineering, 2001. http://arrow.monash.edu.au/hdl/1959.1/8051.
Yim, Cheuk-hon Terence, and 嚴卓漢. "Approximate string alignment and its application to ESTs, mRNAs and genome mapping." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31455736.
高銘謙 and Ming-him Ko. "A multi-agent model for DNA analysis." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31222778.
Leung, Chi-ming, and 梁志銘. "Motif discovery for DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B3859755X.
Camerlengo, Terry Luke. "Techniques for Storing and Processing Next-Generation DNA Sequencing Data." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1388502159.
桂宏胜 and Hongsheng Gui. "Data mining of post genome-wide association studies and next generation sequencing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/193431.
Labuschagne, Jan Phillipus Lourens. "Development of a data processing toolkit for the analysis of next-generation sequencing data generated using the primer ID approach." University of the Western Cape, 2018. http://hdl.handle.net/11394/6736.
Sequencing an HIV quasispecies with next generation sequencing technologies yields a dataset with significant amplification bias and errors resulting from both the PCR and sequencing steps. Both the amplification bias and sequencing error can be reduced by labelling each cDNA (generated during the reverse transcription of the viral RNA to DNA prior to PCR) with a random sequence tag called a Primer ID (PID). Processing PID data requires additional computational steps, presenting a barrier to the uptake of this method. MotifBinner is an R package designed to handle PID data with a focus on resolving potential problems in the dataset. MotifBinner groups sequences into bins by their PID tags, identifies and removes false unique bins produced by sequencing errors in the PID tags, and removes outlier sequences from within a bin. MotifBinner produces a consensus sequence for each bin, as well as a detailed report for the dataset, detailing the number of sequences per bin, the number of outlying sequences per bin, rates of chimerism, the number of degenerate letters in the final consensus sequences and the most divergent consensus sequences (potential contaminants). We characterized the ability of the PID approach to reduce the effect of sequencing error, to detect minority variants in viral quasispecies and to reduce the rates of PCR induced recombination. We produced reference samples with known variants at known frequencies to study the effectiveness of increasing PCR elongation time, decreasing the number of PCR cycles, and sample partitioning, by means of dPCR (droplet PCR), on PCR induced recombination. After sequencing these artificial samples with the PID approach, each consensus sequence was compared to the known variants. There are complex relationships between the sample preparation protocol and the characteristics of the resulting dataset. We produce a set of recommendations that can inform the choice of the sample preparation most useful for a particular study. The AMP trial infuses HIV-negative patients with the VRC01 antibody and monitors for HIV infections. Accurately timing the infection event and reconstructing the founder viruses of these infections are critical for relating infection risk to antibody titer and homology between the founder virus and antibody binding sites. Dr. Paul Edlefsen at the Fred Hutch Cancer Research Institute developed a pipeline that performs infection timing and founder reconstruction. Here, we document a portion of the pipeline, produce detailed tests for that portion of the pipeline and investigate the robustness of some of the tools used in the pipeline to violations of their assumptions.
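The binning and consensus steps described in this abstract can be illustrated with a short Python sketch. It is not MotifBinner's code (MotifBinner is an R package): the function names, the assumption that the PID occupies the first `pid_len` bases of each read, and the minimum bin size are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def bin_by_pid(reads, pid_len):
    """Group reads by their PID, assumed here to be the first pid_len bases."""
    bins = defaultdict(list)
    for read in reads:
        pid, payload = read[:pid_len], read[pid_len:]
        bins[pid].append(payload)
    return dict(bins)


def consensus(seqs):
    """Majority-vote consensus over equal-length sequences; ties become 'N'."""
    out = []
    for column in zip(*seqs):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            out.append("N")  # no clear majority at this position
        else:
            out.append(ranked[0][0])
    return "".join(out)


def consensus_per_bin(reads, pid_len, min_bin_size=3):
    """Return one consensus per PID bin, skipping bins that are too small.

    Dropping small bins is a crude stand-in for the removal of false unique
    bins; the real package uses more careful criteria.
    """
    bins = bin_by_pid(reads, pid_len)
    return {
        pid: consensus(seqs)
        for pid, seqs in bins.items()
        if len(seqs) >= min_bin_size
    }
```

For example, three reads sharing the tag `AAAA` with one discordant base collapse to a single consensus, while a singleton bin is discarded.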
Ye, Lin, and 叶林. "Exploring microbial community structures and functions of activated sludge by high-throughput sequencing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B48079649.
Civil Engineering
Doctoral
Doctor of Philosophy
Dalke, Trevor. "Data Chunking in Quasi-Synchronous DS-CDMA." DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1187.
Zeng, Shuai, and 曾帥. "Predicting functional impact of nonsynonymous mutations by quantifying conservation information and detect indels using split-read approach." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/198818.
Paediatrics and Adolescent Medicine
Doctoral
Doctor of Philosophy
Murrel, Benjamin. "Improved models of biological sequence evolution." Thesis, Stellenbosch : Stellenbosch University, 2012. http://hdl.handle.net/10019.1/71870.
ENGLISH ABSTRACT: Computational molecular evolution is a field that attempts to characterize how genetic sequences evolve over phylogenetic trees – the branching processes that describe the patterns of genetic inheritance in living organisms. It has a long history of developing progressively more sophisticated stochastic models of evolution. Through a probabilist’s lens, this can be seen as a search for more appropriate ways to parameterize discrete state continuous time Markov chains to better encode biological reality, matching the historical processes that created empirical data sets, and creating useful tools that allow biologists to test specific hypotheses about the evolution of the organisms or the genes that interest them. This dissertation is an attempt to fill some of the gaps that persist in the literature, solving what we see as existing open problems. The overarching theme of this work is how to better model variation in the action of natural selection at multiple levels: across genes, between sites, and over time. Through four published journal articles and a fifth in preparation, we present amino acid and codon models that improve upon existing approaches, providing better descriptions of the process of natural selection and better tools to detect adaptive evolution.
AFRIKAANSE OPSOMMING: Komputasionele molekulêre evolusie is ’n navorsingsarea wat poog om die evolusie van genetiese sekwensies oor filogenetiese bome – die vertakkende prosesse wat die patrone van genetiese oorerwing in lewende organismes beskryf – te karakteriseer. Dit het ’n lang geskiedenis waartydens al hoe meer gesofistikeerde waarskynlikheidsmodelle van evolusie ontwikkel is. Deur die lens van waarskynlikheidsleer kan hierdie proses gesien word as ’n soektog na meer gepasde metodes om diskrete-toestand kontinuë-tyd Markov kettings te parametriseer ten einde biologiese realiteit beter te enkodeer – op so ’n manier dat die historiese prosesse wat tot die vorming van biologiese sekwensies gelei het nageboots word, en dat nuttige metodes geskep word wat bioloë toelaat om spesifieke hipotesisse met betrekking tot die evolusie van belanghebbende organismes of gene te toets. Hierdie proefskrif is ’n poging om sommige van die gapings wat in die literatuur bestaan in te vul en bestaande oop probleme op te los. Die oorkoepelende tema is verbeterde modellering van variasie in die werking van natuurlike seleksie op verskeie vlakke: variasie van geen tot geen, variasie tussen posisies in gene en variasie oor tyd. Deur middel van vier gepubliseerde joernaalartikels en ’n vyfde artikel in voorbereiding, bied ons aminosuur- en kodon-modelle aan wat verbeter op bestaande benaderings – hierdie modelle verskaf beter beskrywings van die proses van natuurlike seleksie sowel as beter metodes om gevalle van aanpassing in evolusie te vind.
Siu, Man-hung, and 蕭文鴻. "Finding motif pairs from protein interaction networks." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B40987760.
Zhang, Yue. "Detection copy number variants profile by multiple constrained optimization." HKBU Institutional Repository, 2017. https://repository.hkbu.edu.hk/etd_oa/439.
So, Wai-ki, and 蘇慧琪. "Shadow identification in traffic video sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B32045967.
Küchler, Andreas. "Adaptive processing of structural data: from sequences to trees and beyond." Ulm : Universität Ulm, Fakultät für Informatik, 1999. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB8541389.
Wu, Qinyi. "Partial persistent sequences and their applications to collaborative text document editing and processing." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/44916.
Chapple, Charles E. "Finding a needle in haystack: the Eukaryotic selenoproteome." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7184.
Selenoproteins are a diverse family of proteins containing the trace element Selenium (Se) in the form of the non-canonical amino acid selenocysteine (Sec). Selenocysteine, the 21st amino acid, is similar to cysteine (Cys) but with Se replacing Sulphur. In many cases the homologous gene of a known selenoprotein is present with cysteine in the place of Sec in a different genome. Selenoproteins are believed to be the effectors of the biological functions of Selenium and have been implicated in male infertility, cancer and heart diseases, viral expression and ageing. Selenocysteine is coded by the opal STOP codon (TGA). A number of factors combine to achieve the co-translational recoding of TGA to Sec. The 3' Untranslated regions (UTRs) of eukaryotic selenoprotein transcripts contain a stem-loop structure called a Sec Insertion Sequence (SECIS) element. This is recognised by the SECIS Binding Protein 2 (SBP2), which binds to both the SECIS element and the ribosome. SBP2, in turn, recruits the Sec-specific Elongation Factor EFsec, and the selenocysteine transfer RNA, tRNASec. The dual meaning of the TGA codon means that selenoprotein genes are often mispredicted by the standard annotation pipelines. The correct prediction of these genes, therefore, requires the development of specific methods. In the past few years we have contributed significantly to the description of the eukaryotic selenoproteome with the discovery of novel families (Castellano et al., 2005), the elaboration of novel methods (Taskov et al., 2005; Chapple et al., 2009) and the annotation of different genomes (Jaillon et al., 2004; Drosophila 12 genomes Consortium, 2007; Bovine Genome Sequencing and Analysis Consortium, 2009). Finally, and perhaps most importantly, we have identified the first animal to lack selenoprotein genes (Drosophila 12 genomes Consortium, 2007; Chapple and Guigó, 2008). This last finding is particularly surprising because it had previously been believed that selenoproteins were essential for animal life.
Cheng, Lok-lam, and 鄭樂霖. "Approximate string matching in DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29350591.
Patterson, Joel E. "The porting of the MCC Extensible Software Platform to the Sequent Symmetry." Master's thesis, This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-04272010-020129/.
Küchler, Andreas [Verfasser]. "Adaptive processing of structural data: from sequences to trees and beyond / Andreas Küchler, Andreas." Ulm : Universität Ulm. Fakultät für Informatik, 2000. http://d-nb.info/1015211518/34.
Lundberg, Jesper, and Ronja Mehtonen. "Utvärdering och analys av batchstorlekar, produktsekvenser och omställningstider." Thesis, Högskolan i Skövde, Institutionen för ingenjörsvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11859.
Krejčí, Michal. "Komprese dat." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-217934.
Renzullo, Luigi John. "Radiometric processing of multitemporal sequences of satellite imagery for surface reflectance retrievals in change detection studies." Curtin University of Technology, Department of Applied Physics, 2004. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=15737.
The ability of the LVIN procedure to standardise sequences of image data was further demonstrated in the study of vegetation change. The normalised difference vegetation index (NDVI) was calculated from LVIN estimates of surface reflectance for a selection of sites around the township of Mt. Barker, Western Australia. NDVI data had characteristics consistent with data that have been corrected for atmospheric effects. A modification to the LVIN procedure was also proposed based on an investigation of some empirically-derived vegetation reflectance relationships. Research into the robustness of the relationships for a greater range of vegetation types is recommended.
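For readers unfamiliar with the index mentioned in this abstract, NDVI is conventionally defined as (NIR - Red) / (NIR + Red), computed from near-infrared and red surface reflectance. The sketch below shows that standard formula only; the function name and the choice to return 0.0 when both reflectances are zero are assumptions for illustration, and nothing here reproduces the thesis's normalisation procedure.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index from two reflectance values.

    nir and red are surface reflectances (typically between 0 and 1).
    Returning 0.0 for a zero denominator is an illustrative convention,
    not something taken from the thesis.
    """
    denom = nir + red
    if denom == 0:
        return 0.0
    return (nir - red) / denom
```

Dense green vegetation reflects strongly in the near infrared and weakly in the red, so its NDVI is high, while bare soil or water gives values near or below zero.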
Kornfeil, Vojtěch. "Soubor úloh pro kurs Sběr, analýza a zpracování dat." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217707.
Oliveira, Fábio Borges de. "Analysis of the cryptography security and steganography in images sequences." Laboratório Nacional de Computação Científica, 2007. http://www.lncc.br/tdmc/tde_busca/arquivo.php?codArquivo=134.
Full textA segurança da informação vem sendo considerada de grande importância para as instituições privadas e governamentais. Por este motivo, optamos em realizar um estudo sobre segurança nesta dissertação. Iniciamos com uma introdução à teoria da informação, partimos para métodos de criptografia onde propomos um novo tipo de Segredo Perfeito e finalmente fazemos um estudo de esteganografia em uma sequência de imagens, onde propomos uma esteganografia mais agressiva nos coeficientes da transformada discreta de cosseno.
Battikh, Dalia. "Sécurité de l’information par stéganographie basée sur les séquences chaotiques." Thesis, Rennes, INSA, 2015. http://www.theses.fr/2015ISAR0013/document.
Steganography is the art of concealing a secret message in a cover medium such that the resulting medium (stego) is almost identical to the cover medium. Nowadays, with the globalization of exchanges (Internet, messaging and e-commerce) using diverse media (sound, images, video), modern steganography has expanded widely. In this manuscript, we studied adaptive LSB steganography methods in the spatial domain and the frequency domain (DCT and DWT), which hide the maximum amount of useful information in a cover image such that the existence of the secret message in the stego image is imperceptible and practically undetectable. The security of the message contents, should an opponent detect the message, is not really ensured by the methods proposed in the literature. To address this, we adapted and implemented two known adaptive LSB steganography methods, adding a strong chaotic system that allows a quasi-chaotic insertion of the bits of the secret message. The proposed chaotic system consists of a generator of strong chaotic sequences, which supplies the dynamic keys of a modified chaotic 2D Cat map. Universal steganalysis (classification) of the developed steganography methods is also studied. For this, we used Fisher's linear discriminant analysis as the classifier of the feature vectors of Farid, Shi and Wang. This choice is based on the wide variety of feature vectors tested, which give information about the properties of the image before and after message insertion. We analysed the performance of the three developed steganalysis methods when applied to the stego images produced by the proposed adaptive steganography methods. Classification performance is evaluated using the following parameters: sensitivity, specificity, precision and the Kappa coefficient.
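The idea of inserting message bits at positions chosen by a cat map can be sketched as follows. This is a deliberately simplified illustration, not the thesis's method: it uses plain (non-adaptive) LSB replacement in the spatial domain, the standard Arnold cat map (x, y) -> (x + y, x + 2y) mod n instead of the modified map, and a fixed number of rounds in place of dynamic keys from a chaotic generator. All function names are hypothetical.

```python
def cat_map(x, y, n, rounds=1):
    """Apply the standard Arnold cat map on an n x n grid `rounds` times."""
    for _ in range(rounds):
        x, y = (x + y) % n, (x + 2 * y) % n
    return x, y


def embedding_order(n, rounds):
    """Pixel visiting order: raster-order positions pushed through the map.

    The map matrix has determinant 1, so it permutes the n x n grid and
    every pixel is visited exactly once.
    """
    return [cat_map(x, y, n, rounds) for y in range(n) for x in range(n)]


def embed(image, bits, rounds=3):
    """Write each bit into the least significant bit of one pixel.

    image is a square list of rows of integers (0..255); bits is a list of
    0/1 values. `rounds` plays the role of a (very weak) shared key.
    """
    n = len(image)
    if len(bits) > n * n:
        raise ValueError("message longer than image capacity")
    stego = [row[:] for row in image]
    for bit, (x, y) in zip(bits, embedding_order(n, rounds)):
        stego[y][x] = (stego[y][x] & ~1) | bit
    return stego


def extract(stego, n_bits, rounds=3):
    """Read back n_bits least significant bits in the same visiting order."""
    n = len(stego)
    return [stego[y][x] & 1 for (x, y) in embedding_order(n, rounds)[:n_bits]]
```

Because only the least significant bit changes, each pixel differs from the cover by at most one grey level, which is why LSB insertion is visually imperceptible even though it is not secure on its own.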
Matulík, Martin. "Modelování a animace biologických struktur." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-377662.
Wang, Shu 1973. "On multiple sequence alignment." Thesis, 2007. http://hdl.handle.net/2152/3715.
"Computational models for extracting structural signals from noisy high-throughput sequencing data: 通过计算模型来提取高通量测序数据中的分子结构信息." 2015. http://repository.lib.cuhk.edu.hk/en/item/cuhk-1291576.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2015.
Includes bibliographical references (leaves 147-161).
Abstracts also in Chinese.
Title from PDF title page (viewed on 26, October, 2016).
Hu, Xihao.
Morris, Joseph P. "An analysis pipeline for the processing, annotation, and dissemination of expressed sequence tags." 2009. http://etd.louisville.edu/data/UofL0482t2009.pdf.
Title and description from thesis home page (viewed May 22, 2009). Department of Computer Engineering and Computer Science. Vita. "May 2009." Includes bibliographical references (p. 39-41).
Evans, Patricia Anne. "Algorithms and complexity for annotated sequence analysis." Thesis, 1999. https://dspace.library.uvic.ca//handle/1828/8864.
Graduate
Xu, Weijia. "On integrating biological sequence analysis with metric distance based database management systems." Thesis, 2006. http://hdl.handle.net/2152/2955.
Zwickl, Derrick Joel. "Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion." Thesis, 2006. http://hdl.handle.net/2152/2666.
Full text"Bioinformatics analyses for next-generation sequencing of plasma DNA." 2012. http://library.cuhk.edu.hk/record=b5549423.
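With the emergence of targeted enrichment sequencing, sequencing costs have fallen sharply. The first part of the thesis combined the genotypes of the mother and father at polymorphic sites with sequencing information to calculate the concentration of fetal DNA in maternal peripheral blood. The limitation of that method is that it relies on the parental genotypes at polymorphic sites and cannot infer the fetal DNA concentration in maternal peripheral blood directly from the sequencing information. In the second part of this thesis, I developed a mixture model based on the binomial distribution to predict the fetal DNA concentration in maternal peripheral blood directly. The optimal estimate of the fetal DNA concentration is obtained when the likelihood of the mixture model reaches its maximum. Because targeted enrichment sequencing provides high-coverage sequencing information, it becomes possible to use the probabilistic model to identify directly the informative sites at which the mother is homozygous and the fetus is heterozygous.
Besides DNA-level analysis of maternal peripheral blood, which has advanced noninvasive prenatal diagnosis, epigenetic analysis should not be neglected. In the third part of this thesis, I developed the Methyl-Pipe software, dedicated to genome-wide methylation analysis. The analysis of methylation sequencing data is more complex than ordinary genome sequencing analysis. In a bisulfite sequencing library, unmethylated cytosines are converted to uracil and ultimately appear as thymine in the PCR products, whereas methylated cytosines remain unchanged. To align bisulfite-treated reads to the reference genome, the cytosines in the Watson and Crick strands of the reference genome are first all converted to thymines, and the cytosines in the sequenced reads are likewise converted to thymines. The converted reads are then aligned to the converted reference genome. Finally, the genome-wide methylation level and specific methylation patterns are inferred from the cytosine and thymine content of the reads aligned to the genome. Methyl-Pipe can identify genomic regions with significantly different methylation levels, so it can be used to identify potential fetal-specific methylation sites for noninvasive prenatal diagnosis.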
The presence of fetal DNA in the cell-free plasma of pregnant women was first described in 1997. The initial clinical applications of this phenomenon focused on the detection of paternally inherited traits such as sex and rhesus D blood group status. The development of massively parallel sequencing technologies has allowed more sophisticated analyses on circulating cell-free DNA in maternal plasma. For example, through the determination of the proportional representation of chromosome 21 sequences in maternal plasma, noninvasive prenatal diagnosis of fetal Down syndrome can be achieved with an accuracy of >98%. In the first part of my thesis, I have developed bioinformatics algorithms to perform genome-wide construction of the fetal genetic map from the massively parallel sequencing data of the maternal plasma DNA sample of a pregnant woman. The construction of the fetal genetic map through the maternal plasma sequencing data is very challenging because fetal DNA only constitutes approximately 10% of the maternal plasma DNA. Moreover, as the fetal DNA in maternal plasma exists as short fragments of less than 200 bp, existing bioinformatics techniques for genome construction are not applicable for this purpose. For the construction of the genome-wide fetal genetic map, I have used the genome of the father and the mother as scaffolds and calculated the fractional fetal DNA concentration. First, I looked at the paternal specific sequences in maternal plasma to determine which portions of the father’s genome had been passed on to the fetus. For the determination of the maternal inheritance, I have developed the Relative Haplotype Dosage (RHDO) approach. This method is based on the principle that the portion of maternal genome inherited by the fetus would be present in slightly higher concentration in the maternal plasma. The use of haplotype information can enhance the efficacy of using the sequencing data. 
Thus, the maternal inheritance can be determined with a much lower sequencing depth than just looking at individual loci in the genome. This algorithm makes it feasible to use genome-wide scanning to diagnose fetal genetic disorders prenatally in a noninvasive way.
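The haplotype dosage principle can be made concrete with a small sketch. It assumes one specific situation that the abstract does not spell out: the mother is heterozygous across a block of SNPs, and the father is homozygous for the alleles carried on the mother's haplotype I. Under that assumption, with fetal fraction f, haplotype I alleles make up 0.5 + f/2 of plasma reads if the fetus inherited haplotype I, and 0.5 if it inherited haplotype II. The log-likelihood-ratio test, the threshold and every name below are illustrative assumptions, not the thesis's actual statistical procedure.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood without the constant term (cancels in ratios)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


def rhdo_call(hap1_counts, hap2_counts, fetal_fraction, threshold=3.0):
    """Classify a haplotype block by pooling allele counts across its SNPs.

    hap1_counts / hap2_counts: per-SNP read counts for the alleles on the
    mother's haplotype I and II. fetal_fraction must lie strictly between
    0 and 1. Pooling across SNPs is what lets a block be classified at a
    depth too low to classify any single locus.
    """
    k = sum(hap1_counts)
    n = k + sum(hap2_counts)
    p_over = 0.5 + fetal_fraction / 2  # expected if the fetus inherited Hap I
    llr = binom_loglik(k, n, p_over) - binom_loglik(k, n, 0.5)
    if llr > threshold:
        return "Hap I over-represented"
    if llr < -threshold:
        return "balanced"
    return "unclassified"
```

With a fetal fraction of 10%, a block with 5,500 haplotype I reads against 4,500 haplotype II reads is called over-represented, whereas 11 against 9 reads is left unclassified because the evidence is too weak.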
With the emergence of targeted massively parallel sequencing, the sequencing cost per base is falling dramatically. Although the first part of the thesis developed a method to estimate the fractional fetal DNA concentration using parental genotype information, that method cannot deduce the fractional fetal DNA concentration directly from sequencing data without prior knowledge of the genotypes. In the second part of this thesis, I propose FetalQuant, a method based on a statistical mixture model, which uses maximum likelihood to estimate the fractional fetal DNA concentration directly from targeted massively parallel sequencing of maternal plasma DNA. This method removes the need for genotype information without loss of accuracy, an advantage over existing methods. Furthermore, by using Bayes’ rule, this method can identify the informative SNPs where the mother is homozygous and the fetus is heterozygous, which could potentially be used to detect dominantly inherited disorders.
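A binomial mixture of the kind described here can be sketched in Python as follows. This is a minimal illustration, not FetalQuant itself: it assumes four maternal/fetal genotype classes (AA/AA, AA/AB, AB/AA, AB/AB) whose expected minor-allele fractions are e, f/2, 0.5 - f/2 and 0.5, fixed equal mixture weights, a fixed error rate e, and a simple grid search in place of a proper optimiser. All names and default values are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1 - p)


def log_likelihood(sites, f, error_rate=0.01, weights=(0.25, 0.25, 0.25, 0.25)):
    """Mixture log-likelihood of (minor_count, depth) pairs at fetal fraction f."""
    probs = (error_rate, f / 2, 0.5 - f / 2, 0.5)
    total = 0.0
    for k, n in sites:
        terms = [math.log(w) + log_binom_pmf(k, n, p) for w, p in zip(weights, probs)]
        m = max(terms)  # log-sum-exp to avoid underflow at high depth
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


def estimate_fetal_fraction(sites, grid_steps=100):
    """Return the grid value of f in (0, 0.5] that maximises the likelihood."""
    grid = [i / (2 * grid_steps) for i in range(1, grid_steps + 1)]
    return max(grid, key=lambda f: log_likelihood(sites, f))
```

The genotypes never have to be supplied: sites whose minor-allele fraction clusters near f/2 or 0.5 - f/2 pull the maximum of the likelihood towards the true fetal fraction.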
Besides the genetic analysis at the DNA level, epigenetic markers are also valuable for the development of noninvasive diagnosis. In the third part of this thesis, I have also developed a bioinformatics algorithm to efficiently analyze genome-wide DNA methylation status based on the massively parallel sequencing of bisulfite-converted DNA. DNA methylation is one of the most important mechanisms for regulating gene expression. The study of DNA methylation for different genes is important for the understanding of the different physiological and pathological processes. Currently, the most popular method for analyzing DNA methylation status is through bisulfite sequencing. The principle of this method is based on the fact that unmethylated cytosine residues would be chemically converted to uracil on bisulfite treatment whereas methylated cytosine would remain unchanged. The converted uracil and unconverted cytosine can then be discriminated on sequencing. With the emergence of massively parallel sequencing platforms, it is possible to perform this bisulfite sequencing analysis on a genome-wide scale. However, the bioinformatics analysis of the genome-wide bisulfite sequencing data is much more complicated than analyzing the data from individual loci. Thus, I have developed Methyl-Pipe, a bioinformatics program for analyzing the genome-wide methylation status of DNA samples based on massively parallel sequencing. In the first step of this algorithm, an in-silico converted reference genome is produced by converting all the cytosine residues to thymine residues. Then, the sequenced reads of bisulfite-converted DNA sequences are aligned to this modified reference sequence. Finally, post-processing of the alignments removes non-unique and low-quality mappings and characterizes the methylation pattern in a genome-wide manner. Making use of this new program, potential fetal-specific hypomethylated regions which can be used as blood biomarkers can be identified in a genome-wide manner.
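The three steps named in this abstract (in-silico conversion, alignment, methylation calling) can be sketched on toy strings. This is not Methyl-Pipe's code: it handles the forward strand only, replaces a real aligner with exact substring matching, and ignores base qualities. The function names are hypothetical.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion: every cytosine becomes thymine."""
    return seq.replace("C", "T")


def align_converted(read, reference):
    """Return all offsets where the converted read matches the converted reference."""
    conv_ref, conv_read = c_to_t(reference), c_to_t(read)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


def call_methylation(read, reference):
    """Map reference cytosine positions to True (methylated) or False.

    Reads that are unmapped or map to more than one place are discarded
    (None), mirroring the removal of non-unique mappings. A cytosine that
    survives in the read is taken as methylated; one read as T is taken as
    unmethylated.
    """
    hits = align_converted(read, reference)
    if len(hits) != 1:
        return None
    offset = hits[0]
    calls = {}
    for i, base in enumerate(read):
        if reference[offset + i] == "C":
            calls[offset + i] = base == "C"
    return calls
```

Converting both the reads and the reference removes the C/T differences that bisulfite treatment introduces, so alignment succeeds regardless of methylation state; the original, unconverted bases are then compared to recover that state.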
Detailed summary in vernacular field only.
Jiang, Peiyong.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 100-105).
Abstracts also in Chinese.
SECTION I: BACKGROUND
CHAPTER 1: Circulating nucleic acids and Next-generation sequencing
Chapter 1.1 --- Circulating nucleic acids
Chapter 1.2 --- Next-generation sequencing
Chapter 1.3 --- Bioinformatics analyses
Chapter 1.4 --- Applications of the NGS
Chapter 1.5 --- Aims of this thesis
SECTION II: Mathematically decoding fetal genome in maternal plasma
CHAPTER 2: Characterizing the maternal and fetal genome in plasma at single base resolution
Chapter 2.1 --- Introduction
Chapter 2.2 --- SNP categories and principle
Chapter 2.3 --- Clinical cases and SNP genotyping
Chapter 2.4 --- Sequencing depth and fractional fetal DNA concentration determination
Chapter 2.5 --- Filtering of genotyping errors for maternal genotypes
Chapter 2.6 --- Constructing fetal genetic map in maternal plasma
Chapter 2.7 --- Sequencing error estimation
Chapter 2.8 --- Paternal-inherited alleles
Chapter 2.9 --- Maternally-derived alleles by RHDO analysis
Chapter 2.10 --- Recombination breakpoint simulation and detection
Chapter 2.11 --- Prenatal diagnosis of β-thalassaemia
Chapter 2.12 --- Discussion
SECTION III: Statistical model for fractional fetal DNA concentration estimation
CHAPTER 3: FetalQuant: deducing the fractional fetal DNA concentration from massively parallel sequencing of maternal plasma DNA
Chapter 3.1 --- Introduction
Chapter 3.2 --- Methods
Chapter 3.2.1 --- Maternal-fetal genotype combinations
Chapter 3.2.2 --- Binomial mixture model and likelihood
Chapter 3.2.3 --- Fractional fetal DNA concentration fitting
Chapter 3.3 --- Results
Chapter 3.3.1 --- Datasets
Chapter 3.3.2 --- Evaluation of FetalQuant algorithm
Chapter 3.3.3 --- Simulation
Chapter 3.3.4 --- Sequencing depth and the number of SNPs required by FetalQuant
Chapter 3.5 --- Discussion
SECTION IV: NGS-based data analysis pipeline development
CHAPTER 4: Methyl-Pipe: Methyl-Seq bioinformatics analysis pipeline
Chapter 4.1 --- Introduction
Chapter 4.2 --- Methods
Chapter 4.2.1 --- Overview of Methyl-Pipe
Chapter 4.3 --- Results and discussion
SECTION V: CONCLUDING REMARKS
CHAPTER 5: Conclusion and future perspectives
Chapter 5.1 --- Conclusion
Chapter 5.2 --- Future perspectives
Reference
"The analysis of cDNA sequences: an algorithm for alignment." 1997. http://library.cuhk.edu.hk/record=b5889263.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 45-47).
CHAPTER 1 --- INTRODUCTION
CHAPTER 2 --- BACKGROUND
Section 2.1 DNA Cloning
Section 2.1.1 Principles of cell-based DNA cloning
Section 2.1.2 Polymerase Chain Reaction
Section 2.2 DNA Libraries
Section 2.3 Expressed Sequence Tags
Section 2.4 dbEST - Database for "Expressed Sequence Tag"
CHAPTER 3 --- REDUCTION OF PARTIAL SEQUENCE REDUNDANCY AND CDNA ALIGNMENT
Section 3.1 Materials
Section 3.2 Our Algorithm
Section 3.3 Data Storage
Section 3.4 Criterion of Alignment
Section 3.5 Pairwise Alignment
CHAPTER 4 --- RESULTS AND DISCUSSION
CHAPTER 5 --- CONCLUSION AND FUTURE DEVELOPMENT
REFERENCES
APPENDIX
"Applications of evolutionary algorithms on biomedical systems." 2007. http://library.cuhk.edu.hk/record=b5893179.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 95-104).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgement --- p.v
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Motivation --- p.1
Chapter 1.1.1 --- Basic Concepts and Definitions --- p.2
Chapter 1.2 --- Evolutionary Algorithms --- p.5
Chapter 1.2.1 --- Chromosome Encoding --- p.6
Chapter 1.2.2 --- Selection --- p.7
Chapter 1.2.3 --- Crossover --- p.9
Chapter 1.2.4 --- Mutation --- p.10
Chapter 1.2.5 --- Elitism --- p.11
Chapter 1.2.6 --- Niching --- p.11
Chapter 1.2.7 --- Population Manipulation --- p.13
Chapter 1.2.8 --- Building Blocks --- p.13
Chapter 1.2.9 --- Termination Conditions --- p.14
Chapter 1.2.10 --- Co-evolution --- p.14
Chapter 1.3 --- Local Search --- p.15
Chapter 1.4 --- Memetic Algorithms --- p.16
Chapter 1.5 --- Objective --- p.17
Chapter 1.6 --- Summary --- p.17
Chapter 2 --- Background --- p.18
Chapter 2.1 --- Multiple Drugs Tumor Chemotherapy --- p.18
Chapter 2.2 --- Bioinformatics --- p.22
Chapter 2.2.1 --- Basics of Bioinformatics --- p.24
Chapter 2.2.2 --- Applications on Biomedical Systems --- p.26
Chapter 3 --- A New Drug Administration Dynamic Model --- p.29
Chapter 3.1 --- Three Drugs Mathematical Model --- p.31
Chapter 3.1.1 --- Rate of Change of Different Subpopulations --- p.32
Chapter 3.1.2 --- Rate of Change of Different Drug Concentrations --- p.35
Chapter 3.1.3 --- Toxicity Effects --- p.35
Chapter 3.1.4 --- Summary --- p.36
Chapter 4 --- Memetic Algorithm - Iterative Dynamic Programming (MA-IDP) --- p.38
Chapter 4.1 --- Problem Formulation: Optimal Control Problem (OCP) for Multidrug Optimization --- p.38
Chapter 4.2 --- Proposed Memetic Optimization Algorithm --- p.40
Chapter 4.2.1 --- Iterative Dynamic Programming (IDP) --- p.40
Chapter 4.2.2 --- Adaptive Elitist-population-based Genetic Algorithm (AEGA) --- p.44
Chapter 4.2.3 --- Memetic Algorithm - Iterative Dynamic Programming (MA-IDP) --- p.50
Chapter 4.3 --- Summary --- p.56
Chapter 5 --- MA-IDP: Experiments and Results --- p.57
Chapter 5.1 --- Experiment Settings --- p.57
Chapter 5.2 --- Optimization Results --- p.61
Chapter 5.3 --- Extension to Other Multidrug Scheduling Model --- p.62
Chapter 5.4 --- Summary --- p.65
Chapter 6 --- DNA Sequencing by Hybridization (SBH) --- p.66
Chapter 6.1 --- Problem Formulation: Reconstructing a DNA Sequence from Hybridization Data --- p.70
Chapter 6.2 --- Proposed Memetic Optimization Algorithm --- p.71
Chapter 6.2.1 --- Chromosome Encoding --- p.71
Chapter 6.2.2 --- Fitness Function --- p.73
Chapter 6.2.3 --- Crossover --- p.74
Chapter 6.2.4 --- Hill Climbing Local Search for Sequencing by Hybridization --- p.76
Chapter 6.2.5 --- Elitism and Diversity --- p.79
Chapter 6.2.6 --- Outline of Algorithm: MA-HC-SBH --- p.81
Chapter 6.3 --- Summary --- p.82
Chapter 7 --- DNA Sequencing by Hybridization (SBH): Experiments and Results --- p.83
Chapter 7.1 --- Experiment Settings --- p.83
Chapter 7.2 --- Experiment Results --- p.85
Chapter 7.3 --- Summary --- p.89
Chapter 8 --- Conclusion --- p.90
Chapter 8.1 --- Multiple Drugs Cancer Chemotherapy Schedule Optimization --- p.90
Chapter 8.2 --- Use of the MA-IDP --- p.91
Chapter 8.3 --- DNA Sequencing by Hybridization (SBH) --- p.92
Chapter 8.4 --- Use of the MA-HC-SBH --- p.92
Chapter 8.5 --- Future Work --- p.93
Chapter 8.6 --- Item Learned --- p.93
Chapter 8.7 --- Papers Published --- p.94
Bibliography --- p.95
Zhao, Huiying. "Protein function prediction by integrating sequence, structure and binding affinity information." Thesis, 2014. http://hdl.handle.net/1805/3913.
Proteins are nano-machines that work inside every living organism. Functional disruption of one or several proteins is the cause of many diseases. However, the functions of most proteins are yet to be annotated because inexpensive sequencing techniques dramatically speed up the discovery of new protein sequences (265 million and counting), and experimental examination of every protein in all its possible functional categories is simply impractical. Thus, it is necessary to develop computational function-prediction tools that complement and guide experimental studies. In this study, we developed a series of predictors for highly accurate prediction of proteins with DNA-binding, RNA-binding and carbohydrate-binding capability. These predictors use a template-based technique that combines sequence and structural information with predicted binding affinity. Both sequence-based and structure-based approaches were developed. Results indicate the importance of binding affinity prediction for improving the sensitivity and precision of function prediction. Application of these methods to the human genome and structure genome targets demonstrated their usefulness in annotating proteins of unknown function and in discovering moonlighting proteins with DNA, RNA, or carbohydrate binding function. In addition, we investigated the disruption of protein functions by naturally occurring genetic variations due to insertions and deletions (INDELs). We found that protein structures are the most critical features in recognising disease-causing non-frame-shifting INDELs. The predictors for function prediction are available at http://sparks-lab.org/spot, and the predictor for classification of non-frame-shifting INDELs is available at http://sparks-lab.org/ddig.
Krishnadev, O. "Inferences On The Function Of Proteins And Protein-Protein Interactions Using Large Scale Sequence And Structure Analysis." Thesis, 2005. http://etd.iisc.ernet.in/handle/2005/1503.
"Generalized pattern matching applied to genetic analysis." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075184.
In the first part of my research work, we propose a novel deterministic pattern matching algorithm which applies Agrep, a well-known bit-parallel matching algorithm, to a truncated suffix array. Because the cost of Agrep is linear, the cost of our approach is linear in the number of characters processed in the truncated suffix array. We analyze the matching cost theoretically, and obtain empirical costs from experiments. We carry out experiments using both synthetic and real DNA sequence data (queries) and search them in Chromosome X of a reference human genome. The experimental results show that our approach achieves a speed-up of several orders of magnitude over the standard Agrep algorithm.
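To make the bit-parallel idea concrete, the following is a minimal sketch of Agrep-style (Wu-Manber) approximate matching over a plain text string. It is only an illustration: the function name `agrep_ends` and its parameters are hypothetical, and it does not include the truncated suffix array that the thesis combines with Agrep.

```python
from typing import Dict, List


def agrep_ends(pattern: str, text: str, k: int = 0) -> List[int]:
    """Return end positions in `text` where `pattern` matches with <= k edits.

    Hypothetical helper illustrating bit-parallel matching; edits are
    substitutions, insertions, and deletions.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1
    # masks[ch] has bit j set when pattern[j] == ch.
    masks: Dict[str, int] = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    # state[i], bit j: pattern[:j+1] matches a suffix of the text read so far
    # with at most i errors. With i errors, the first i pattern characters
    # can always be accounted for by deletions.
    state = [((1 << i) - 1) & full for i in range(k + 1)]
    accept = 1 << (m - 1)
    ends: List[int] = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = state[0]
        state[0] = ((state[0] << 1) | 1) & b
        for i in range(1, k + 1):
            cur_old = state[i]
            state[i] = (
                (((cur_old << 1) | 1) & b)      # exact match of next character
                | prev_old                       # extra character in the text
                | ((prev_old << 1) | 1)          # substituted character
                | ((state[i - 1] << 1) | 1)      # pattern character skipped
            ) & full
            prev_old = cur_old
        if state[k] & accept:
            ends.append(pos)
    return ends
```

Each text character is handled with a constant number of word operations per error level, which is where the linear cost comes from when the pattern fits in a machine word.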
In the fourth part, we focus on seeding strategies for alternative splicing detection. We review the history of seeding-and-extending (SAE), and assess both theoretically and empirically the seeding strategies adopted in existing splicing detection tools, including Bowtie's heuristic seeding and ABMapper's exact seeding, against the novel complementary quad-seeding strategy we propose and the corresponding novel splice detection tool, CS4splice, which can handle inexact seeding (with errors) and all 3 types of errors: mismatch (substitution), insertion, and deletion. We carry out experiments on several data sets of 105 bp short reads (queries) with various levels of errors, and align them back to a reference human genome (hg18). On average, CS4splice can align 88.44% (recall rate) of 427,786 short reads perfectly back to the reference, while the other existing tools achieve much smaller recall rates: SpliceMap 48.72%, MapSplice 58.41%, and ABMapper 51.39%. The accuracies of CS4splice are also the highest or very close to the highest in all the experiments carried out. However, because of its complementary quad-seeding, CS4splice takes more computational resources, about twice (or more) those of the other alternative splicing detection tools, a cost we consider practical and worthwhile.
In the second part, we define a novel generalized pattern (query) and a framework of generalized pattern matching, for which we propose a heuristic matching algorithm. Simply speaking, a generalized pattern is Q1 G1 Q2 ... Qc-1 Gc-1 Qc, which consists of several substrings Qi and gaps Gi, each gap occurring between two consecutive substrings. The prototypes of the generalized pattern come from several real biological problems that can all be modeled as generalized pattern matching problems. Based on the well-known seeding-and-extending heuristic, we propose a dual-seeding strategy, with which we solve the matching problem effectively and efficiently. We also develop a specialized matching tool called Gpattern-match. We carry out experiments using 10,000 generalized patterns and search for them in a reference human genome (hg18). Over 98.74% of them can be recovered from the reference. It takes 1-2 seconds on average to recover a pattern, and peak memory usage is slightly over 1 GB.
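The structure of a generalized pattern can be illustrated with a small exhaustive matcher. This is a sketch under stated assumptions, not the dual-seeding heuristic or the Gpattern-match tool: the abstract does not say how gaps are constrained, so each gap is assumed here to be a (minimum, maximum) length range, substrings are matched exactly, and the name `match_generalized` is hypothetical.

```python
from typing import List, Sequence, Tuple


def match_generalized(
    text: str,
    parts: Sequence[str],
    gaps: Sequence[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Find spans (start, end) where parts occur in order with allowed gaps.

    parts = [Q1, ..., Qc]; gaps[i] = (min_len, max_len) for the gap between
    parts[i] and parts[i + 1]. Both conventions are assumptions made for
    this illustration.
    """
    if not parts or len(gaps) != len(parts) - 1:
        raise ValueError("need c substrings and c - 1 gaps")
    spans: List[Tuple[int, int]] = []

    def extend(idx: int, pos: int, start: int) -> None:
        # idx: index of the next substring; pos: offset just past the last one.
        if idx == len(parts):
            spans.append((start, pos))
            return
        lo, hi = gaps[idx - 1]
        for gap_len in range(lo, hi + 1):
            s = pos + gap_len
            if text.startswith(parts[idx], s):
                extend(idx + 1, s + len(parts[idx]), start)

    # Anchor on every occurrence of the first substring, then extend right.
    first = parts[0]
    s = text.find(first)
    while s != -1:
        extend(1, s + len(first), s)
        s = text.find(first, s + 1)
    return spans
```

For example, `match_generalized("AAACGTTTTTGGCAA", ["ACG", "GGC"], [(2, 6)])` reports the span covering `ACG`, a five-character gap, and `GGC`. A practical tool would replace the exhaustive scan with an index-based seeding step.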
In the third part, a natural extension of the second part, we model a real biological problem, alternative splicing detection, as a generalized pattern matching problem, and solve it using a proposed bi-directional seeding-and-extending algorithm. Unlike the other tools, which all depend on third-party tools, our mapping tool, ABMapper, is not only stand-alone but also performs unbiased alignments. We carry out experiments using 427,786 real next-generation sequencing short reads (queries) and align them back to a reference human genome (hg18). ABMapper achieves 98.92% accuracy and a 98.17% recall rate, much better than the other state-of-the-art tools: SpliceMap achieves 94.28% accuracy and a 78.13% recall rate, while TopHat achieves 88.99% accuracy and a 76.33% recall rate. When the seed length is set to 12 in ABMapper, the whole searching and alignment process takes about 20 minutes, and peak memory usage is slightly over 2 GB.
Ni, Bing.
Adviser: Kwong-Sak Leung.
Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 151-161).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.