
Dissertations / Theses on the topic 'Data sequence processing'



Consult the top 50 dissertations / theses for your research on the topic 'Data sequence processing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Hansson, Andreas. "Sequence Processing from A Connectionist View." Thesis, University of Skövde, Department of Computer Science, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-481.

Full text
Abstract:

In this work we explore how close the artificial intelligence community has come to modelling the human mind with respect to the representation and processing of sequences. We analyse results produced by cognitive psychologists, who study real minds, for features exhibited by human short- and long-term memory when representing and processing sequences. We compare these features with theories and models from the AI community, divided into two types: intrinsic and extrinsic theories. We conclude that the intrinsic theories have managed to explain most of the features, whereas the extrinsic theories still have a long way to go before exhibiting all of them. We also present several suggestions to the AI community for continued research in the area of sequence representation and processing in the human mind.

APA, Harvard, Vancouver, ISO, and other styles
2

Dameh, Mustafa. "Insights into gene interactions using computational methods for literature and sequence resources." University of Otago. Department of Anatomy & Structural Biology, 2008. http://adt.otago.ac.nz./public/adt-NZDU20090109.095349.

Full text
Abstract:
At the beginning of this century many sequencing projects were finalised. As a result, an overwhelming amount of literature and sequence data has become available to biologists via online bioinformatics databases. These biological data have led to a better understanding of many organisms and have helped identify genes. However, there is still much to learn about the functions and interactions of genes. This thesis is concerned with predicting gene interactions using two main online resources: biomedical literature and sequence data. The biomedical literature is used to explore and refine a text mining method, known as the "co-occurrence method", which is used to predict gene interactions. The sequence data are used in an analysis to predict an upper bound on the number of genes involved in gene interactions. The co-occurrence method of text mining was extensively explored in this thesis. The effects of certain computational parameters on the relevance of documents in which two genes co-occur were critically examined. The results showed that some computational parameters do have an impact on the outcome of the co-occurrence method and, if taken into consideration, can lead to better identification of documents that describe gene interactions. To explore the co-occurrence method of text mining, a prototype system was developed; as a result, it contains unique functions that are not present in currently available text mining systems. Sequence data were used to predict the upper bound of the number of genes involved in gene interactions within a tissue. A novel approach analysed SAGE and EST sequence libraries using ecological estimation methods. The approach shows that the species accumulation theory used in ecology can be applied to tag libraries (SAGE or EST) to predict an upper bound on the number of mRNA transcript species in a tissue.
The novel computational analysis provided in this study can be used to extend the body of knowledge and insights relating to gene interactions and, hence, provide better understanding of genes and their functions.
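The co-occurrence idea above can be illustrated with a minimal sketch (the gene names, document sets, and scoring function here are hypothetical; the thesis's actual parameters are more involved): count the documents in which two gene names appear together and score the association, for example with pointwise mutual information.

```python
import math

def cooccurrence_score(docs, gene_a, gene_b):
    """Score the association between two genes by document co-occurrence (PMI)."""
    n = len(docs)
    in_a = sum(1 for d in docs if gene_a in d)
    in_b = sum(1 for d in docs if gene_b in d)
    both = sum(1 for d in docs if gene_a in d and gene_b in d)
    if both == 0:
        return float("-inf")  # never co-occur: no evidence of interaction
    # Pointwise mutual information: log P(a, b) / (P(a) * P(b))
    return math.log((both * n) / (in_a * in_b))

# Toy corpus: each document reduced to the set of gene names it mentions.
docs = [
    {"BRCA1", "TP53"},
    {"BRCA1", "TP53", "EGFR"},
    {"EGFR"},
    {"BRCA1"},
]
score = cooccurrence_score(docs, "BRCA1", "TP53")  # positive: co-occur more than chance
```

A positive score suggests the pair co-occurs more often than independence would predict; the thesis examines how document filtering parameters change which pairs stand out.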
APA, Harvard, Vancouver, ISO, and other styles
3

Hung, Rong-I. "Computational studies of protein sequence and structure." Thesis, University of Oxford, 1999. http://ora.ox.ac.uk/objects/uuid:9905c946-86dd-4bb3-8824-7c50df136913.

Full text
Abstract:
This thesis explores aspects of protein function, structure and sequence by computational approaches. A comparative study of definitions of protein secondary structure was performed. Disagreements in assignment resulting from three different algorithms were observed. The causes of inaccuracies in structure assignments were discussed, and possibilities of projecting protein secondary structures by different structural descriptors were tested. The investigation of inconsistent assignments of protein secondary structure led to a study of a more specific issue concerning protein structure/function relationships, namely cis/trans isomerisation of a peptide bond. Surveys were carried out at the level of protein molecules to detect occurrences of the cis peptide bond, and at the level of protein domains to explore the possible biological implications of the occurrences of this structural motif. Research was then focussed on α-helical integral membrane proteins. A detailed analysis of sequences and putative transmembrane helical structures was conducted on the ABC transporters from different organisms. Interesting relationships between protein sequences, putative α-helical structures and transporter functions were identified. Applications of molecular dynamics simulations to the transmembrane helices of a specific human ABC transporter, the cystic fibrosis transmembrane conductance regulator (CFTR), explored some of these relationships at atomic resolution. Functional and structural implications of individual residues within membrane-spanning helices were revealed by these simulation studies.
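Disagreement between assignment algorithms of the kind described can be quantified very simply (a sketch with a made-up function name, using the common three-state H/E/C alphabet):

```python
def assignment_agreement(a, b):
    """Fraction of residues given the same secondary-structure state
    (e.g. H = helix, E = strand, C = coil) by two assignment methods."""
    if len(a) != len(b):
        raise ValueError("assignments must cover the same residues")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Two methods disagreeing on one residue out of six:
agreement = assignment_agreement("HHHEEC", "HHHECC")
```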
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Yaoman, and 李耀满. "Efficient methods for improving the sensitivity and accuracy of RNA alignments and structure prediction." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/195977.

Full text
Abstract:
RNA plays an important role in molecular biology, and RNA sequence comparison is an important method for analysing gene expression. Since aligning RNA reads must handle gaps, mutations, poly-A tails, etc., it is much more difficult than aligning other sequences. In this thesis, we study RNA-Seq alignment tools and existing gene information databases, and how to improve the accuracy of alignment and predict RNA secondary structure. The known gene information databases contain a lot of reliable gene information that has already been discovered. We note that most DNA alignment tools are well developed: they run much faster than existing RNA-Seq alignment tools and have higher sensitivity and accuracy. Combining them with a known gene information database, we present a method to align RNA-Seq data using DNA alignment tools; that is, we use the DNA alignment tools to do the alignment and use the gene information to convert the alignments to genome coordinates. Although the gene information database is updated daily, there are still many genes and alternative splicings that have not been discovered. If our RNA alignment tool relied only on the known gene database, many reads coming from unknown genes or alternative splicings could not be aligned. Thus, we present a combinational method that can cover potential alternative splicing junction sites. Combined with the original gene database, the new alignment tool covers most alignments reported by other RNA-Seq alignment tools. Recently, many RNA-Seq alignment tools have been developed. They are more powerful and faster than the previous generation of tools. However, RNA read alignment is much more complicated than other sequence alignment, and the alignments reported by some RNA-Seq alignment tools have low accuracy. We present a simple and efficient filter method based on the quality scores of the reads that can filter out most low-accuracy alignments.
Finally, we present an RNA secondary structure prediction method that can predict pseudoknots (a type of RNA secondary structure) with high sensitivity and specificity.
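A quality-score filter of the kind mentioned can be sketched as follows (the function names, data layout, and threshold are assumptions for illustration, not the thesis's actual method): drop alignments whose reads have a low mean Phred score.

```python
def mean_phred(qual_string, offset=33):
    """Mean Phred quality of a read from its ASCII-encoded (Phred+33) quality string."""
    return sum(ord(c) - offset for c in qual_string) / len(qual_string)

def filter_alignments(alignments, min_mean_q=25):
    """Keep only alignments whose read quality suggests a trustworthy placement."""
    return [a for a in alignments if mean_phred(a["qual"]) >= min_mean_q]

alignments = [
    {"read": "r1", "qual": "IIIIIIII"},  # 'I' = Phred 40 throughout: kept
    {"read": "r2", "qual": "!!!!!!!!"},  # '!' = Phred 0 throughout: dropped
]
kept = filter_alignments(alignments)
```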
published_or_final_version
Computer Science
Master
Master of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
5

Wang, Yi, and 王毅. "Binning and annotation for metagenomic next-generation sequencing reads." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/208040.

Full text
Abstract:
The development of next-generation sequencing technology enables us to obtain a vast number of short reads from metagenomic samples. In metagenomic samples, the reads from different species are mixed together. Metagenomic binning has therefore been introduced to cluster reads from the same or closely related species, and metagenomic annotation to predict the taxonomic information of each read. Both metagenomic binning and annotation are critical steps in downstream analysis. This thesis discusses the difficulties of these two computational problems and proposes two algorithmic methods, MetaCluster 5.0 and MetaAnnotator, as solutions. There are six major challenges in metagenomic binning: (1) the lack of reference genomes; (2) uneven abundance ratios; (3) short read lengths; (4) a large number of species; (5) the existence of species with extremely low abundance; and (6) recovering low-abundance species. To solve these problems, I propose a two-round binning method, MetaCluster 5.0. The improvement achieved by MetaCluster 5.0 is based on three major observations. First, the short q-mer (length-q substring of the sequence with q = 4, 5) frequency distributions of individual sufficiently long fragments sampled from the same genome are more similar than those sampled from different genomes. Second, sufficiently long w-mers (length-w substrings with w ≈ 30) are usually unique in each individual genome. Third, the k-mer (length-k substring with k ≈ 16) frequencies from reads of a species are usually linearly proportional to the species' abundance.
The metagenomic annotation methods in the literature often suffer from five major drawbacks: (1) inability to annotate many reads; (2) less precise annotation for reads and more incorrect annotation for contigs; (3) inability to deal well with novel clades with limited reference genomes; (4) performance affected by variable genome sequence similarities between different clades; and (5) high time complexity. In this thesis, a novel tool, MetaAnnotator, is proposed to tackle these problems. There are four major contributions of MetaAnnotator. Firstly, instead of annotating reads/contigs independently, a cluster of reads/contigs is annotated as a whole. Secondly, multiple reference databases are integrated. Thirdly, for each individual clade, quadratic discriminant analysis is applied to capture the similarities between reference sequences in the clade. Fourthly, instead of using alignment tools, MetaAnnotator performs annotation using exact k-mer matching, which is more efficient. Experiments on both simulated and real datasets show that MetaCluster 5.0 and MetaAnnotator outperform existing tools with higher accuracy as well as lower time and space cost.
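The first observation above, that q-mer frequency distributions of long fragments from the same genome are similar, can be sketched minimally (a didactic illustration, not MetaCluster's implementation):

```python
from itertools import product

def qmer_freqs(seq, q=4):
    """Normalized q-mer frequency vector of a DNA fragment (q = 4 as in the abstract)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=q)]
    counts = {k: 0 for k in kmers}
    for i in range(len(seq) - q + 1):
        window = seq[i:i + q]
        if window in counts:  # skip windows containing ambiguous bases
            counts[window] += 1
    total = max(1, len(seq) - q + 1)
    return [counts[k] / total for k in kmers]

def distance(u, v):
    """Euclidean distance between two q-mer frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Fragments sampled from the same (toy) genome yield nearly identical vectors, while fragments from a compositionally different genome sit far away, which is what makes clustering on these vectors possible.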
published_or_final_version
Computer Science
Doctoral
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
6

Liu, Kai. "Detecting stochastic motifs in network and sequence data for human behavior analysis." HKBU Institutional Repository, 2014. https://repository.hkbu.edu.hk/etd_oa/60.

Full text
Abstract:
With the recent advent of Web 2.0, mobile computing, and pervasive sensing technologies, human activities can readily be logged, leaving digital traces of different forms. For instance, human communication activities recorded in online social networks allow user interactions to be represented as “network” data. Also, human daily activities can be tracked in a smart house, where the log of sensor-triggering events can be represented as “sequence” data. This thesis research aims to develop computational data mining algorithms using the generative modeling approach to extract salient patterns (motifs) embedded in such network and sequence data, and to apply them to human behavior analysis. Motifs are defined as recurrent over-represented patterns embedded in the data, and have been known to be effective for characterizing complex networks. Many motif extraction methods found in the literature assume that a motif is either present or absent. In real practice, such salient patterns can appear partially due to their stochastic nature and/or the presence of noise. Thus, the probabilistic approach is adopted in this thesis to model motifs. For network data, we use a probability matrix to represent a network motif and propose a mixture model to extract network motifs. A component-wise EM algorithm is adopted, where the optimal number of stochastic motifs is automatically determined with the help of a minimum message length criterion. Considering also the edge occurrence ordering within a motif, we model a motif as a mixture of first-order Markov chains for the extraction. Using a probabilistic approach similar to the one for network motifs, an optimal set of stochastic temporal network motifs is extracted. We carried out rigorous experiments to evaluate the performance of the proposed motif extraction algorithms using synthetic data sets as well as real-world social network and mobile phone usage data sets, and obtained promising results.
Also, we found that some of the results can be interpreted using the social balance and social status theories, which are well known in social network analysis. To evaluate the effectiveness of stochastic temporal network motifs beyond characterizing human behaviors, we incorporate them as local structural features into a factor graph model for followee recommendation prediction (essentially a link prediction problem) in online social networks. The proposed motif-based factor graph model is found to significantly outperform existing state-of-the-art methods for the prediction task. To extract motifs from sequence data, the probabilistic framework proposed for stochastic temporal network motif extraction is also applicable. One possible way is to make use of the edit distance in the probabilistic framework so that subsequences with minor ordering variations can first be grouped to form the initial set of motif candidates. A mixture model can then be used to determine the optimal set of temporal motifs. We applied this approach to extract sequence motifs from a smart home data set which contains sensor-triggering events corresponding to activities performed by residents in the smart home. The unique behavior extracted for each resident based on the detected motifs is also discussed. Keywords: stochastic network motifs, finite mixture models, expectation maximization algorithms, social networks, stochastic temporal network motifs, mixture of Markov chains, human behavior analysis, followee recommendation, signed social networks, activity of daily living, smart environments
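The mixture-of-first-order-Markov-chains idea can be sketched minimally: each motif component scores an event sequence by its chain log-likelihood, and a sequence is assigned to the component under which it is most likely. The transition probabilities and component names below are made up for illustration; the thesis's EM fitting and minimum-message-length model selection are not shown.

```python
import math

def markov_loglik(seq, init, trans):
    """Log-likelihood of a symbol sequence under a first-order Markov chain."""
    ll = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(trans[a][b])
    return ll

def assign(seq, components):
    """Hard-assign a sequence to the most likely mixture component."""
    return max(components, key=lambda name: markov_loglik(seq, *components[name]))

init = {"A": 0.5, "B": 0.5}
alternating = {"A": {"A": 0.1, "B": 0.9}, "B": {"A": 0.9, "B": 0.1}}
uniform = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
components = {"alternating": (init, alternating), "uniform": (init, uniform)}
```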
APA, Harvard, Vancouver, ISO, and other styles
7

Peng, Yu, and 彭煜. "Iterative de Bruijn graph assemblers for second-generation sequencing reads." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B50534051.

Full text
Abstract:
The recent advance of second-generation sequencing technologies has made it possible to generate a vast amount of short read sequences from a DNA (cDNA) sample. Current short read assemblers make use of the de Bruijn graph, in which each vertex is a k-mer and each edge connecting vertex u and vertex v represents u and v appearing in a read consecutively, to produce contigs. There are three major problems for de Bruijn graph assemblers: (1) the branch problem, due to errors and repeats; (2) the gap problem, due to low or uneven sequencing depth; and (3) the error problem, due to sequencing errors. A proper choice of k value is a crucial tradeoff in de Bruijn graph assemblers: a low k value leads to fewer gaps but more branches; a high k value leads to fewer branches but more gaps. In this thesis, I first analyze the fundamental genome assembly problem and then propose an iterative de Bruijn graph assembler (IDBA), which iterates from low to high k values, to construct a de Bruijn graph with fewer branches and fewer gaps than any other de Bruijn graph assembler using a fixed k value. Then, second-generation sequencing data from metagenomic, single-cell and transcriptome samples are investigated. IDBA is tailored with special treatments to handle the specific issues of each kind of data. For metagenomic sequencing data, a graph partition algorithm is proposed to separate the de Bruijn graph into dense components, which represent similar regions in subspecies from the same species, and multiple sequence alignment is used to produce a consensus for each component. For sequencing data with highly uneven depth, such as single-cell and metagenomic sequencing data, a method called local assembly is designed to reconstruct missing k-mers in low-depth regions. Then, based on the observation that short and relatively low-depth contigs are more likely erroneous, progressive depth on contigs is used to remove errors in both low-depth and high-depth regions iteratively.
For transcriptome sequencing data, a variant of the progressive depth method is adopted to decompose the de Bruijn graph into components corresponding to transcripts from the same gene, and the transcripts are then found in each component by considering read and paired-end read support. Extensive experiments on both simulated and real data show that IDBA assemblers outperform existing assemblers by constructing longer contigs with higher completeness and similar or better accuracy. The running time of IDBA assemblers is comparable to existing algorithms, while the memory cost is usually lower.
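The branch/k tradeoff described above can be made concrete with a toy de Bruijn graph (a didactic sketch, not IDBA's implementation): a short repeat creates a branch at small k that disappears once k spans the repeat.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are k-mers, and an edge links each
    pair of consecutive (overlapping) k-mers within a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph

def branching_nodes(graph):
    """k-mers with more than one successor: the 'branch problem' from the abstract."""
    return [v for v, succ in graph.items() if len(succ) > 1]

# "ATT" repeats in this read, so at k=2 the node "TT" has two successors,
# while at k=4 each k-mer spans the repeat and the path is unambiguous.
g_low = de_bruijn(["ATTGATTC"], 2)
g_high = de_bruijn(["ATTGATTC"], 4)
```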
published_or_final_version
Computer Science
Doctoral
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
8

Kutlu, Mucahid. "Parallel Processing of Large Scale Genomic Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436355132.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Bao, Suying, and 鲍素莹. "Deciphering the mechanisms of genetic disorders by high throughput genomic data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/196471.

Full text
Abstract:
A new generation of non-Sanger-based sequencing technologies, so-called “next-generation” sequencing (NGS), has been changing the landscape of genetics at unprecedented speed. In particular, our capacity for deciphering the genotypes underlying phenotypes, such as diseases, has never been greater. However, before fully applying NGS in medical genetics, researchers have to bridge the widening gap between the generation of massively parallel sequencing output and the capacity to analyze the resulting data. In addition, even when a list of candidate genes with potential causal variants can be obtained from an effective NGS analysis, pinpointing disease genes from the long list remains a challenge. The issue becomes especially difficult when the molecular basis of the disease is not fully elucidated. New NGS users are often bewildered by a plethora of options in mapping, assembly, variant calling and filtering programs and may have no idea how to compare these tools and choose the “right” ones. To get an overview of various bioinformatics attempts at mapping and assembly, a series of performance evaluations was conducted using both real and simulated NGS short reads. For NGS variant detection, the performance of the two most widely used toolkits was assessed, namely SAMtools and GATK. Based on the results of this systematic evaluation, an NGS data processing and analysis pipeline was constructed. This pipeline proved successful with the identification of a mutation (a frameshift deletion on Hnrnpa1, p.Leu181Valfs*6) related to congenital heart defect (CHD) in procollagen type IIA deficient mice. In order to prioritize risk genes for diseases, especially those with limited prior knowledge, a network-based gene prioritization model was constructed. It consists of two parts: network analysis on known disease genes (seed-based network strategy) and network analysis on differential expression (DE-based network strategy).
Case studies of various complex diseases/traits demonstrated that the DE-based network strategy can greatly outperform traditional gene expression analysis in predicting disease-causing genes. A series of simulations indicated that the DE-based strategy is especially meaningful for diseases with limited prior knowledge, and the model's performance can be further advanced by integrating it with the seed-based network strategy. Moreover, a successful application of the network-based gene prioritization model in an influenza host genetic study further demonstrated the capacity of the model to identify promising candidates and to mine new risk genes and pathways not biased toward our current knowledge. In conclusion, an efficient NGS analysis framework covering the steps from quality control and variant detection to result analysis and gene prioritization has been constructed for medical genetics. The novelty in this framework is an encouraging attempt to prioritize risk genes for poorly characterized diseases by network analysis on known disease genes and differential expression data. The successful applications in detecting genetic factors associated with CHD and influenza host resistance demonstrated the efficacy of this framework, and this may further stimulate more applications of high-throughput genomic data in dissecting the genetic components of human disorders in the near future.
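A seed-based network strategy of the kind described can be sketched as scoring each candidate gene by how strongly it connects to known disease genes (a toy illustration with made-up gene names; the thesis's actual model is richer):

```python
def prioritize(network, seeds, candidates):
    """Rank candidate genes by the fraction of their network neighbors
    that are already-known disease genes (the 'seeds')."""
    def score(gene):
        neighbors = network.get(gene, set())
        return len(neighbors & seeds) / len(neighbors) if neighbors else 0.0
    return sorted(candidates, key=score, reverse=True)

# Hypothetical interaction network: gene -> set of interaction partners.
network = {
    "geneA": {"seed1", "seed2", "other"},   # two known disease-gene neighbors
    "geneB": {"other", "other2"},           # no known disease-gene neighbors
}
seeds = {"seed1", "seed2"}
ranked = prioritize(network, seeds, ["geneB", "geneA"])
```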
published_or_final_version
Biochemistry
Doctoral
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
10

Chan, Pui-yee, and 陳沛儀. "A study on predicting gene relationship from a computational perspective." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30461352.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Ho, Ngai-lam, and 何毅林. "Algorithms on constrained sequence alignment." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30201949.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Powell, David Richard 1973. "Algorithms for sequence alignment." Monash University, School of Computer Science and Software Engineering, 2001. http://arrow.monash.edu.au/hdl/1959.1/8051.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Yim, Cheuk-hon Terence, and 嚴卓漢. "Approximate string alignment and its application to ESTs, mRNAs and genome mapping." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31455736.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

高銘謙 and Ming-him Ko. "A multi-agent model for DNA analysis." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31222778.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Leung, Chi-ming, and 梁志銘. "Motif discovery for DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B3859755X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Camerlengo, Terry Luke. "Techniques for Storing and Processing Next-Generation DNA Sequencing Data." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1388502159.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

桂宏胜 and Hongsheng Gui. "Data mining of post genome-wide association studies and next generation sequencing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/193431.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Labuschagne, Jan Phillipus Lourens. "Development of a data processing toolkit for the analysis of next-generation sequencing data generated using the primer ID approach." University of the Western Cape, 2018. http://hdl.handle.net/11394/6736.

Full text
Abstract:
Philosophiae Doctor - PhD
Sequencing an HIV quasispecies with next generation sequencing technologies yields a dataset with significant amplification bias and errors resulting from both the PCR and sequencing steps. Both the amplification bias and sequencing error can be reduced by labelling each cDNA (generated during the reverse transcription of the viral RNA to DNA prior to PCR) with a random sequence tag called a Primer ID (PID). Processing PID data requires additional computational steps, presenting a barrier to the uptake of this method. MotifBinner is an R package designed to handle PID data with a focus on resolving potential problems in the dataset. MotifBinner groups sequences into bins by their PID tags, identifies and removes false unique bins, produced from sequencing errors in the PID tags, as well as removing outlier sequences from within a bin. MotifBinner produces a consensus sequence for each bin, as well as a detailed report for the dataset, detailing the number of sequences per bin, the number of outlying sequences per bin, rates of chimerism, the number of degenerate letters in the final consensus sequences and the most divergent consensus sequences (potential contaminants). We characterized the ability of the PID approach to reduce the effect of sequencing error, to detect minority variants in viral quasispecies and to reduce the rates of PCR induced recombination. We produced reference samples with known variants at known frequencies to study the effectiveness of increasing PCR elongation time, decreasing the number of PCR cycles, and sample partitioning, by means of dPCR (droplet PCR), on PCR induced recombination. After sequencing these artificial samples with the PID approach, each consensus sequence was compared to the known variants. There are complex relationships between the sample preparation protocol and the characteristics of the resulting dataset. 
We produce a set of recommendations that can be used to inform the sample preparation that is most useful for a particular study. The AMP trial infuses HIV-negative patients with the VRC01 antibody and monitors for HIV infections. Accurately timing the infection event and reconstructing the founder viruses of these infections are critical for relating infection risk to antibody titer and to the homology between the founder virus and antibody binding sites. Dr. Paul Edlefsen at the Fred Hutch Cancer Research Institute developed a pipeline that performs infection timing and founder reconstruction. Here, we document a portion of the pipeline, produce detailed tests for that portion, and investigate the robustness of some of the tools used in the pipeline to violations of their assumptions.
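The binning-and-consensus core of the PID approach can be sketched as follows (the tag length, bin-size cutoff, and the assumption that the PID is a fixed-length read prefix are illustrative; MotifBinner's outlier and chimera handling is not shown):

```python
from collections import Counter, defaultdict

def bin_by_pid(reads, pid_len=8):
    """Group reads by their Primer ID tag (here assumed to be a fixed-length prefix)."""
    bins = defaultdict(list)
    for read in reads:
        bins[read[:pid_len]].append(read[pid_len:])
    return bins

def consensus(seqs):
    """Per-position majority-vote consensus of equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def pid_consensus(reads, pid_len=8, min_bin=3):
    """Discard small bins (likely spurious PIDs from tag sequencing errors),
    then collapse each surviving bin to its consensus sequence."""
    return {pid: consensus(seqs)
            for pid, seqs in bin_by_pid(reads, pid_len).items()
            if len(seqs) >= min_bin}

reads = [
    "AAAACCCC" + "ACGTACGT",
    "AAAACCCC" + "ACGTACGT",
    "AAAACCCC" + "ACGTACGA",  # one sequencing error, outvoted in consensus
    "GGGGTTTT" + "ACGTACGT",  # singleton bin: likely a PID error, discarded
]
```

Majority voting within a bin is what suppresses both PCR amplification bias (each bin counts once) and per-read sequencing error.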
APA, Harvard, Vancouver, ISO, and other styles
19

Ye, Lin, and 叶林. "Exploring microbial community structures and functions of activated sludge by high-throughput sequencing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B48079649.

Full text
Abstract:
The major objectives of this study were to investigate the diversities and abundances of nitrifiers and to apply high-throughput sequencing technologies to analyze the overall microbial community structures and functions in wastewater treatment bioreactors. Specifically, this study was conducted: (1) to investigate the diversities and abundances of AOA, AOB and NOB in bioreactors, (2) to explore the bacterial communities in bioreactors using 454 pyrosequencing, and (3) to analyze the metagenomes of activated sludge using Illumina sequencing. A lab-scale nitrification bioreactor was operated for 342 days under low DO (0.15~0.5 mg/L) and high nitrogen loading (0.26~0.52 kg-N/(m³·d)). T-RFLP and cloning analysis showed that there was only one dominant AOA, AOB and NOB species in the bioreactor, respectively. The amoA gene of the dominant AOA had a similarity of 89.3% with the isolated AOA species Nitrosopumilus maritimus SCM1. The AOB species detected in the bioreactor belonged to the Nitrosomonas genus. The abundance of AOB was more than 40 times larger than that of AOA. The percentage of NOB in total bacteria increased from not detectable to 30% when DO changed from 0.15 to 0.5 mg/L. Compared with traditional methods, pyrosequencing analysis of the bacteria in this bioreactor provided unprecedented information: 494 bacterial OTUs were obtained at a 3% distance cutoff. Furthermore, 454 pyrosequencing was applied to investigate the bacterial communities of activated sludge samples from 14 WWTPs in Asia (mainland China, Hong Kong, and Singapore) and North America (Canada and the United States). The results revealed huge numbers of OTUs in activated sludge, i.e. 1183~3567 OTUs in one sludge sample at a 3% distance cutoff. Clear geographical differences among these samples were observed. The AOB amoA genes in different WWTPs were found to be quite diverse, while the 16S rRNA genes were relatively conserved.
To explore microbial community structures and functions in the above-mentioned lab-scale bioreactor and a full-scale bioreactor, over six gigabases of metagenomic sequence data and 150,000 paired-end reads of PCR amplicons were generated from the activated sludge in the two bioreactors on the Illumina HiSeq 2000 platform. Three kinds of sequences (16S rRNA amplicons, 16S rRNA gene tags and predicted genes) were used to conduct taxonomic assignment, and their applicabilities and reliabilities were compared. Specifically, based on 16S rRNA and amoA gene sequences, AOB were found to be more abundant than AOA in the two bioreactors. Furthermore, the analysis of metabolic profiles and pathways indicated that the overall pathways in the two bioreactors were quite similar. However, the abundances of some specific genes in the two bioreactors were different. In addition, 454 pyrosequencing was also used to detect potentially pathogenic bacteria in environmental samples. The most abundant potentially pathogenic bacteria in the WWTPs were affiliated with Aeromonas and Clostridium. Aeromonas veronii, Aeromonas hydrophila and Clostridium perfringens were the species most similar to the potentially pathogenic bacteria found in this study. Overall, the percentage of sequences closely related to known pathogenic bacteria was about 0.16% of the total. Additionally, a Java application (BAND) was developed for graphical visualization of microbial abundance data.
published_or_final_version
Civil Engineering
Doctoral
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
20

Dalke, Trevor. "Data Chunking in Quasi-Synchronous DS-CDMA." DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1187.

Full text
Abstract:
DS-CDMA is a popular multiple access technique used in many mobile networks to efficiently share channel resources between users in a cell. Synchronization between users maximizes the user capacity of these systems. However, it is difficult to perfectly synchronize users in the reverse link due to the geographic diversity of mobile users in the cell. As a result, most commercial DS-CDMA networks utilize an asynchronous reverse link resulting in a reduced user capacity. A possible compromise to increase the user capacity in the reverse link is to implement a quasi-synchronous timing scheme, a timing scheme in which users are allowed to be slightly out of synchronization. This paper suggests a possible way to implement a quasi-synchronous DS-CDMA reverse link using the method of “data chunking”. The basic premise is derived by making a link between TDMA and synchronous DS-CDMA. By considering some basic TDMA limitations, a proposed “data chunked” quasi-synchronous DS-CDMA system is derived from a TDMA system. The effects of such a system are compared to those of a chip interleaved system. MATLAB simulations are performed to analyze the performance of the system in the presence of small synchronization errors between users. Implementation of guard bands is explored to further reduce errors due to imperfect synchronization between users.
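The synchronization sensitivity discussed above can be illustrated with orthogonal spreading codes (a sketch, not the thesis's MATLAB simulation): Walsh codes are perfectly orthogonal at zero chip offset, but a one-chip timing error between users already produces nonzero cross-correlation, i.e. multiple-access interference, which is why quasi-synchronous designs need guard bands or careful code selection.

```python
def walsh(n):
    """Walsh-Hadamard spreading codes of length n (n a power of two), chips in ±1."""
    codes = [[1]]
    while len(codes[0]) < n:
        # Sylvester construction: each code c yields [c, c] and [c, -c].
        codes = [c + c for c in codes] + [c + [-x for x in c] for c in codes]
    return codes

def xcorr(a, b, offset):
    """Cross-correlation of two codes at a given chip offset (zero-padded)."""
    return sum(a[i] * b[i - offset] for i in range(offset, len(a)))

codes = walsh(8)
sync_interference = xcorr(codes[1], codes[2], 0)   # perfectly synchronized: 0
quasi_interference = xcorr(codes[1], codes[2], 1)  # one chip late: nonzero
```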
APA, Harvard, Vancouver, ISO, and other styles
21

Zeng, Shuai, and 曾帥. "Predicting functional impact of nonsynonymous mutations by quantifying conservation information and detect indels using split-read approach." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/198818.

Full text
Abstract:
The rapidly developing sequencing technology has given scientists an opportunity to look into detailed genotype information in the human genome. Computational programs have played important roles in identifying disease-related genomic variants from huge amounts of sequencing data. In the past years, a number of computational algorithms have been developed to solve crucial problems in sequencing data analysis, such as mapping sequencing reads to the genome and identifying SNPs. However, many difficult and important issues still await satisfactory solutions. A key challenge is identifying disease-related mutations against the background of non-pathogenic polymorphisms. Another crucial problem is detecting INDELs, especially long deletions, under the technical limitations of second-generation sequencing technology. To predict disease-related mutations, we developed a machine-learning-based (random forest) prediction tool, EFIN (Evaluation of Functional Impact of Nonsynonymous mutations). We build a multiple sequence alignment (MSA) for a query protein with its homologous sequences. The MSA is then divided into blocks according to the taxonomic information of the sequences. After that, we quantify the conservation in each block using a number of selected features, for example entropy, a concept borrowed from information theory. EFIN was trained on the Swiss-Prot and HumDiv datasets. In a series of fair comparisons, EFIN showed better results than widely used algorithms in terms of AUC (area under the ROC curve), accuracy, specificity and sensitivity. A web-based database is provided to worldwide users at paed.hku.hk/efin. To solve the second problem, we developed Linux-based software, SPLindel, that detects deletions (especially long deletions) and insertions from second-generation sequencing data.
For each sample, SPLindel uses a split-read method to detect candidate INDELs by building alternative references to go along with the reference sequences. We then remap all the relevant reads using both the original references and the alternative-allele references. A Bayesian model integrating paired-end information is used to assign the reads to the most likely locations on either the original reference allele or the alternative allele. Finally, we count the number of reads that support the alternative allele (with insertions or deletions relative to the original reference allele) and the original allele, and fit a beta-binomial mixture model. Based on this model, the likelihood for each INDEL is calculated and the genotype is predicted. SPLindel runs at about the same speed as GATK, but much faster than DINDEL. SPLindel obtained very similar results to GATK and DINDEL for INDELs of size 1-15 bp, but is much more effective in detecting INDELs of larger size. Using machine learning and statistical modeling, we propose tools to solve these two important problems in sequencing data analysis. This study will help identify novel damaging nsSNPs more accurately and efficiently, and equip researchers with a more powerful tool for identifying INDELs, especially long deletions. As more and more sequencing data are generated, the methods and tools introduced in this thesis may help us extract useful information to facilitate the identification of mutations causal to human diseases.
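The entropy feature mentioned in the abstract can be sketched concretely (the function name and toy columns are mine; EFIN's actual feature set is richer): Shannon entropy of the residue distribution in one MSA column, where low entropy indicates a conserved position.

```python
# Hypothetical sketch of one EFIN-style conservation feature: Shannon entropy
# of an MSA column. The data and names are illustrative, not from the tool.
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (bits) of the residue distribution in one MSA column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())

conserved = "AAAAAAAA"   # the same residue in every homolog
variable  = "ARNDCQEG"   # a different residue in each of 8 homologs

print(column_entropy(conserved))   # 0.0 -> perfectly conserved
print(column_entropy(variable))    # 3.0 -> maximally variable for 8 residues
```

A mutation landing on a low-entropy (conserved) column is more likely to be damaging, which is the intuition such features feed into the random forest.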
published_or_final_version
Paediatrics and Adolescent Medicine
Doctoral
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
22

Murrel, Benjamin. "Improved models of biological sequence evolution." Thesis, Stellenbosch : Stellenbosch University, 2012. http://hdl.handle.net/10019.1/71870.

Full text
Abstract:
Thesis (PhD)--Stellenbosch University, 2012.
ENGLISH ABSTRACT: Computational molecular evolution is a field that attempts to characterize how genetic sequences evolve over phylogenetic trees – the branching processes that describe the patterns of genetic inheritance in living organisms. It has a long history of developing progressively more sophisticated stochastic models of evolution. Through a probabilist’s lens, this can be seen as a search for more appropriate ways to parameterize discrete state continuous time Markov chains to better encode biological reality, matching the historical processes that created empirical data sets, and creating useful tools that allow biologists to test specific hypotheses about the evolution of the organisms or the genes that interest them. This dissertation is an attempt to fill some of the gaps that persist in the literature, solving what we see as existing open problems. The overarching theme of this work is how to better model variation in the action of natural selection at multiple levels: across genes, between sites, and over time. Through four published journal articles and a fifth in preparation, we present amino acid and codon models that improve upon existing approaches, providing better descriptions of the process of natural selection and better tools to detect adaptive evolution.
AFRIKAANSE OPSOMMING: Komputasionele molekulêre evolusie is ’n navorsingsarea wat poog om die evolusie van genetiese sekwensies oor filogenetiese bome – die vertakkende prosesse wat die patrone van genetiese oorerwing in lewende organismes beskryf – te karakteriseer. Dit het ’n lang geskiedenis waartydens al hoe meer gesofistikeerde waarskynlikheidsmodelle van evolusie ontwikkel is. Deur die lens van waarskynlikheidsleer kan hierdie proses gesien word as ’n soektog na meer gepasde metodes om diskrete-toestand kontinuë-tyd Markov kettings te parametriseer ten einde biologiese realiteit beter te enkodeer – op so ’n manier dat die historiese prosesse wat tot die vorming van biologiese sekwensies gelei het nageboots word, en dat nuttige metodes geskep word wat bioloë toelaat om spesifieke hipotesisse met betrekking tot die evolusie van belanghebbende organismes of gene te toets. Hierdie proefskrif is ’n poging om sommige van die gapings wat in die literatuur bestaan in te vul en bestaande oop probleme op te los. Die oorkoepelende tema is verbeterde modellering van variasie in die werking van natuurlike seleksie op verskeie vlakke: variasie van geen tot geen, variasie tussen posisies in gene en variasie oor tyd. Deur middel van vier gepubliseerde joernaalartikels en ’n vyfde artikel in voorbereiding, bied ons aminosuur- en kodon-modelle aan wat verbeter op bestaande benaderings – hierdie modelle verskaf beter beskrywings van die proses van natuurlike seleksie sowel as beter metodes om gevalle van aanpassing in evolusie te vind.
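The core mathematical object this abstract describes, a discrete-state continuous-time Markov chain of sequence evolution, can be illustrated in miniature (this two-state closed form is a textbook toy of mine, not one of the thesis's codon models): for a symmetric two-state chain with rate mu, the probability of observing the starting state after time t is P_same(t) = 1/2 + (1/2)e^(-2*mu*t).

```python
# Toy illustration (not the thesis's models): transition probabilities of a
# symmetric two-state continuous-time Markov chain with exchange rate mu.
from math import exp

def p_same(mu, t):
    """Probability that the state at time t equals the starting state."""
    return 0.5 + 0.5 * exp(-2.0 * mu * t)

def p_diff(mu, t):
    """Probability that the state at time t differs from the start."""
    return 1.0 - p_same(mu, t)

print(p_same(1.0, 0.0))            # 1.0 -> at t=0 nothing has changed
print(round(p_same(1.0, 1e9), 3))  # 0.5 -> the chain forgets its start
```

Real codon models replace the two states with 61 codons and parameterize the rate matrix so that selection can vary across genes, sites, and time, which is exactly the variation the dissertation targets.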
APA, Harvard, Vancouver, ISO, and other styles
23

Siu, Man-hung, and 蕭文鴻. "Finding motif pairs from protein interaction networks." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B40987760.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Zhang, Yue. "Detection copy number variants profile by multiple constrained optimization." HKBU Institutional Repository, 2017. https://repository.hkbu.edu.hk/etd_oa/439.

Full text
Abstract:
Copy number variation (CNV), caused by genome rearrangement, generally refers to increases or decreases in the copy number of large genome segments longer than 1 kb. Such variations mainly appear as sub-microscopic deletions and duplications. Copy number variation is an important component of genome structural variation and one of the pathogenic factors of human diseases. Next-generation sequencing (NGS) is a popular basis for CNV detection and has been widely used in various fields of life science research, owing to its high throughput and low cost. By tailoring NGS technology, it is possible to sequence individual cells. Such single-cell sequencing can reveal the gene expression status and genomic variation profile of a single cell, and is promising for the study of tumors, developmental biology, neuroscience and other fields. However, two challenging problems are encountered in CNV detection from NGS data. The first is that single-cell sequencing requires a special genome amplification step to accumulate enough sample, which introduces a large amount of bias, making the calling of copy number variants rather challenging. The performance of many popular copy number calling methods, designed for bulk sequencing, is not consistent, and they cannot be applied to single-cell sequencing data directly. The second is to simultaneously analyze genome data for multiple samples, thus assembling and subgrouping similar cells accurately and efficiently. The high level of noise in single-cell sequencing data negatively affects the reliability of sequence reads and leads to inaccurate patterns of variation. To reliably find CNVs in NGS data, in this thesis we first establish a workflow for analyzing NGS and single-cell sequencing data.
CNV identification is formulated as a quadratic optimization problem with both sparsity and smoothness constraints. Tailored from the alternating direction minimization (ADM) framework, an efficient numerical solution is designed accordingly. The proposed model was tested extensively to demonstrate its superior performance. It is shown that the proposed approach can successfully reconstruct CNVs, especially somatic copy number alteration patterns, from raw data. Compared with existing counterparts, it achieved superior or comparable performance in CNV detection. To tackle the issue of recovering the hidden blocks within multiple single-cell DNA-sequencing samples, we present a permutation-based model that rearranges the samples so that similar ones are positioned adjacently. The permutation is guided by the total variation (TV) norm of the recovered copy number profiles, and continues until the TV norm is minimized, when similar samples are stacked together to reveal block patterns. Accordingly, an efficient numerical scheme for finding this permutation is designed, tailored from the alternating direction method of multipliers. Application of this method to both simulated and real data demonstrates its ability to recover the hidden structures of single-cell DNA sequences.
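The TV-norm criterion that guides the permutation can be shown on toy data (profiles and function name are mine, not the thesis software): placing samples from the same clone adjacently yields a smaller sum of adjacent-sample differences than interleaving them.

```python
# Illustrative sketch of the TV-norm ordering criterion described above.
# Copy number profiles from the same clone stacked adjacently give a lower
# total variation than an interleaved ordering of the same samples.

def tv_norm(profiles):
    """Sum of |difference| between copy numbers of adjacent samples, per bin."""
    total = 0
    for row_a, row_b in zip(profiles, profiles[1:]):
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total

clone1 = [2, 2, 3, 3, 2]    # hypothetical copy numbers over 5 genomic bins
clone2 = [2, 1, 1, 2, 2]

grouped     = [clone1, clone1, clone2, clone2]   # similar samples adjacent
interleaved = [clone1, clone2, clone1, clone2]

print(tv_norm(grouped), tv_norm(interleaved))    # grouped ordering is smaller
```

Minimizing this quantity over permutations is what stacks similar cells together so the block (clonal) structure becomes visible.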
APA, Harvard, Vancouver, ISO, and other styles
25

So, Wai-ki, and 蘇慧琪. "Shadow identification in traffic video sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B32045967.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Küchler, Andreas. "Adaptive processing of structural data: from sequences to trees and beyond." Ulm : Universität Ulm, Fakultät für Informatik, 1999. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB8541389.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Wu, Qinyi. "Partial persistent sequences and their applications to collaborative text document editing and processing." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/44916.

Full text
Abstract:
In a variety of text document editing and processing applications, it is necessary to keep track of the revision history of text documents by recording changes and the metadata of those changes (e.g., user names and modification timestamps). The recent Web 2.0 document editing and processing applications, such as real-time collaborative note taking and wikis, require fine-grained shared access to collaborative text documents as well as efficient retrieval of metadata associated with different parts of collaborative text documents. Current revision control techniques only support coarse-grained shared access and are inefficient to retrieve metadata of changes at the sub-document granularity. In this dissertation, we design and implement partial persistent sequences (PPSs) to support real-time collaborations and manage metadata of changes at fine granularities for collaborative text document editing and processing applications. As a persistent data structure, PPSs have two important features. First, items in the data structure are never removed. We maintain necessary timestamp information to keep track of both inserted and deleted items and use the timestamp information to reconstruct the state of a document at any point in time. Second, PPSs create unique, persistent, and ordered identifiers for items of a document at fine granularities (e.g., a word or a sentence). As a result, we are able to support consistent and fine-grained shared access to collaborative text documents by detecting and resolving editing conflicts based on the revision history as well as to efficiently index and retrieve metadata associated with different parts of collaborative text documents. We demonstrate the capabilities of PPSs through two important problems in collaborative text document editing and processing applications: data consistency control and fine-grained document provenance management. 
The first problem studies how to detect and resolve editing conflicts in collaborative text document editing systems. We approach this problem in two steps. In the first step, we use PPSs to capture data dependencies between different editing operations and define a consistency model more suitable for real-time collaborative editing systems. In the second step, we extend our work to the entire spectrum of collaborations and adapt transactional techniques to build a flexible framework for the development of various collaborative editing systems. The generality of this framework is demonstrated by its capabilities to specify three different types of collaborations as exemplified in the systems of RCS, MediaWiki, and Google Docs respectively. We precisely specify the programming interfaces of this framework and describe a prototype implementation over Oracle Berkeley DB High Availability, a replicated database management engine. The second problem of fine-grained document provenance management studies how to efficiently index and retrieve fine-grained metadata for different parts of collaborative text documents. We use PPSs to design both disk-economic and computation-efficient techniques to index provenance data for millions of Wikipedia articles. Our approach is disk economic because we only save a few full versions of a document and only keep delta changes between those full versions. Our approach is also computation-efficient because we avoid the necessity of parsing the revision history of collaborative documents to retrieve fine-grained metadata. Compared to MediaWiki, the revision control system for Wikipedia, our system uses less than 10% of disk space and achieves at least an order of magnitude speed-up to retrieve fine-grained metadata for documents with thousands of revisions.
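The two defining features of PPSs described above, items that are never physically removed and state reconstruction at any point in time, can be sketched minimally (class and field names are my own, not the dissertation's API):

```python
# Minimal sketch of the partial persistent sequence (PPS) idea: deletion only
# stamps a timestamp, so any historical state can be rebuilt.
import math

class PartialPersistentSequence:
    def __init__(self):
        self._items = []   # (text, inserted_at, deleted_at) in document order

    def insert(self, index, text, t):
        # index ranges over ALL items, live and deleted: identifiers persist.
        self._items.insert(index, [text, t, math.inf])

    def delete(self, index, t):
        self._items[index][2] = t   # mark deleted, keep the item's record

    def state(self, t):
        """Reconstruct the document as of time t."""
        return "".join(text for text, ins, dele in self._items
                       if ins <= t < dele)

doc = PartialPersistentSequence()
doc.insert(0, "world", t=1)
doc.insert(0, "hello ", t=2)
doc.delete(1, t=3)             # "world" disappears from views at t >= 3

print(doc.state(2))   # "hello world"
print(doc.state(3))   # "hello "
```

Because every item keeps a stable position and timestamps, metadata such as author and revision can be attached to items at word or sentence granularity and queried for any revision without replaying the whole history.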
APA, Harvard, Vancouver, ISO, and other styles
28

Chapple, Charles E. "Finding a needle in haystack: the Eukaryotic selenoproteome." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7184.

Full text
Abstract:
Les selenoproteïnes constitueixen una família diversa de proteïnes, caracteritzada per la presència del Seleni (Se), en forma de l'amino àcid atípic, la selenocisteïna (Sec). La selenocisteïna, coneguda com l'amino àcid 21, és similar a la cisteïna (Cys) amb un àtom de seleni en lloc de sofre (S). Les selenoproteïnes són els responsables majoritaris dels efectes biològics del seleni i s'ha observat que poden estar implicades en la infertilitat masculina, el càncer, algunes malalties coronàries,l'activació de virus latents i l'envelliment. La selenocisteïna es codifica pel codó UGA, normalment codó de parada (STOP). Per a la recodificació correcta del UGA són necessaris diversos factors. A la part 3' de la regió no traduïda (UTR) dels transcrits dels gens de selenoproteïnes en organismes eucariotes s'hi troba una estructura de "stem-loop" anomenada SECIS. La proteïna SBP2 interactua amb el SECIS, així com amb el ribosoma, i forma un complex amb el factor d'elongació EFsec i el tRNA de la selenocisteïna, el tRNASec. Donat que el codó TGA normalment significa fi de la traducció, les formes tradicionals de cerca de gens no el reconeixen com a codó codificant. Per aquesta raó ha estat necessari desenvolupar una metodologia específica per a la predicció de gens de selenoproteïnes. En els últims anys, hem contribuït a la descripció del selenoproteoma eucariota amb el descobriment de noves famílies (Castellano et al., 2005), amb l'elaboració de nous mètodes (Taskov et al., 2005; Chapple et al., 2009) i l'anotació de diferents genomes (Jaillon et al., 2004; Drosophila 12 genomes Consortium, 2007; Bovine Genome Sequencing and Analysis Consortium, 2009). Finalment, hem identificat el primer animal que no té selenoproteïnes (Drosophila 12 genomes Consortium, 2007; Chapple and Guigó, 2008), un descobriment soprenent donat que, fins el moment, es creia que les selenoproteïnes eren essencials per la vida animal.
Selenoproteins are a diverse family of proteins containing the trace element selenium (Se) in the form of the non-canonical amino acid selenocysteine (Sec). Selenocysteine, the 21st amino acid, is similar to cysteine (Cys) but with Se replacing sulphur. In many cases the homologous gene of a known selenoprotein is present in a different genome with cysteine in place of Sec. Selenoproteins are believed to be the effectors of the biological functions of selenium and have been implicated in male infertility, cancer and heart diseases, viral expression and ageing. Selenocysteine is coded by the opal STOP codon (TGA). A number of factors combine to achieve the co-translational recoding of TGA to Sec. The 3' untranslated regions (UTRs) of eukaryotic selenoprotein transcripts contain a stem-loop structure called a Sec Insertion Sequence (SECIS) element. This is recognised by the SECIS Binding Protein 2 (SBP2), which binds to both the SECIS element and the ribosome. SBP2, in turn, recruits the Sec-specific elongation factor EFsec and the selenocysteine transfer RNA, tRNASec. The dual meaning of the TGA codon means that selenoprotein genes are often mispredicted by standard annotation pipelines. The correct prediction of these genes therefore requires the development of specific methods. In the past few years we have contributed significantly to the description of the eukaryotic selenoproteome with the discovery of novel families (Castellano et al., 2005), the elaboration of novel methods (Taskov et al., 2005; Chapple et al., 2009) and the annotation of different genomes (Jaillon et al., 2004; Drosophila 12 genomes Consortium, 2007; Bovine Genome Sequencing and Analysis Consortium, 2009). Finally, and perhaps most importantly, we have identified the first animal to lack selenoprotein genes (Drosophila 12 genomes Consortium, 2007; Chapple and Guigó, 2008).
This last finding is particularly surprising because it had previously been believed that selenoproteins were essential for animal life.
APA, Harvard, Vancouver, ISO, and other styles
29

Cheng, Lok-lam, and 鄭樂霖. "Approximate string matching in DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29350591.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Patterson, Joel E. "The porting of the MCC Extensible Software Platform to the Sequent Symmetry." Master's thesis, This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-04272010-020129/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Küchler, Andreas [Verfasser]. "Adaptive processing of structural data: from sequences to trees and beyond / Andreas Küchler, Andreas." Ulm : Universität Ulm. Fakultät für Informatik, 2000. http://d-nb.info/1015211518/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Lundberg, Jesper, and Ronja Mehtonen. "Utvärdering och analys av batchstorlekar, produktsekvenser och omställningstider." Thesis, Högskolan i Skövde, Institutionen för ingenjörsvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11859.

Full text
Abstract:
Volvo GTO is one of the strongest brands in the truck industry, with a long and proud history of world-leading innovations. The factory in Skövde produces diesel engines of various sizes for Volvo GTO. The project was carried out in the rough-machining section for crankshafts. The objective was to construct a simulation model that reflects flows 0, 1 and 2 of crankshaft rough machining in order to determine the best production strategy for batch sizes and sequences, focusing on work in progress (PIA), intermediate storage and changeover times. A theoretical study provided the methodological knowledge needed to ensure that data were collected and processed correctly. The data were collected in an Excel document, which was integrated with the simulation model so that overview and adjustment would be possible. The simulation program Siemens Plant Simulation 12 was used to construct the complex model of the three flows, which was verified and validated against the real flows. Optimization on the simulation model was performed for both high and low demand for crankshafts. Several objectives were taken into consideration, such as minimal waiting before fine machining, minimal setup time and minimal total PIA, from a practically feasible perspective. The optimization identified a production plan for running batches as large as possible with reduced changeover time, so that delays in the rough-machining flows do not propagate into the fine-machining flow. For maximum capacity there are two different optimal solutions: one focused on reducing setup time and the other on minimizing the number of additional production hours per week. Discrete-event simulation of production flows is used to support production planning, and the simulation model was created for continued use at Volvo GTO, either in the simulation group or in future research and theses in collaboration with the University of Skövde. The project objectives were achieved with good results, and the outcome now stands as a basis for future planning of batches and sequences in crankshaft machining at Volvo GTO.
APA, Harvard, Vancouver, ISO, and other styles
33

Krejčí, Michal. "Komprese dat." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-217934.

Full text
Abstract:
This thesis deals with lossless and lossy methods of data compression and their possible applications in measurement engineering. The first part of the thesis is a theoretical elaboration which informs the reader about the basic terminology, the reasons for data compression, the use of data compression in standard practice, and the classification of compression algorithms. The practical part of the thesis deals with the realization of the compression algorithms in Matlab and LabWindows/CVI.
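As a minimal example of the lossless category the abstract mentions (this is a generic textbook technique, not the thesis's Matlab implementation), run-length encoding compresses repeated samples, which are common in slowly varying measurement signals, and decodes back exactly:

```python
# Generic lossless-compression illustration: run-length encoding (RLE).
def rle_encode(data):
    """Collapse runs of equal values into [value, count] pairs."""
    out = []
    for x in data:
        if out and out[-1][0] == x:
            out[-1][1] += 1
        else:
            out.append([x, 1])
    return out

def rle_decode(pairs):
    """Expand [value, count] pairs back into the original sequence."""
    return [x for x, n in pairs for _ in range(n)]

signal = [0, 0, 0, 5, 5, 1, 1, 1, 1]   # hypothetical quantised measurement
packed = rle_encode(signal)
print(packed)                           # [[0, 3], [5, 2], [1, 4]]
assert rle_decode(packed) == signal     # lossless: exact round trip
```

Lossy methods, by contrast, trade exact reconstruction for higher compression ratios, which is acceptable for some measurement signals but not others.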
APA, Harvard, Vancouver, ISO, and other styles
34

Renzullo, Luigi John. "Radiometric processing of multitemporal sequences of satellite imagery for surface reflectance retrievals in change detection studies." Curtin University of Technology, Department of Applied Physics, 2004. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=15737.

Full text
Abstract:
A relative, like-value image normalisation (LVIN) procedure was investigated as a means of estimating surface reflectances from sequences of Landsat TM and ETM+ imagery, and of standardising image data for change detection studies when there are uncertainties in sensor calibration and atmospheric parameters over time. The basis of the LVIN procedure is that for an N-date sequence, the digital numbers (DNs) of N-1 overpass images can be mapped to the reflectance values of a reference image for a set of pseudo-invariant targets (PITs) common to all images in the sequence. The robust M-estimator was employed to provide the transformation function that achieved the mapping. The investigation also showed that in some instances the LVIN procedure could incorporate the modelled path DN, the modelled DN for a target of zero surface reflectance. A lack of surface validation data was a limitation of the investigation. However, a qualitative evaluation of the LVIN procedure was possible by examining the pre- and post-normalisation image histograms. In a comparison with the results of the 6S radiative transfer code, it was observed that when both overpass and reference images were acquired with the same sensor, the LVIN procedure appeared to correct for atmospheric effects; and when overpass and reference images were acquired with different sensors, the LVIN procedure also corrected for between-sensor differences. Moreover, it was demonstrated for the more "temporally invariant" PITs that the procedure retrieved surface reflectances that were on average within ±0.02 reflectance units.
The ability of the LVIN procedure to standardise sequences of image data was further demonstrated in a study of vegetation change. The normalised difference vegetation index (NDVI) was calculated from LVIN estimates of surface reflectance for a selection of sites around the township of Mt. Barker, Western Australia. The NDVI data had characteristics consistent with data that have been corrected for atmospheric effects. A modification to the LVIN procedure was also proposed, based on an investigation of some empirically derived vegetation reflectance relationships. Research into the robustness of these relationships for a greater range of vegetation types is recommended.
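The core of the LVIN mapping can be sketched in a few lines (my own minimal version, using ordinary least squares in place of the thesis's robust M-estimator, with made-up pseudo-invariant target values): fit a gain and offset from overpass-image DNs to reference-image reflectances over the PITs, then apply that transform to the whole overpass image.

```python
# Simplified LVIN-style normalisation sketch: OLS fit over pseudo-invariant
# targets (the thesis uses a robust M-estimator; values here are hypothetical).

def fit_linear(dns, reflectances):
    """Least-squares gain/offset so that reflectance ~ gain*DN + offset."""
    n = len(dns)
    mx = sum(dns) / n
    my = sum(reflectances) / n
    gain = sum((x - mx) * (y - my) for x, y in zip(dns, reflectances)) \
           / sum((x - mx) ** 2 for x in dns)
    return gain, my - gain * mx

# Hypothetical PITs: DN in the overpass image vs. reference reflectance.
pit_dn   = [20, 60, 100, 140, 180]
pit_refl = [0.05, 0.15, 0.25, 0.35, 0.45]

gain, offset = fit_linear(pit_dn, pit_refl)
normalise = lambda dn: gain * dn + offset   # apply to every overpass pixel
print(round(normalise(100), 3))             # mid-range DN -> ~0.25 reflectance
```

A robust M-estimator replaces the plain least-squares fit so that PITs whose reflectance actually changed between dates (outliers) do not distort the gain and offset.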
APA, Harvard, Vancouver, ISO, and other styles
35

Kornfeil, Vojtěch. "Soubor úloh pro kurs Sběr, analýza a zpracování dat." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217707.

Full text
Abstract:
This thesis proposes tasks for the exercises of the mentioned course, together with the design and creation of an automated evaluation system for these exercises. The thesis focuses on discussion and exemplary solutions of the possible tasks of each exercise and on a description of the created automated evaluation system. For the evaluation program, tests were made with chosen special data sets, which prove its functionality on general data sets.
APA, Harvard, Vancouver, ISO, and other styles
36

Oliveira, Fábio Borges de. "Analysis of the cryptography security and steganography in images sequences." Laboratório Nacional de Computação Científica, 2007. http://www.lncc.br/tdmc/tde_busca/arquivo.php?codArquivo=134.

Full text
Abstract:
Information security is considered of great importance to private and governmental institutions. For this reason, we opted to conduct a study of security in this dissertation. We start with an introduction to information theory, then propose a new kind of Perfect Secrecy cryptosystem, and finally present a study of steganography in an image sequence, in which we suggest a more aggressive steganography in the coefficients of the discrete cosine transform.
A segurança da informação vem sendo considerada de grande importância para as instituições privadas e governamentais. Por este motivo, optamos em realizar um estudo sobre segurança nesta dissertação. Iniciamos com uma introdução à teoria da informação, partimos para métodos de criptografia onde propomos um novo tipo de Segredo Perfeito e finalmente fazemos um estudo de esteganografia em uma sequência de imagens, onde propomos uma esteganografia mais agressiva nos coeficientes da transformada discreta de cosseno.
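The coefficient-embedding idea in this abstract can be illustrated generically (this is a plain LSB sketch of mine, not the dissertation's more aggressive DCT scheme): hide message bits in the least significant bit of integer cover values, such as quantised DCT coefficients, and read them back.

```python
# Generic LSB embedding sketch (illustrative, not the dissertation's scheme):
# hide bits in the least significant bit of integer cover values.

def embed(cover, bits):
    """Overwrite the LSB of the first len(bits) cover values."""
    return [(c & ~1) | b for c, b in zip(cover, bits)] + cover[len(bits):]

def extract(stego, n):
    """Read back the first n embedded bits."""
    return [c & 1 for c in stego[:n]]

cover = [52, 61, 70, 55, 64, 81]    # hypothetical quantised DCT coefficients
bits  = [1, 0, 1, 1]                # secret message bits

stego = embed(cover, bits)
print(extract(stego, 4))            # [1, 0, 1, 1]
print(max(abs(a - b) for a, b in zip(cover, stego)))   # distortion <= 1
```

Each embedded bit perturbs its coefficient by at most one quantisation step, which is why LSB embedding in transform coefficients is hard to perceive, and why steganalysis must rely on statistical features rather than visual inspection.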
APA, Harvard, Vancouver, ISO, and other styles
37

Battikh, Dalia. "Sécurité de l’information par stéganographie basée sur les séquences chaotiques." Thesis, Rennes, INSA, 2015. http://www.theses.fr/2015ISAR0013/document.

Full text
Abstract:
La stéganographie est l’art de la dissimulation de l’information secrète dans un médium donné (cover) de sorte que le médium résultant (stégo) soit quasiment identique au médium cover. De nos jours, avec la mondialisation des échanges (Internet, messagerie et commerce électronique), s’appuyant sur des médiums divers (son, image, vidéo), la stéganographie moderne a pris de l’ampleur. Dans ce manuscrit, nous avons étudié les méthodes de stéganographie LSB adaptatives, dans les domaines spatial et fréquentiel (DCT, et DWT), permettant de cacher le maximum d’information utile dans une image cover, de sorte que l’existence du message secret dans l’image stégo soit imperceptible et pratiquement indétectable. La sécurité du contenu du message, dans le cas de sa détection par un adversaire, n’est pas vraiment assurée par les méthodes proposées dans la littérature. Afin de résoudre cette question, nous avons adapté et implémenté deux méthodes (connues) de stéganographie LSB adaptatives, en ajoutant un système chaotique robuste permettant une insertion quasi-chaotique des bits du message secret. Le système chaotique proposé consiste en un générateur de séquences chaotiques robustes fournissant les clés dynamiques d’une carte Cat 2-D chaotique modifiée. La stéganalyse universelle (classification) des méthodes de stéganographie développées est étudiée. A ce sujet, nous avons utilisé l’analyse discriminante linéaire de Fisher comme classifieur des vecteurs caractéristiques de Farid, Shi et Wang. Ce choix est basé sur la large variété de vecteurs caractéristiques testés qui fournissent une information sur les propriétés de l’image avant et après l’insertion du message. Une analyse des performances des trois méthodes de stéganalyse développées, appliquées sur des images stégo produites par les deux méthodes de stéganographie LSB adaptatives proposées, est réalisée. 
L’évaluation des résultats de la classification est réalisée par les paramètres: sensibilité, spécificité, précision et coefficient Kappa
Steganography is the art of concealing a secret message in a cover medium such that the resultant medium (stego) is almost identical to the cover medium. Nowadays, with the globalization of exchanges (Internet, messaging and e-commerce) using diverse mediums (sound, image, video), modern steganography has expanded widely. In this manuscript, we studied adaptive LSB steganography methods in the spatial and frequency domains (DCT and DWT), allowing the maximum amount of useful information to be hidden in a cover image, such that the existence of the secret message in the stego image is imperceptible and practically undetectable. The security of the message contents, in the case of its detection by an opponent, is not really ensured by the methods proposed in the literature. To address this, we adapted and implemented two known adaptive LSB steganography methods, adding a strong chaotic system allowing a quasi-chaotic insertion of the bits of the secret message. The proposed chaotic system consists of a generator of strong chaotic sequences supplying the dynamic keys of a modified chaotic 2D Cat map. Universal steganalysis (classification) of the developed steganography methods is studied. For this, we used Fisher's linear discriminant analysis as a classifier of the characteristic vectors of Farid, Shi and Wang. This choice is based on the wide variety of tested characteristic vectors, which give information about the properties of the image before and after message insertion. An analysis of the performance of the three developed steganalysis methods, applied to stego images produced by the two proposed adaptive LSB steganography methods, is carried out. Performance evaluation of the classification uses the parameters sensitivity, specificity, precision and the Kappa coefficient.
APA, Harvard, Vancouver, ISO, and other styles
38

Matulík, Martin. "Modelování a animace biologických struktur." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-377662.

Full text
Abstract:
This thesis deals with digital modelling and animation of biological structures. Software tools for computer-generated imagery (CGI) that are well proven in common practice are evaluated, as well as tools for specific tasks available inside the chosen software environment. From the vast pool of modelling approaches, the tools suitable for the creation and representation of the selected structures are discussed, along with the tools essential for their subsequent animation. Possible rendering approaches and their parameters are also discussed in relation to the quality of the resulting computer-generated images. These approaches are then utilized for the modelling, physical simulation and animation of erythrocyte flow through a blood vessel. The resulting output of the work is a series of digital images suitable for creating a video sequence containing the above-mentioned animation in a form digestible by the end user.
APA, Harvard, Vancouver, ISO, and other styles
39

Wang, Shu 1973. "On multiple sequence alignment." Thesis, 2007. http://hdl.handle.net/2152/3715.

Full text
Abstract:
The tremendous increase in biological sequence data presents us with an opportunity to understand the molecular and cellular basis of cellular life. Comparative studies of these sequences have the potential, when applied with sufficient rigor, to decipher the structure, function, and evolution of cellular components. The accuracy and detail of these studies are directly proportional to the quality of the sequence alignments. Given the large number of sequences per family of interest, and the increasing number of families to study, improving the speed, accuracy and scalability of MSA is becoming an increasingly important task. In the past, much of the interest was in global MSA; in recent years, the focus has shifted from global to local MSA. Local MSA is needed to align variable sequences from different families/species. In this dissertation, we developed two new algorithms for fast and scalable local MSA: a three-way-consistency-based MSA and a biclustering-based MSA. The first algorithm is a three-way-Consistency-Based MSA (CBMSA). CBMSA applies alignment consistency heuristics in the form of a new three-way alignment to MSA. While the three-way consistency approach maintains the same time complexity as the traditional pairwise consistency approach, it provides more reliable consistency information and better alignment quality. We quantify the benefit of using three-way consistency as compared to pairwise consistency. We have also compared CBMSA to a suite of leading MSA programs, and CBMSA consistently performs favorably. The second algorithm is a biclustering-based MSA. Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in MSA is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering algorithms are intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was compared with a suite of leading MSA programs. With respect to quantitative measures of MSA, BlockMSA scores comparable to or better than the other leading MSA programs. With respect to biological validation of MSA, the other leading MSA programs lag BlockMSA in their ability to identify the most highly conserved regions.
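For context, the pairwise alignments that consistency-based methods such as CBMSA build on can be computed with the classical Needleman-Wunsch dynamic program. The sketch below is a minimal global aligner, not the dissertation's algorithm, and its scoring values are illustrative:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment score by dynamic programming;
    pairwise alignments like this are the raw material that
    consistency-based MSA methods combine."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag,
                              score[i-1][j] + gap,   # gap in b
                              score[i][j-1] + gap)   # gap in a
    return score[n][m]
```

Consistency-based methods then ask, for each residue pair, whether aligning sequences x and y agrees with what the alignments through a third sequence z imply; the three-way variant in the dissertation strengthens exactly this check.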
APA, Harvard, Vancouver, ISO, and other styles
40

"Computational models for extracting structural signals from noisy high-throughput sequencing data: 通过计算模型来提取高通量测序数据中的分子结构信息." 2015. http://repository.lib.cuhk.edu.hk/en/item/cuhk-1291576.

Full text
Abstract:
Hu, Xihao.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2015.
Includes bibliographical references (leaves 147-161).
Abstracts also in Chinese.
Title from PDF title page (viewed on 26 October 2016).
APA, Harvard, Vancouver, ISO, and other styles
41

Morris, Joseph P. "An analysis pipeline for the processing, annotation, and dissemination of expressed sequence tags." 2009. http://etd.louisville.edu/data/UofL0482t2009.pdf.

Full text
Abstract:
Thesis (M.Eng.)--University of Louisville, 2009.
Title and description from thesis home page (viewed May 22, 2009). Department of Computer Engineering and Computer Science. Vita. "May 2009." Includes bibliographical references (p. 39-41).
APA, Harvard, Vancouver, ISO, and other styles
42

Evans, Patricia Anne. "Algorithms and complexity for annotated sequence analysis." Thesis, 1999. https://dspace.library.uvic.ca//handle/1828/8864.

Full text
Abstract:
Molecular biologists use algorithms that compare and otherwise analyze sequences that represent genetic and protein molecules. Most of these algorithms, however, operate on the basic sequence and do not incorporate the additional information that is often known about the molecule and its pieces. This research describes schemes to combinatorially annotate this information onto sequences so that it can be analyzed in tandem with the sequence; the overall result would thus reflect both types of information about the sequence. These annotation schemes include adding colours and arcs to the sequence. Colouring a sequence produces a same-length sequence of colours or other symbols that highlight or label parts of the sequence. Arcs can be used to link sequence symbols (or coloured substrings) to indicate molecular bonds or other relationships. Adding these annotations to sequence analysis problems such as sequence alignment or finding the longest common subsequence can make the problem more complex, often depending on the complexity of the annotation scheme. This research examines the different annotation schemes and the corresponding problems of verifying annotations, creating annotations, and finding the longest common subsequence of pairs of annotated sequences. This work involves both the conventional complexity framework and parameterized complexity, and includes algorithms and hardness results for both frameworks. Automata and transducers are created for some annotation verification and creation problems. Different restrictions on layered substring and arc annotation are considered to determine what properties an annotation scheme must have to make its incorporation feasible. Extensions to the algorithms that use weighting schemes are explored.
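A minimal instance of the colour-annotation idea described above: the classical longest-common-subsequence recurrence, restricted so that two symbols can only be matched when their colour annotations also agree. Arcs are not modelled here, and the function and argument names are illustrative:

```python
def annotated_lcs(s, t, cs, ct):
    """Length of the longest common subsequence of s and t where a
    pair of symbols may only be matched if their colour annotations
    (cs[i], ct[j]) agree as well."""
    n, m = len(s), len(t)
    # dp[i][j] = annotated LCS length of s[:i] and t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i-1] == t[j-1] and cs[i-1] == ct[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[n][m]
```

With all colours equal this reduces to the ordinary LCS; disagreeing colours strictly shrink the set of permissible matches, which is exactly how annotation constrains the comparison.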
Graduate
APA, Harvard, Vancouver, ISO, and other styles
43

Xu, Weijia. "On integrating biological sequence analysis with metric distance based database management systems." Thesis, 2006. http://hdl.handle.net/2152/2955.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Zwickl, Derrick Joel. "Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion." Thesis, 2006. http://hdl.handle.net/2152/2666.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

"Bioinformatics analyses for next-generation sequencing of plasma DNA." 2012. http://library.cuhk.edu.hk/record=b5549423.

Full text
Abstract:
1997年,Dennis等證明胚胎DNA在孕婦母體中存在的事實開啟了產前無創診斷的大門。起初的應用包括性別鑒定和恒河猴血型系統的識別。隨著二代測序的出現和發展,對外周血游離DNA更加成熟的分析和應用應運而生。例如當孕婦懷孕十二周時, 應用二代測序技術在母體外周血DNA中預測胎兒21號染色體是否是三倍體, 其準確性達到98%。本論文的第一部分介紹如何應用母體外周血DNA構建胎兒的全基因組遺傳圖譜。這項研究極具挑戰,原因是孕後12周,胎兒對外周血DNA貢獻很小,大多數在10%左右,另外外周血中的胎兒DNA大多數短於200 bp。目前的演算法和程式都不適合於從母體外周血DNA中構建胎兒的遺傳圖譜。在這項研究中,根據母親和父親的基因型,用生物資訊學手段先構建胎兒可能有的遺傳圖譜,然後將母體外周血DNA的測序資訊比對到這張可能的遺傳圖譜上。如果在母親純和遺傳背景下,決定父親的特異遺傳片段,只要定性檢測父親的特異遺傳片段是否在母體外周血中存在。如果在母親雜合遺傳背景下,決定母親的遺傳特性,就要進行定量分析。我開發了單倍型相對劑量分析方案,統計學上判斷母親外周血中的兩條單倍型相對劑量水準,顯著增加的單倍型即為最大可能地遺傳給胎兒的單倍型。單倍型相對劑量分析方案可以加強測序資訊的分析效率,降低測序數據波動,比單個位點分析更加穩定,強壯。
隨著靶標富集測序出現,測序價格急劇下降。第一部分運用母親父親的多態位點基因型的組合加上測序的資訊可以計算出胎兒DNA在母體外周血中的濃度。但是該方法的局限是要利用母親父親的多態位點的基因型,而不能直接從測序的資訊中推測胎兒DNA在母體外周血中的濃度。本論文的第二部分,我開發了基於二項分佈的混合模型直接預測胎兒DNA在母體外周血中的濃度。當混合模型的似然值達到最大的時候,胎兒DNA在母體外周血中的濃度得到最優估算。由於靶標富集測序可以提供高倍覆蓋的測序資訊,從而有機會直接根據概率模型識別出母親是純和而且胎兒是雜合的有特異信息量的位點。
除了母體外周血DNA水準分析推動產前無創診斷外,表觀遺傳學的分析也不容忽視。 在本論文的第三部分,我開發了Methyl-Pipe軟體,專門用於全基因組的甲基化的分析。甲基化測序數據分析比一般的基因組測序分析更加複雜。由於重亞硫酸鹽測序文庫的沒有甲基化的胞嘧啶轉化成尿嘧啶,最後以胸腺嘧啶的形式存在PCR產物中, 但是對於甲基化的胞嘧啶則保持不變。 因此,為了實現將重亞硫酸鹽處理過的測序序列比對到參考基因組。首先,分別將Watson和Crick鏈的參考基因組中胞嘧啶轉化成全部轉化為胸腺嘧啶,同時也將測序序列中的胞嘧啶轉化成胸腺嘧啶。然後將轉化後的測序序列比對到參考基因組上。最後根據比對到基因組上的測序序列中的胞嘧啶和胸腺嘧啶的含量推到全基因組的甲基化水準和甲基化特定模式。Methyl-Pipe可以用於識別甲基化水平顯著性差異的基因組區別,因此它可以用於識別潛在的胎兒特異的甲基化位點用於產前無創診斷。
The presence of fetal DNA in the cell-free plasma of pregnant women was first described in 1997. The initial clinical applications of this phenomenon focused on the detection of paternally inherited traits such as sex and rhesus D blood group status. The development of massively parallel sequencing technologies has allowed more sophisticated analyses on circulating cell-free DNA in maternal plasma. For example, through the determination of the proportional representation of chromosome 21 sequences in maternal plasma, noninvasive prenatal diagnosis of fetal Down syndrome can be achieved with an accuracy of >98%. In the first part of my thesis, I have developed bioinformatics algorithms to perform genome-wide construction of the fetal genetic map from the massively parallel sequencing data of the maternal plasma DNA sample of a pregnant woman. The construction of the fetal genetic map through the maternal plasma sequencing data is very challenging because fetal DNA only constitutes approximately 10% of the maternal plasma DNA. Moreover, as the fetal DNA in maternal plasma exists as short fragments of less than 200 bp, existing bioinformatics techniques for genome construction are not applicable for this purpose. For the construction of the genome-wide fetal genetic map, I have used the genome of the father and the mother as scaffolds and calculated the fractional fetal DNA concentration. First, I looked at the paternal specific sequences in maternal plasma to determine which portions of the father’s genome had been passed on to the fetus. For the determination of the maternal inheritance, I have developed the Relative Haplotype Dosage (RHDO) approach. This method is based on the principle that the portion of maternal genome inherited by the fetus would be present in slightly higher concentration in the maternal plasma. The use of haplotype information can enhance the efficacy of using the sequencing data. 
Thus, the maternal inheritance can be determined with a much lower sequencing depth than just looking at individual loci in the genome. This algorithm makes it feasible to use genome-wide scanning to diagnose fetal genetic disorders prenatally in a noninvasive way.
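As a toy illustration of the RHDO principle, the sketch below classifies the transmitted maternal haplotype from aggregate read counts over a block of heterozygous SNPs. The model is deliberately simplified and is my assumption, not the dissertation's statistics: it treats reads supporting the transmitted haplotype as binomial with success probability 0.5 + f/4 for fetal DNA fraction f, and compares the two hypotheses by a log-likelihood ratio.

```python
import math

def rhdo_classify(n_hap1, n_hap2, f):
    """Decide which maternal haplotype the fetus inherited, given
    plasma read counts supporting Hap I and Hap II over a block of
    heterozygous SNPs and an assumed fetal DNA fraction f.
    Simplified model: the transmitted haplotype is over-represented
    at p = 0.5 + f/4, the other at 1 - p."""
    p = 0.5 + f / 4.0
    # Binomial log-likelihood ratio of "Hap I transmitted" vs "Hap II":
    # log[p^n1 (1-p)^n2] - log[(1-p)^n1 p^n2] = (n1 - n2) * log(p/(1-p))
    llr = (n_hap1 - n_hap2) * math.log(p / (1.0 - p))
    return ("HapI", llr) if llr > 0 else ("HapII", -llr)
```

Pooling counts over a haplotype block rather than testing each SNP alone is what lets the small dosage imbalance be detected at modest sequencing depth; the dissertation additionally decides statistically when a block has accumulated enough reads for a confident call.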
With the emergence of targeted massively parallel sequencing, the sequencing cost per base is dropping dramatically. Although the first part of the thesis developed a method to estimate the fractional fetal DNA concentration using parental genotype information, that method cannot deduce the fractional fetal DNA concentration directly from sequencing data without prior knowledge of the genotypes. In the second part of this thesis, I propose a statistical mixture model based method, FetalQuant, which uses maximum likelihood to estimate the fractional fetal DNA concentration directly from targeted massively parallel sequencing of maternal plasma DNA. This method is superior to the existing methods in that it obviates the need for genotype information without loss of accuracy. Furthermore, by using Bayes' rule, this method can distinguish the informative SNPs where the mother is homozygous and the fetus is heterozygous, which has the potential to detect dominantly inherited disorders.
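For intuition, the simplest estimator behind this idea: at SNPs where the mother is homozygous (AA) and the fetus heterozygous (AB), the fetal-specific B allele should appear in roughly f/2 of the plasma reads. The moment estimator below is a hedged sketch, not FetalQuant's maximum-likelihood mixture fit, which needs no prior knowledge of which SNPs are informative:

```python
def fetal_fraction(snp_counts):
    """Moment estimate of the fetal DNA fraction f from informative
    SNPs (mother AA, fetus AB): the fetal-specific B allele is seen
    in about f/2 of the reads, so f is twice the mean B-allele
    fraction. snp_counts is a list of (a_reads, b_reads) pairs."""
    fracs = [b / (a + b) for a, b in snp_counts if a + b > 0]
    return 2.0 * sum(fracs) / len(fracs)
```

The mixture-model approach generalizes this by modelling every maternal-fetal genotype combination as a binomial component and choosing f to maximize the joint likelihood, so mislabelled or uninformative SNPs do not bias the estimate.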
Besides genetic analysis at the DNA level, epigenetic markers are also valuable for noninvasive diagnosis development. In the third part of this thesis, I have developed a bioinformatics algorithm to efficiently analyze genome-wide DNA methylation status based on massively parallel sequencing of bisulfite-converted DNA. DNA methylation is one of the most important mechanisms for regulating gene expression. The study of DNA methylation for different genes is important for the understanding of different physiological and pathological processes. Currently, the most popular method for analyzing DNA methylation status is bisulfite sequencing. The principle of this method is that unmethylated cytosine residues are chemically converted to uracil on bisulfite treatment, whereas methylated cytosines remain unchanged. The converted uracil and unconverted cytosine can then be discriminated on sequencing. With the emergence of massively parallel sequencing platforms, it is possible to perform this bisulfite sequencing analysis on a genome-wide scale. However, the bioinformatics analysis of genome-wide bisulfite sequencing data is much more complicated than analyzing the data from individual loci. Thus, I have developed Methyl-Pipe, a bioinformatics program for analyzing the genome-wide DNA methylation status of DNA samples based on massively parallel sequencing. In the first step of this algorithm, an in-silico converted reference genome is produced by converting all the cytosine residues to thymine residues. Then, the sequenced reads of bisulfite-converted DNA are aligned to this modified reference sequence. Finally, post-processing of the alignments removes non-unique and low-quality mappings and characterizes the methylation pattern in a genome-wide manner.
Making use of this new program, potential fetal-specific hypomethylated regions which can be used as blood biomarkers can be identified in a genome-wide manner.
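The in-silico conversion step can be sketched in a few lines, here for the Watson strand only and with exact string search standing in for a real aligner. The function name is illustrative; an actual pipeline such as Methyl-Pipe must also handle the Crick strand, mismatches, and the non-unique mappings that conversion makes more frequent.

```python
def bisulfite_align(read, reference):
    """C->T convert both read and reference, locate the read by exact
    matching, then call each covered reference C as methylated if the
    read retained a C there (unmethylated Cs read out as T)."""
    conv_read = read.replace("C", "T")
    conv_ref = reference.replace("C", "T")
    pos = conv_ref.find(conv_read)
    if pos < 0:
        return pos, []
    calls = []  # (reference offset, methylated?)
    for i, ref_base in enumerate(reference[pos:pos + len(read)]):
        if ref_base == "C":
            calls.append((pos + i, read[i] == "C"))
    return pos, calls
```

Each reference cytosine covered by the read yields one call: a retained C means methylated, a converted T means unmethylated; aggregating calls across reads gives per-site methylation levels.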
Detailed summary in vernacular field only.
Jiang, Peiyong.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 100-105).
Abstracts also in Chinese.
Chapter SECTION I : --- BACKGROUND --- p.1
Chapter CHAPTER 1: --- Circulating nucleic acids and Next-generation sequencing --- p.2
Chapter 1.1 --- Circulating nucleic acids --- p.2
Chapter 1.2 --- Next-generation sequencing --- p.3
Chapter 1.3 --- Bioinformatics analyses --- p.9
Chapter 1.4 --- Applications of the NGS --- p.11
Chapter 1.5 --- Aims of this thesis --- p.12
Chapter SECTION II : --- Mathematically decoding fetal genome in maternal plasma --- p.14
Chapter CHAPTER 2: --- Characterizing the maternal and fetal genome in plasma at single base resolution --- p.15
Chapter 2.1 --- Introduction --- p.15
Chapter 2.2 --- SNP categories and principle --- p.17
Chapter 2.3 --- Clinical cases and SNP genotyping --- p.20
Chapter 2.4 --- Sequencing depth and fractional fetal DNA concentration determination --- p.24
Chapter 2.5 --- Filtering of genotyping errors for maternal genotypes --- p.26
Chapter 2.6 --- Constructing fetal genetic map in maternal plasma --- p.27
Chapter 2.7 --- Sequencing error estimation --- p.36
Chapter 2.8 --- Paternal-inherited alleles --- p.38
Chapter 2.9 --- Maternally-derived alleles by RHDO analysis --- p.39
Chapter 2.10 --- Recombination breakpoint simulation and detection --- p.49
Chapter 2.11 --- Prenatal diagnosis of β- thalassaemia --- p.51
Chapter 2.12 --- Discussion --- p.53
Chapter SECTION III : --- Statistical model for fractional fetal DNA concentration estimation --- p.56
Chapter CHAPTER 3: --- FetalQuant: deducing the fractional fetal DNA concentration from massively parallel sequencing of maternal plasma DNA --- p.57
Chapter 3.1 --- Introduction --- p.57
Chapter 3.2 --- Methods --- p.60
Chapter 3.2.1 --- Maternal-fetal genotype combinations --- p.60
Chapter 3.2.2 --- Binomial mixture model and likelihood --- p.64
Chapter 3.2.3 --- Fractional fetal DNA concentration fitting --- p.66
Chapter 3.3 --- Results --- p.71
Chapter 3.3.1 --- Datasets --- p.71
Chapter 3.3.2 --- Evaluation of FetalQuant algorithm --- p.75
Chapter 3.3.3 --- Simulation --- p.78
Chapter 3.3.4 --- Sequencing depth and the number of SNPs required by FetalQuant --- p.81
Chapter 3.5 --- Discussion --- p.85
Chapter SECTION IV : --- NGS-based data analysis pipeline development --- p.88
Chapter CHAPTER 4: --- Methyl-Pipe: Methyl-Seq bioinformatics analysis pipeline --- p.89
Chapter 4.1 --- Introduction --- p.89
Chapter 4.2 --- Methods --- p.89
Chapter 4.2.1 --- Overview of Methyl-Pipe --- p.90
Chapter 4.3 --- Results and discussion --- p.96
Chapter SECTION V : --- CONCLUDING REMARKS --- p.97
Chapter CHAPTER 5: --- Conclusion and future perspectives --- p.98
Chapter 5.1 --- Conclusion --- p.98
Chapter 5.2 --- Future perspectives --- p.99
Reference --- p.100
APA, Harvard, Vancouver, ISO, and other styles
46

"The analysis of cDNA sequences: an algorithm for alignment." 1997. http://library.cuhk.edu.hk/record=b5889263.

Full text
Abstract:
by Lam Fung Ming.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 45-47).
Chapter CHAPTER 1 --- INTRODUCTION --- p.1
Chapter CHAPTER 2 --- BACKGROUND --- p.4
Section 2.1 DNA Cloning --- p.5
Section 2.1.1 Principles of cell-based DNA cloning --- p.5
Section 2.1.2. Polymerase Chain Reaction --- p.8
Section 2.2 DNA Libraries --- p.10
Section 2.3. Expressed Sequence Tags --- p.11
Section 2.4 dbEST - Database for "Expressed Sequence Tag" --- p.13
Chapter CHAPTER 3 --- REDUCTION OF PARTIAL SEQUENCE REDUNDANCY AND CDNA ALIGNMENT --- p.15
Section 3.1 Materials --- p.15
Section 3.2 Our Algorithm --- p.16
Section 3.3 Data Storage --- p.24
Section 3.4 Criterion of Alignment --- p.27
Section 3.5 Pairwise Alignment --- p.29
Chapter CHAPTER 4 --- RESULTS AND DISCUSSION --- p.32
Chapter CHAPTER 5 --- CONCLUSION AND FUTURE DEVELOPMENT --- p.42
REFERENCES --- p.45
APPENDIX --- p.i
APA, Harvard, Vancouver, ISO, and other styles
47

"Applications of evolutionary algorithms on biomedical systems." 2007. http://library.cuhk.edu.hk/record=b5893179.

Full text
Abstract:
Tse, Sui Man.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 95-104).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgement --- p.v
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Motivation --- p.1
Chapter 1.1.1 --- Basic Concepts and Definitions --- p.2
Chapter 1.2 --- Evolutionary Algorithms --- p.5
Chapter 1.2.1 --- Chromosome Encoding --- p.6
Chapter 1.2.2 --- Selection --- p.7
Chapter 1.2.3 --- Crossover --- p.9
Chapter 1.2.4 --- Mutation --- p.10
Chapter 1.2.5 --- Elitism --- p.11
Chapter 1.2.6 --- Niching --- p.11
Chapter 1.2.7 --- Population Manipulation --- p.13
Chapter 1.2.8 --- Building Blocks --- p.13
Chapter 1.2.9 --- Termination Conditions --- p.14
Chapter 1.2.10 --- Co-evolution --- p.14
Chapter 1.3 --- Local Search --- p.15
Chapter 1.4 --- Memetic Algorithms --- p.16
Chapter 1.5 --- Objective --- p.17
Chapter 1.6 --- Summary --- p.17
Chapter 2 --- Background --- p.18
Chapter 2.1 --- Multiple Drugs Tumor Chemotherapy --- p.18
Chapter 2.2 --- Bioinformatics --- p.22
Chapter 2.2.1 --- Basics of Bioinformatics --- p.24
Chapter 2.2.2 --- Applications on Biomedical Systems --- p.26
Chapter 3 --- A New Drug Administration Dynamic Model --- p.29
Chapter 3.1 --- Three Drugs Mathematical Model --- p.31
Chapter 3.1.1 --- Rate of Change of Different Subpopulations --- p.32
Chapter 3.1.2 --- Rate of Change of Different Drug Concentrations --- p.35
Chapter 3.1.3 --- Toxicity Effects --- p.35
Chapter 3.1.4 --- Summary --- p.36
Chapter 4 --- Memetic Algorithm - Iterative Dynamic Programming (MA-IDP) --- p.38
Chapter 4.1 --- Problem Formulation: Optimal Control Problem (OCP) for Multidrug Optimization --- p.38
Chapter 4.2 --- Proposed Memetic Optimization Algorithm --- p.40
Chapter 4.2.1 --- Iterative Dynamic Programming (IDP) --- p.40
Chapter 4.2.2 --- Adaptive Elitist-population-based Genetic Algorithm (AEGA) --- p.44
Chapter 4.2.3 --- Memetic Algorithm - Iterative Dynamic Programming (MA-IDP) --- p.50
Chapter 4.3 --- Summary --- p.56
Chapter 5 --- MA-IDP: Experiments and Results --- p.57
Chapter 5.1 --- Experiment Settings --- p.57
Chapter 5.2 --- Optimization Results --- p.61
Chapter 5.3 --- Extension to Other Multidrug Scheduling Model --- p.62
Chapter 5.4 --- Summary --- p.65
Chapter 6 --- DNA Sequencing by Hybridization (SBH) --- p.66
Chapter 6.1 --- Problem Formulation: Reconstructing a DNA Sequence from Hybridization Data --- p.70
Chapter 6.2 --- Proposed Memetic Optimization Algorithm --- p.71
Chapter 6.2.1 --- Chromosome Encoding --- p.71
Chapter 6.2.2 --- Fitness Function --- p.73
Chapter 6.2.3 --- Crossover --- p.74
Chapter 6.2.4 --- Hill Climbing Local Search for Sequencing by Hybridization --- p.76
Chapter 6.2.5 --- Elitism and Diversity --- p.79
Chapter 6.2.6 --- Outline of Algorithm: MA-HC-SBH --- p.81
Chapter 6.3 --- Summary --- p.82
Chapter 7 --- DNA Sequencing by Hybridization (SBH): Experiments and Results --- p.83
Chapter 7.1 --- Experiment Settings --- p.83
Chapter 7.2 --- Experiment Results --- p.85
Chapter 7.3 --- Summary --- p.89
Chapter 8 --- Conclusion --- p.90
Chapter 8.1 --- Multiple Drugs Cancer Chemotherapy Schedule Optimization --- p.90
Chapter 8.2 --- Use of the MA-IDP --- p.91
Chapter 8.3 --- DNA Sequencing by Hybridization (SBH) --- p.92
Chapter 8.4 --- Use of the MA-HC-SBH --- p.92
Chapter 8.5 --- Future Work --- p.93
Chapter 8.6 --- Item Learned --- p.93
Chapter 8.7 --- Papers Published --- p.94
Bibliography --- p.95
APA, Harvard, Vancouver, ISO, and other styles
48

Zhao, Huiying. "Protein function prediction by integrating sequence, structure and binding affinity information." Thesis, 2014. http://hdl.handle.net/1805/3913.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Proteins are nano-machines that work inside every living organism. Functional disruption of one or several proteins is the cause for many diseases. However, the functions for most proteins are yet to be annotated because inexpensive sequencing techniques dramatically speed up discovery of new protein sequences (265 million and counting) and experimental examinations of every protein in all its possible functional categories are simply impractical. Thus, it is necessary to develop computational function-prediction tools that complement and guide experimental studies. In this study, we developed a series of predictors for highly accurate prediction of proteins with DNA-binding, RNA-binding and carbohydrate-binding capability. These predictors are a template-based technique that combines sequence and structural information with predicted binding affinity. Both sequence and structure-based approaches were developed. Results indicate the importance of binding affinity prediction for improving sensitivity and precision of function prediction. Application of these methods to the human genome and structure genome targets demonstrated its usefulness in annotating proteins of unknown functions and discovering moon-lighting proteins with DNA,RNA, or carbohydrate binding function. In addition, we also investigated disruption of protein functions by naturally occurring genetic variations due to insertions and deletions (INDELS). We found that protein structures are the most critical features in recognising disease-causing non-frame shifting INDELs. The predictors for function predictions are available at http://sparks-lab.org/spot, and the predictor for classification of non-frame shifting INDELs is available at http://sparks-lab.org/ddig.
APA, Harvard, Vancouver, ISO, and other styles
49

Krishnadev, O. "Inferences On The Function Of Proteins And Protein-Protein Interactions Using Large Scale Sequence And Structure Analysis." Thesis, 2005. http://etd.iisc.ernet.in/handle/2005/1503.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

"Generalized pattern matching applied to genetic analysis." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075184.

Full text
Abstract:
The approximate pattern matching problem is, given a reference sequence T, a pattern (query) Q, and a maximum allowed error e, to find all the substrings in the reference such that the edit distance between the substrings and the pattern is smaller than or equal to the maximum allowed error. Though it is a well-studied problem in computer science, it has seen a resurgence in bioinformatics in recent years, largely due to the emergence of next-generation high-throughput sequencing technologies. This thesis contributes a novel generalized pattern matching framework and applies it to solve pattern matching problems in general and alternative splicing (AS) detection in particular. AS detection involves mapping a large amount of next-generation sequencing short-read data to a reference human genome, which is the first and an important step in preparing the sequenced data for further biological analysis. The four parts of my research are as follows.
In the first part of my research work, we propose a novel deterministic pattern matching algorithm which applies Agrep, a well-known bit-parallel matching algorithm, to a truncated suffix array. Due to the linear cost of Agrep, the cost of our approach is linear in the number of characters processed in the truncated suffix array. We analyze the matching cost theoretically, and obtain empirical costs from experiments. We carry out experiments using both synthetic and real DNA sequence data (queries) and search them in Chromosome X of a reference human genome. The experimental results show that our approach achieves a speed-up of several orders of magnitude over the standard Agrep algorithm.
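Agrep's core is the bitap (shift-and) bit-parallel algorithm of Wu and Manber. The sketch below is the k-error variant over a plain string, using Python integers as bit vectors; it illustrates only the bit-parallel half of the approach, not the truncated suffix array traversal:

```python
def bitap_search(text, pattern, k):
    """Report end positions in text where pattern matches with at
    most k edit errors (Wu-Manber shift-and; 1-bits mark matched
    pattern prefixes)."""
    m = len(pattern)
    masks = {}  # per-character bitmask of the positions it occupies in pattern
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[d]: state allowing up to d errors; d low bits set = d leading deletions
    R = [(1 << d) - 1 for d in range(k + 1)]
    accept = 1 << (m - 1)
    hits = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        old = R[0]
        R[0] = ((R[0] << 1) | 1) & mask  # exact shift-and
        for d in range(1, k + 1):
            prev = R[d]
            R[d] = ((((prev << 1) | 1) & mask)   # match
                    | old | (old << 1)           # insertion, substitution
                    | ((R[d - 1] << 1) | 1))     # deletion (uses new R[d-1])
            old = prev
        if R[k] & accept:
            hits.append(pos)
    return hits
```

For patterns up to the machine word size, each text character costs O(k) word operations, which is the linear per-character cost the section relies on.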
In the second part, we define a novel generalized pattern (query) and a framework of generalized pattern matching, for which we propose a heuristic matching algorithm. Simply speaking, a generalized pattern is Q1G1Q2...Qc-1Gc-1Qc, which consists of several substrings Qi with gaps Gi occurring in-between consecutive substrings. The prototypes of the generalized pattern come from several real biological problems that can all be modeled as generalized pattern matching problems. Based on the well-known seeding-and-extending heuristic, we propose a dual-seeding strategy, with which we solve the matching problem effectively and efficiently. We also develop a specialized matching tool called Gpattern-match. We carry out experiments using 10,000 generalized patterns and search them in a reference human genome (hg18). Over 98.74% of them can be recovered from the reference. It takes 1-2 seconds on average to recover a pattern, and the memory peak is a little over 1 GB.
In the third part, a natural extension of the second part, we model a real biological problem, alternative splicing detection, as a generalized pattern matching problem, and solve it using a proposed bi-directional seeding-and-extending algorithm. Unlike the other tools, which depend on third-party components, our mapping tool, ABMapper, is stand-alone and performs unbiased alignments. We carry out experiments using 427,786 real next-generation sequencing short reads (queries) and align them back to a reference human genome (hg18). ABMapper achieves 98.92% accuracy and a 98.17% recall rate, and is much better than the other state-of-the-art tools: SpliceMap achieves 94.28% accuracy and a 78.13% recall rate, while TopHat achieves 88.99% accuracy and a 76.33% recall rate. When the seed length is set to 12 in ABMapper, the whole searching and alignment process takes about 20 minutes, and the memory peak is a little over 2 GB.
In the fourth part, we focus on the seeding strategies for alternative splicing detection. We review the history of seeding-and-extending (SAE), and assess both theoretically and empirically the seeding strategies adopted in existing splicing detection tools, including Bowtie's heuristic and ABMapper's exact seeding, against the novel complementary quad-seeding strategy we propose and the corresponding novel splice detection tool called CS4splice, which can handle inexact seeding (with errors) and all 3 types of errors: mismatch (substitution), insertion, and deletion. We carry out experiments using short reads (queries) of length 105 bp comprising several data sets with various levels of errors, and align them back to a reference human genome (hg18). On average, CS4splice can align 88.44% (recall rate) of 427,786 short reads perfectly back to the reference, while the other existing tools achieve much smaller recall rates: SpliceMap 48.72%, MapSplice 58.41%, and ABMapper 51.39%. The accuracies of CS4splice are also the highest or very close to the highest in all the experiments carried out. But due to the complementary quad-seeding that CS4splice uses, it takes more computational resources, about twice (or more) those of the other alternative splicing detection tools, which we consider practicable and worthwhile.
Ni, Bing.
Adviser: Kwong-Sak Leung.
Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 151-161).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
APA, Harvard, Vancouver, ISO, and other styles
