Academic literature on the topic 'Data sequence processing'

Author: Grafiati

Published: 4 June 2021

Last updated: 9 February 2022

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Data sequence processing.'

Journal articles on the topic "Data sequence processing"

Wendl, M. C., I. Korf, A. T. Chinwalla, and L. W. Hillier. "Automated processing of raw DNA sequence data." IEEE Engineering in Medicine and Biology Magazine 20, no. 4 (2001): 41–48. http://dx.doi.org/10.1109/51.940044.

Song, Bosheng, Zimeng Li, Xuan Lin, Jianmin Wang, Tian Wang, and Xiangzheng Fu. "Pretraining model for biological sequence data." Briefings in Functional Genomics 20, no. 3 (May 2021): 181–95. http://dx.doi.org/10.1093/bfgp/elab025.

Abstract:

With the development of high-throughput sequencing technology, biological sequence data reflecting life information have become increasingly accessible. Particularly against the background of the COVID-19 pandemic, biological sequence data play an important role in detecting diseases, analyzing mechanisms and discovering specific drugs. In recent years, pretraining models that emerged in natural language processing have attracted widespread attention in many research fields, not only because they decrease training cost but also because they improve performance on downstream tasks. Pretraining models are used to embed biological sequences and extract features from large biological sequence corpora in order to understand the biological sequence data comprehensively. In this survey, we provide a broad review of pretraining models for biological sequence data. We first introduce biological sequences and the corresponding datasets, including a brief description and an accessible link. Subsequently, we systematically summarize popular pretraining models for biological sequences in four categories: CNN, word2vec, LSTM and Transformer. Then, we present some applications of the proposed pretraining models on downstream tasks to explain the role of pretraining models. Next, we provide a novel pretraining scheme for protein sequences and a multitask benchmark for protein pretraining models. Finally, we discuss the challenges and future directions in pretraining models for biological sequences.

Ma, Ling, Ke Zhu Song, Jun Feng Yang, and Ping Cao. "Hardware Implementation of a Real-Time Data Processing Algorithm in Marine Engineering Data Acquisition." Advanced Materials Research 268-270 (July 2011): 110–15. http://dx.doi.org/10.4028/www.scientific.net/amr.268-270.110.

Abstract:

Based on the architectural characteristics of the mass data acquisition system used in marine seismic exploration, this paper designs a real-time data processing algorithm that converts the collected time-sequence data to channel-sequence data. A hardware implementation of the algorithm based on FPGA+DDR SDRAM is developed to complete the whole conversion process. Here, the FPGA handles receiving, analyzing and preliminary processing of the time-sequence data, as well as the interface to the DDR SDRAM. Two DDR SDRAMs are used in ping-pong mode to store time-sequence data and to cooperate with the FPGA in realizing the time-to-channel sequence data conversion. Test results showed that, after applying the algorithm to the FCI in a high-precision marine seismic data acquisition and recording system, the algorithm could cache collected data without redundancy and convert data from time sequence to channel sequence without dead time. The algorithm also greatly improved the efficiency and reliability of data processing.
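
The time-sequence to channel-sequence conversion described in this abstract can be illustrated in software as a regrouping of a sample buffer. The following is a minimal Python sketch, not the paper's FPGA design: the function name `time_to_channel` and the assumed stream layout (one sample per channel in each scan, scans stored back to back) are illustrative assumptions.

```python
from typing import List, Sequence


def time_to_channel(stream: Sequence[int], n_channels: int) -> List[List[int]]:
    """Regroup a time-sequential stream into one list of samples per channel.

    Assumed layout (hypothetical): the stream is stored scan by scan, i.e.
    [t0_ch0, t0_ch1, ..., t0_chN, t1_ch0, t1_ch1, ...].
    """
    if n_channels <= 0:
        raise ValueError("n_channels must be positive")
    if len(stream) % n_channels != 0:
        raise ValueError("stream length must be a multiple of n_channels")
    # Channel c owns every n_channels-th sample, starting at offset c.
    return [list(stream[c::n_channels]) for c in range(n_channels)]


if __name__ == "__main__":
    samples = [10, 20, 30, 11, 21, 31, 12, 22, 32]  # 3 scans x 3 channels
    print(time_to_channel(samples, 3))
```

In hardware the same regrouping is done while data is still arriving, which is presumably why the abstract uses two memories alternately; the sketch only shows the data layout change.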

Baradello, Luca. "An improved processing sequence for uncorrelated Chirp sonar data." Marine Geophysical Research 35, no. 4 (March 22, 2014): 337–44. http://dx.doi.org/10.1007/s11001-014-9220-1.

Lin, Edgar Chia Han. "Research on Sequence Query Processing Techniques over Data Streams." Applied Mechanics and Materials 284-287 (January 2013): 3507–11. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.3507.

Abstract:

Owing to the great progress of computer technology and the maturity of networks, more and more data are generated and distributed through the network as so-called data streams. During the last couple of years, a number of researchers have turned their attention to data stream management, which differs from conventional database management. At present, this new type of data management system, called a data stream management system (DSMS), has become one of the most popular research areas in the data engineering field, and many research projects have made great progress in this area. Since current DSMSs do not support queries on sequence data, this project will study the issues related to two types of data. First, we will focus on content filtering on single-attribute streams, such as sensor data. Second, we will focus on multi-attribute streams, such as video films. We will discuss related issues such as how to build an efficient index for all queries of different streams and the corresponding query processing mechanisms.

Munková, Daša, Michal Munk, and Martin Vozár. "Data Pre-processing Evaluation for Text Mining: Transaction/Sequence Model." Procedia Computer Science 18 (2013): 1198–207. http://dx.doi.org/10.1016/j.procs.2013.05.286.

Mendizabal-Ruiz, Gerardo, Israel Román-Godínez, Sulema Torres-Ramos, Ricardo A. Salido-Ruiz, Hugo Vélez-Pérez, and J. Alejandro Morales. "Genomic signal processing for DNA sequence clustering." PeerJ 6 (January 24, 2018): e4264. http://dx.doi.org/10.7717/peerj.4264.

Abstract:

Genomic signal processing (GSP) methods, which convert DNA data to numerical values, have recently been proposed, offering the opportunity to employ existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis, which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
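
The general idea in this abstract, turning DNA into numbers and then clustering with K-means, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: the numeric representation used here (per-sequence base fractions) is a stand-in chosen for brevity, the seeding of centroids with the first k points is an arbitrary deterministic choice, and all function names are hypothetical.

```python
from typing import List, Sequence

BASES = "ACGT"


def base_frequencies(seq: str) -> List[float]:
    """Numeric representation (assumed): fraction of each base A, C, G, T."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in BASES)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return [seq.count(b) / total for b in BASES]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd iterations; the first k points seed the centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    sequences = ["AAAAAT", "GGGGGC", "AAATAA", "GGCGGG"]
    print(kmeans([base_frequencies(s) for s in sequences], k=2))
```

A production pipeline would more likely use an established K-means implementation with several random restarts; the hand-written loop is kept only so the sketch has no dependencies.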

Macnar, Joanna M., Natalia A. Szulc, Justyna D. Kryś, Aleksandra E. Badaczewska-Dawid, and Dominik Gront. "BioShell 3.0: Library for Processing Structural Biology Data." Biomolecules 10, no. 3 (March 16, 2020): 461. http://dx.doi.org/10.3390/biom10030461.

Abstract:

BioShell is an open-source package for processing biological data, particularly focused on structural applications. The package provides parsers, data structures and algorithms for handling and analyzing macromolecular sequences, structures and sequence profiles. The most frequently used routines are accessible through a set of easy-to-use command line utilities for a Linux environment. Using the full functionality of the package requires knowledge of C++ or Python to assemble an application from this software library. Since the last publication, which announced version 2.0, the package has been greatly expanded and rewritten in C++ standard 11 (C++11) to improve its modularity and efficiency. A new testing platform has been implemented to continuously test the correctness and integrity of the package. More than two hundred test programs have been published to provide simple examples that can be used as templates. This makes BioShell an easy-to-use library that greatly speeds up development of bioinformatics applications and web services without compromising computational efficiency.

Yuan, Fei, Hoa Nguyen, and Dan Graur. "ProtParCon: A Framework for Processing Molecular Data and Identifying Parallel and Convergent Amino Acid Replacements." Genes 10, no. 3 (February 26, 2019): 181. http://dx.doi.org/10.3390/genes10030181.

Abstract:

Studying parallel and convergent amino acid replacements in protein evolution is frequently used to assess adaptive evolution at the molecular level. Identifying parallel and convergent replacements involves multiple steps and computational routines, such as multiple sequence alignment, phylogenetic tree inference, ancestral state reconstruction, topology tests, and simulation of sequence evolution. Here, we present ProtParCon, a Python 3 package that provides a common interface for users to process molecular data and identify parallel and convergent amino acid replacements in orthologous protein sequences. By integrating several widely used programs for computational biology, ProtParCon implements general functions for handling multiple sequence alignment, ancestral-state reconstruction, maximum-likelihood phylogenetic tree inference, and sequence simulation. ProtParCon also contains a built-in pipeline that automates all these sequential steps, and enables quick identification of observed and expected parallel and convergent amino acid replacements under different evolutionary assumptions. The most up-to-date version of ProtParCon, including scripts containing user tutorials, the full API reference and documentation are publicly and freely available under an open source MIT License via GitHub. The latest stable release is also available on PyPI (the Python Package Index).

Honarvar, Ali Reza, and Ashkan Sami. "Extracting Usage Patterns from Power Usage Data of Homes' Appliances in Smart Home using Big Data Platform." International Journal of Information Technology and Web Engineering 11, no. 2 (April 2016): 39–50. http://dx.doi.org/10.4018/ijitwe.2016040103.

Abstract:

Advances in sensing techniques and IoT have made it possible to gain precise information about devices in smart home and smart city environments. Data analysis for sensors and devices may help us develop friendlier systems for smart cities or smart homes. Sequence pattern mining extracts interesting sequence patterns from data. Electricity usage does follow a sequence of events. In this study the authors investigate this issue and extract valuable sequence patterns from a real appliances' power usage dataset using PrefixSpan. The experiments in this research are implemented on Spark, a distributed and parallel big data processing platform, on two different clusters, and interesting findings are obtained. These findings show the importance of extracting sequence patterns from power usage data for various applications, such as decreasing CO2 and greenhouse gas emissions by decreasing electricity usage. The findings also show the need to bring big data platforms to the processing of this kind of data, which is captured in smart homes and smart cities.
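
To make the PrefixSpan step mentioned in this abstract concrete, here is a small single-machine sketch of prefix-projected sequential pattern mining over sequences of single events. It is an illustration only: the study ran PrefixSpan on Spark, whereas this version is plain Python, handles only one item per event, and uses made-up appliance names as data.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(sequences: Sequence[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Return every sequential pattern found in at least min_support sequences."""
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start offset) pairs: the suffixes
        # that remain after matching the current prefix.
        counts: Dict[str, int] = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            # Project each suffix past the first occurrence of the item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    # Hypothetical appliance switch-on events, one list per day.
    days = [
        ["kettle", "toaster", "tv"],
        ["kettle", "tv"],
        ["toaster", "kettle", "tv"],
    ]
    for pattern, support in prefixspan(days, min_support=2).items():
        print(pattern, support)
```

The projection step is what keeps the search from rescanning whole sequences: each recursive call only looks at what follows the pattern matched so far.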

Dissertations / Theses on the topic "Data sequence processing"

Hansson, Andreas. "Sequence Processing from A Connectionist View." Thesis, University of Skövde, Department of Computer Science, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-481.

Abstract:

In this work we explore how close the artificial intelligence community has come to model the human mind regarding representation and processing of sequences. We analyse results produced by cognitive psychologists, who explore real minds, for features exhibited by human short- and long-term memory when representing and processing sequences. We compare these features with theories and models from the AI community divided into two types of theories: intrinsic and extrinsic theories. We conclude that the intrinsic theories have managed to explain most of the features, whereas the extrinsic theories still have a lot to do before exhibiting all features. We also present several suggestions for continued research to the AI community within the area of sequence representation and processing in the human mind.

Dameh, Mustafa. "Insights into gene interactions using computational methods for literature and sequence resources." University of Otago. Department of Anatomy & Structural Biology, 2008. http://adt.otago.ac.nz./public/adt-NZDU20090109.095349.

Abstract:

At the beginning of this century many sequencing projects were finalised. As a result, an overwhelming amount of literature and sequence data has become available to biologists via online bioinformatics databases. These biological data have led to a better understanding of many organisms and have helped identify genes. However, there is still much to learn about the functions and interactions of genes. This thesis is concerned with predicting gene interactions using two main online resources: biomedical literature and sequence data. The biomedical literature is used to explore and refine a text mining method, known as the "co-occurrence method", which is used to predict gene interactions. The sequence data are used in an analysis to predict an upper bound of the number of genes involved in gene interactions. The co-occurrence method of text mining was extensively explored in this thesis. The effects of certain computational parameters on the relevancy of documents in which two genes co-occur were critically examined. The results showed that some computational parameters do have an impact on the outcome of the co-occurrence method and, if taken into consideration, can lead to better identification of documents that describe gene interactions. To explore the co-occurrence method of text mining, a prototype system was developed; as a result, it contains unique functions that are not present in currently available text mining systems. Sequence data were used to predict the upper bound of the number of genes involved in gene interactions within a tissue. A novel approach was undertaken that used an analysis of SAGE and EST sequence libraries using ecological estimation methods. The approach proves that the species accumulation theory used in ecology can be applied to tag libraries (SAGE or EST) to predict an upper bound to the number of mRNA transcript species in a tissue. 
The novel computational analysis provided in this study can be used to extend the body of knowledge and insights relating to gene interactions and, hence, provide better understanding of genes and their functions.
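
The core of the co-occurrence method named in this abstract, counting documents in which two gene names appear together, can be sketched in a few lines. This is a bare illustration under assumptions: the thesis examines additional computational parameters that are not modelled here, the whole-word, case-insensitive matching rule is my own choice, and the documents and function name are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence, Tuple


def cooccurrence_counts(documents: Iterable[str], genes: Sequence[str]) -> Counter:
    """Count, for each gene pair, the documents that mention both genes."""
    # Whole-word, case-insensitive matching is an assumption of this sketch.
    patterns = {
        g: re.compile(r"\b" + re.escape(g) + r"\b", re.IGNORECASE) for g in genes
    }
    counts: Counter = Counter()
    for text in documents:
        present = sorted(g for g, pat in patterns.items() if pat.search(text))
        # Each unordered pair is counted once per document.
        counts.update(combinations(present, 2))
    return counts


if __name__ == "__main__":
    docs = [
        "BRCA1 binds TP53.",
        "TP53 regulates MDM2.",
        "BRCA1 and TP53 are co-expressed.",
    ]
    for pair, n in cooccurrence_counts(docs, ["BRCA1", "TP53", "MDM2"]).items():
        print(pair, n)
```

A raw document count like this is only a starting signal; deciding which co-occurrences actually describe an interaction is the part the thesis investigates.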

Hung, Rong-I. "Computational studies of protein sequence and structure." Thesis, University of Oxford, 1999. http://ora.ox.ac.uk/objects/uuid:9905c946-86dd-4bb3-8824-7c50df136913.

Abstract:

This thesis explores aspects of protein function, structure and sequence by computational approaches. A comparative study of definitions of protein secondary structures was performed. Disagreements in assignment resulting from three different algorithms were observed. The causes of inaccuracies in structure assignments were discussed and possibilities of projecting protein secondary structures by different structural descriptors were tested. The investigation of inconsistent assignments of protein secondary structure led to a study of a more specific issue concerning protein structure/function relationships, namely cis/trans isomerisation of a peptide bond. Surveys were carried out at the level of protein molecules to detect the occurrences of the cis peptide bond, and at the level of protein domains to explore the possible biological implications of the occurrences of the structural motif. Research was then focussed on α-helical integral membrane proteins. A detailed analysis of sequences and putative transmembrane helical structures was conducted on the ABC transporters from different organisms. Interesting relationships between protein sequences, putative α-helical structures and transporter functions were identified. Applications of molecular dynamics simulations to the transmembrane helices of a specific human ABC transporter, cystic fibrosis transmembrane conductance regulator (CFTR), explored some of these relationships at atomic resolution. Functional and structural implications of individual residues within membrane-spanning helices were revealed by these simulation studies.

Li, Yaoman, and 李耀满. "Efficient methods for improving the sensitivity and accuracy of RNA alignments and structure prediction." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/195977.

Abstract:

RNA plays an important role in molecular biology. RNA sequence comparison is an important method for analyzing gene expression. Since aligning RNA reads requires handling gaps, mutations, poly-A tails, etc., it is much more difficult than aligning other sequences. In this thesis, we study RNA-Seq alignment tools and the existing gene information database, and how to improve the accuracy of alignment and predict RNA secondary structure. The known gene information database contains a lot of reliable gene information that has been discovered. We also note that most DNA alignment tools are well developed: they run much faster than existing RNA-Seq alignment tools and have higher sensitivity and accuracy. Combining them with the known gene information database, we present a method to align RNA-Seq data by using DNA alignment tools; that is, we use the DNA alignment tools to perform the alignment and use the gene information to convert the alignment to a genome-based one. Although the gene information database is updated daily, many genes and alternative splicings have not yet been discovered. If our RNA alignment tool relied only on the known gene database, many reads that come from unknown genes or alternative splicings could not be aligned. Thus, we show a combinational method that can cover potential alternative splicing junction sites. Combined with the original gene database, the new alignment tools can cover most alignments reported by other RNA-Seq alignment tools. Recently many RNA-Seq alignment tools have been developed. They are more powerful and faster than the old generation of tools. However, RNA read alignment is much more complicated than other sequence alignment, and the alignments reported by some RNA-Seq alignment tools have low accuracy. We present a simple and efficient filter method based on the quality score of the reads. It can filter out most low-accuracy alignments. 
Finally, we present an RNA secondary structure prediction method that can predict pseudoknots (a type of RNA secondary structure) with high sensitivity and specificity.
Computer Science
Master
Master of Philosophy

Wang, Yi, and 王毅. "Binning and annotation for metagenomic next-generation sequencing reads." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/208040.

Abstract:

The development of next-generation sequencing technology enables us to obtain a vast number of short reads from metagenomic samples. In metagenomic samples, the reads from different species are mixed together. So, metagenomic binning has been introduced to cluster reads from the same or closely related species and metagenomic annotation is introduced to predict the taxonomic information of each read. Both metagenomic binning and annotation are critical steps in downstream analysis. This thesis discusses the difficulties of these two computational problems and proposes two algorithmic methods, MetaCluster 5.0 and MetaAnnotator, as solutions. There are six major challenges in metagenomic binning: (1) the lack of reference genomes; (2) uneven abundance ratios; (3) short read lengths; (4) a large number of species; (5) the existence of species with extremely-low-abundance; and (6) recovering low-abundance species. To solve these problems, I propose a two-round binning method, MetaCluster 5.0. The improvement achieved by MetaCluster 5.0 is based on three major observations. First, the short q-mer (length-q substring of the sequence with q = 4, 5) frequency distributions of individual sufficiently long fragments sampled from the same genome are more similar than those sampled from different genomes. Second, sufficiently long w-mers (length-w substring of the sequence with w ≈ 30) are usually unique in each individual genome. Third, the k-mer (length-k substring of the sequence with k ≈ 16) frequencies from reads of a species are usually linearly proportional to that of the species’ abundance. 
The metagenomic annotation methods in the literature often suffer from five major drawbacks: (1) inability to annotate many reads; (2) less precise annotation for reads and more incorrect annotation for contigs; (3) inability to deal well with novel clades that have limited reference genomes; (4) performance affected by variable genome sequence similarities between different clades; and (5) high time complexity. In this thesis, a novel tool, MetaAnnotator, is proposed to tackle these problems. There are four major contributions of MetaAnnotator. Firstly, instead of annotating reads/contigs independently, a cluster of reads/contigs is annotated as a whole. Secondly, multiple reference databases are integrated. Thirdly, for each individual clade, quadratic discriminant analysis is applied to capture the similarities between reference sequences in the clade. Fourthly, instead of using alignment tools, MetaAnnotator performs annotation using k-mer exact matching, which is more efficient. Experiments on both simulated datasets and real datasets show that MetaCluster 5.0 and MetaAnnotator outperform existing tools with higher accuracy as well as lower time and space cost.
Computer Science
Doctoral
Doctor of Philosophy
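
The q-mer, w-mer and k-mer observations in the abstract above all rest on counting fixed-length substrings of a sequence. The sketch below shows only that building block: raw k-mer counts and a normalised q-mer frequency vector of the kind that could be compared between fragments. It is not MetaCluster's implementation; the function names and the choice to skip windows containing characters other than A, C, G, T are assumptions.

```python
from collections import Counter
from itertools import product
from typing import List

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k substring made only of A, C, G, T."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows with ambiguous bases (e.g. N) are skipped: a sketch assumption.
    return Counter(w for w in windows if set(w) <= set(ALPHABET))


def qmer_frequency_vector(seq: str, q: int) -> List[float]:
    """Relative frequency of each of the 4**q possible q-mers, in fixed order."""
    counts = kmer_counts(seq, q)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than q or has no valid q-mers")
    return [counts["".join(p)] / total for p in product(ALPHABET, repeat=q)]


if __name__ == "__main__":
    print(kmer_counts("ACGTACG", 3))
    print(len(qmer_frequency_vector("ACGTACG", 2)))
```

With q = 4 the vector has 256 entries, which is small enough to compare fragments cheaply; for k around 16 or w around 30 the counts would be kept sparse, as `kmer_counts` already does.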

Liu, Kai. "Detecting stochastic motifs in network and sequence data for human behavior analysis." HKBU Institutional Repository, 2014. https://repository.hkbu.edu.hk/etd_oa/60.

Abstract:

With the recent advent of Web 2.0, mobile computing, and pervasive sensing technologies, human activities can readily be logged, leaving digital traces of different forms. For instance, human communication activities recorded in online social networks allow user interactions to be represented as “network” data. Also, human daily activities can be tracked in a smart house, where the log of sensor triggering events can be represented as “sequence” data. This thesis research aims to develop computational data mining algorithms using the generative modeling approach to extract salient patterns (motifs) embedded in such network and sequence data, and to apply them for human behavior analysis. Motifs are defined as the recurrent over-represented patterns embedded in the data, and have been known to be effective for characterizing complex networks. Many motif extraction methods found in the literature assume that a motif is either present or absent. In real practice, such salient patterns can appear partially due to their stochastic nature and/or the presence of noise. Thus, the probabilistic approach is adopted in this thesis to model motifs. For network data, we use a probability matrix to represent a network motif and propose a mixture model to extract network motifs. A component-wise EM algorithm is adopted where the optimal number of stochastic motifs is automatically determined with the help of a minimum message length criterion. Considering also the edge occurrence ordering within a motif, we model a motif as a mixture of first-order Markov chains for the extraction. Using a probabilistic approach similar to the one for network motifs, an optimal set of stochastic temporal network motifs is extracted. We carried out rigorous experiments to evaluate the performance of the proposed motif extraction algorithms using both synthetic data sets and real-world social network data sets and mobile phone usage data sets, and obtained promising results. 
Also, we found that some of the results can be interpreted using the social balance and social status theories which are well-known in social network analysis. To evaluate the effectiveness of stochastic temporal network motifs beyond characterizing human behaviors, we incorporate stochastic temporal network motifs as local structural features into a factor graph model for followee recommendation prediction (essentially a link prediction problem) in online social networks. The proposed motif-based factor graph model is found to outperform significantly the existing state-of-the-art methods for the prediction task. To extract motifs from sequence data, the probabilistic framework proposed for the stochastic temporal network motif extraction is also applicable. One possible way is to make use of the edit distance in the probabilistic framework so that the subsequences with minor ordering variations can first be grouped to form the initial set of motif candidates. A mixture model can then be used to determine the optimal set of temporal motifs. We applied this approach to extract sequence motifs from a smart home data set which contains sensor triggering events corresponding to some activities performed by residents in the smart home. The unique behavior extracted for each resident based on the detected motifs is also discussed. Keywords: Stochastic network motifs, finite mixture models, expectation maximization algorithms, social networks, stochastic temporal network motifs, mixture of Markov chains, human behavior analysis, followee recommendation, signed social networks, activity of daily living, smart environments
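
The abstract mentions using edit distance so that subsequences with minor ordering variations can first be grouped into motif candidates. The sketch below illustrates that grouping idea only: it uses the standard Levenshtein distance over event sequences and a simple greedy grouping against each group's first member. The grouping rule, the threshold and the sensor names are assumptions of this example, and the mixture-model step of the thesis is not shown.

```python
from typing import Hashable, List, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences of events."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def group_candidates(subsequences: Sequence[Sequence[Hashable]],
                     max_distance: int) -> List[List[Sequence[Hashable]]]:
    """Greedy grouping (assumed rule): join the first group whose
    representative (its first member) is within max_distance."""
    groups: List[List[Sequence[Hashable]]] = []
    for s in subsequences:
        for g in groups:
            if edit_distance(s, g[0]) <= max_distance:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


if __name__ == "__main__":
    # Hypothetical sensor-trigger subsequences from a smart home log.
    logs = [("door", "light", "tv"), ("light", "door", "tv"), ("bed", "lamp", "alarm")]
    print(group_candidates(logs, max_distance=2))
```

Because the distance works on any hashable items, the same function applies to strings of characters and to tuples of sensor identifiers.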

Peng, Yu, and 彭煜. "Iterative de Bruijn graph assemblers for second-generation sequencing reads." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B50534051.

Abstract:

The recent advance of second-generation sequencing technologies has made it possible to generate a vast amount of short read sequences from a DNA (cDNA) sample. Current short read assemblers make use of the de Bruijn graph, in which each vertex is a k-mer and each edge connecting vertex u and vertex v represents u and v appearing in a read consecutively, to produce contigs. There are three major problems for de Bruijn graph assemblers: (1) branch problem, due to errors and repeats; (2) gap problem, due to low or uneven sequencing depth; and (3) error problem, due to sequencing errors. A proper choice of k value is a crucial tradeoff in de Bruijn graph assemblers: a low k value leads to fewer gaps but more branches; a high k value leads to fewer branches but more gaps. In this thesis, I first analyze the fundamental genome assembly problem and then propose an iterative de Bruijn graph assembler (IDBA), which iterates from low to high k values, to construct a de Bruijn graph with fewer branches and fewer gaps than any other de Bruijn graph assembler using a fixed k value. Then, the second-generation sequencing data from metagenomic, single-cell and transcriptome samples is investigated. IDBA is then tailored with special treatments to handle the specific issues for each kind of data. For metagenomic sequencing data, a graph partition algorithm is proposed to separate de Bruijn graph into dense components, which represent similar regions in subspecies from the same species, and multiple sequence alignment is used to produce consensus of each component. For sequencing data with highly uneven depth such as single-cell and metagenomic sequencing data, a method called local assembly is designed to reconstruct missing k-mers in low-depth regions. Then, based on the observation that short and relatively low-depth contigs are more likely erroneous, progressive depth on contigs is used to remove errors in both low-depth and high-depth regions iteratively. 
For transcriptome sequencing data, a variant of the progressive depth method is adopted to decompose the de Bruijn graph into components corresponding to transcripts from the same gene, and then the transcripts are found in each component by considering the reads and paired-end reads support. Plenty of experiments on both simulated and real data show that IDBA assemblers outperform the existing assemblers by constructing longer contigs with higher completeness and similar or better accuracy. The running time of IDBA assemblers is comparable to existing algorithms, while the memory cost is usually less than the others.
Computer Science
Doctoral
Doctor of Philosophy
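
The de Bruijn graph definition given in the abstract above (each vertex is a k-mer, and an edge joins two k-mers that appear consecutively in a read) and the trade-off in choosing k can be illustrated with a short sketch. It is not IDBA: there is no error correction, no iteration over k and no contig construction, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of k-mers that directly follow it in a read."""
    if k <= 0:
        raise ValueError("k must be positive")
    graph: Dict[str, Set[str]] = {}
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            graph.setdefault(kmer, set())  # make sure every vertex exists
        for u, v in zip(kmers, kmers[1:]):
            graph[u].add(v)  # u and v are consecutive in this read
    return graph


def count_branches(graph: Dict[str, Set[str]]) -> int:
    """Number of vertices with more than one outgoing edge."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


if __name__ == "__main__":
    reads = ["ATGCATGG"]
    for k in (2, 4):
        print(k, count_branches(build_de_bruijn(reads, k)))
```

On this toy read the repeat "ATG" creates a branch at k = 2 that disappears at k = 4, which mirrors the abstract's point that a higher k gives fewer branches; the accompanying cost, more gaps when coverage is low, does not show up with a single error-free read.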

Kutlu, Mucahid. "Parallel Processing of Large Scale Genomic Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436355132.

Bao, Suying, and 鲍素莹. "Deciphering the mechanisms of genetic disorders by high throughput genomic data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hdl.handle.net/10722/196471.

Abstract:

A new generation of non-Sanger-based sequencing technologies, so-called “next-generation” sequencing (NGS), has been changing the landscape of genetics at unprecedented speed. In particular, our capacity to decipher the genotypes underlying phenotypes, such as diseases, has never been greater. However, before fully applying NGS in medical genetics, researchers have to bridge the widening gap between the generation of massively parallel sequencing output and the capacity to analyze the resulting data. In addition, even if a list of candidate genes with potential causal variants can be obtained from an effective NGS analysis, pinpointing disease genes from the long list remains a challenge. The issue becomes especially difficult when the molecular basis of the disease is not fully elucidated. New NGS users are often bewildered by a plethora of options in mapping, assembly, variant calling and filtering programs and may have no idea how to compare these tools and choose the “right” ones. To get an overview of various bioinformatics attempts in mapping and assembly, a series of performance evaluations was conducted using both real and simulated NGS short reads. For NGS variant detection, the performance of the two most widely used toolkits, SAMtools and GATK, was assessed. Based on the results of this systematic evaluation, an NGS data processing and analysis pipeline was constructed. This pipeline proved successful with the identification of a mutation (a frameshift deletion on Hnrnpa1, p.Leu181Valfs*6) related to congenital heart defect (CHD) in procollagen type IIA deficient mice. In order to prioritize risk genes for diseases, especially those with limited prior knowledge, a network-based gene prioritization model was constructed. It consists of two parts: network analysis on known disease genes (seed-based network strategy) and network analysis on differential expression (DE-based network strategy). 
Case studies of various complex diseases/traits demonstrated that the DE-based network strategy can greatly outperform traditional gene expression analysis in predicting disease-causing genes. A series of simulation work indicated that the DE-based strategy is especially meaningful to diseases with limited prior knowledge, and the model’s performance can be further advanced by integrating with seed-based network strategy. Moreover, a successful application of the network-based gene prioritization model in influenza host genetic study further demonstrated the capacity of the model in identifying promising candidates and mining of new risk genes and pathways not biased toward our current knowledge. In conclusion, an efficient NGS analysis framework from the steps of quality control and variant detection, to those of result analysis and gene prioritization has been constructed for medical genetics. The novelty in this framework is an encouraging attempt to prioritize risk genes for not well-characterized diseases by network analysis on known disease genes and differential expression data. The successful applications in detecting genetic factors associated with CHD and influenza host resistance demonstrated the efficacy of this framework. And this may further stimulate more applications of high throughput genomic data in dissecting the genetic components of human disorders in the near future.
Biochemistry
Doctoral
Doctor of Philosophy

Chan, Pui-yee (陳沛儀). "A study on predicting gene relationship from a computational perspective." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30461352.

Books on the topic "Data sequence processing"

Morishita, Shinichi, ed. Large-scale genome sequence processing. London: Imperial College Press, 2006.

Frieder, Ophir, and Robert L. Martino, eds. High performance computational methods for biological sequence analysis. Boston: Kluwer Academic Publishers, 1996.

Sequence analysis in molecular biology: Treasure trove or trivial pursuit. San Diego: Academic Press, 1987.

Kumar, Pradeep. Pattern discovery using sequence data mining: Applications and studies. Hershey, PA: Information Science Reference, 2012.

León, Darryl, ed. Sequence analysis in a nutshell: A guide to tools and databases. Sebastopol, CA: O'Reilly, 2003.

Interface between Computation Science and Nucleic Acid Sequencing Workshop (1988 Santa Fe, N.M.). Computers and DNA: The proceedings of the Interface between Computation Science and Nucleic Acid Sequencing Workshop, held December 12 to 16, 1988 in Santa Fe, New Mexico. Edited by George I. Bell and Thomas G. Marr. Redwood City, Calif: Addison-Wesley Pub. Co., 1989.

Practical bioinformatics. New York: Garland Science, 2013.

Interface between Computation Science and Nucleic Acid Sequencing Workshop (1988 Santa Fe, N.M.). Computers and DNA: The proceedings of the Interface between Computation Science and Nucleic Acid Sequencing Workshop, held December 12 to 16, 1988 in Santa Fe, New Mexico. Redwood City, Calif: Addison-Wesley Pub. Co., 1990.

Grigorev, Anatoliy. Methods and algorithms of data processing. ru: INFRA-M Academic Publishing LLC., 2017. http://dx.doi.org/10.12737/22119.

Abstract:

This manual considers methods and algorithms of data processing, along with the sequence of steps for solving data processing and analysis problems in order to build a behavior model of an object that takes into account all components of its mathematical model. It describes the technological methods of using software and hardware to solve tasks in this area. It also covers algorithms for distributions and time-series regressions, and their transformation for the purpose of obtaining mathematical models and forecasting the behavior of information and economic systems (objects). The manual conforms to the requirements of the latest-generation Federal State Educational Standard of Higher Education. It is intended for students of economic specialties, specialists, and graduate students.

Gromiha, M. Michael. Protein bioinformatics: From sequence to function. Amsterdam: Academic Press/Elsevier, 2010.

Book chapters on the topic "Data sequence processing"

Mersereau, R. M., M. J. T. Smith, C. S. Kim, F. Kossentini, and K. K. Truong. "Vector Quantization for Video Data Compression." In Motion Analysis and Image Sequence Processing, 257–83. Boston, MA: Springer US, 1993. http://dx.doi.org/10.1007/978-1-4615-3236-1_9.

Loechner, Vincent, Benoît Meister, and Philippe Clauss. "Data Sequence Locality: A Generalization of Temporal Locality." In Euro-Par 2001 Parallel Processing, 262–72. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-44681-8_38.

Gallinari, P. "Predictive models for sequence modelling, application to speech and character recognition." In Adaptive Processing of Sequences and Data Structures, 418–34. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0054007.

Peace, R. J., and James R. Green. "Computational Sequence- and NGS-Based MicroRNA Prediction." In Signal Processing and Machine Learning for Biomedical Big Data, 381–410. Boca Raton: CRC Press, 2018. http://dx.doi.org/10.1201/9781351061223-19.

Ahmed, Zeeshan, Justin Pranulis, Saman Zeeshan, and Chew Yee Ngan. "Bioinformatics Tools for PacBio Sequenced Amplicon Data Pre-processing and Target Sequence Extraction." In Lecture Notes in Networks and Systems, 326–40. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-12385-7_26.

Tündik, Máté Ákos, Balázs Tarján, and György Szaszák. "Low Latency MaxEnt- and RNN-Based Word Sequence Models for Punctuation Restoration of Closed Caption Data." In Statistical Language and Speech Processing, 155–66. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68456-7_13.

Hassina, Gheribi, and Boukebbab Salim. "Contribution to the Evaluation of Uncertainties of Measurement to the Data Processing Sequence of a CMM." In Design and Modeling of Mechanical Systems—III, 291–301. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-66697-6_29.

Sperduti, Alessandro. "Neural networks for processing data structures." In Adaptive Processing of Sequences and Data Structures, 121–44. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0053997.

Cruz, Cristina, and Jonathan Houseley. "Protocols for Northern Analysis of Exosome Substrates and Other Noncoding RNAs." In Methods in Molecular Biology, 83–103. New York, NY: Springer New York, 2019. http://dx.doi.org/10.1007/978-1-4939-9822-7_5.

Abstract:

Over the past decade a plethora of noncoding RNAs (ncRNAs) have been identified, initiating an explosion in RNA research. Although RNA sequencing methods provide unsurpassed insights into ncRNA distribution and expression, detailed information on structure and processing is harder to extract from sequence data. In contrast, northern blotting methods provide uniquely detailed insights into complex RNA populations but are rarely employed outside specialist RNA research groups. Such techniques are generally considered difficult for nonspecialists, which is unfortunate as substantial technical advances in the past few decades have solved the major challenges. Here we present simple, reproducible and highly robust protocols for separating glyoxylated RNA on agarose gels and heat-denatured RNA on polyacrylamide–urea gels using standard laboratory electrophoresis equipment. We also provide reliable transfer and hybridization protocols that do not require optimization for most applications. Together, these should allow any molecular biology lab to elucidate the structure and processing of ncRNAs of interest.

Tsoi, Ah Chung. "Recurrent neural network architectures: An overview." In Adaptive Processing of Sequences and Data Structures, 1–26. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0053993.

Conference papers on the topic "Data sequence processing"

Roy, Abhishek, Yanlei Diao, Uday Evani, Avinash Abhyankar, Clinton Howarth, Rémi Le Priol, and Toby Bloom. "Massively Parallel Processing of Whole Genome Sequence Data." In SIGMOD/PODS'17: International Conference on Management of Data. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3035918.3064048.

Boonserm, Prasitchai, Bingqiang Wang, Simon See, and Tiranee Achalakul. "Improving Data Processing Time with Access Sequence Prediction." In 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2012. http://dx.doi.org/10.1109/icpads.2012.125.

Latorre, Javier, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman, Srikanth Ronanki, and Viacheslav Klimkov. "Effect of Data Reduction on Sequence-to-sequence Neural TTS." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8682168.

Li, Min, Zhenjiang Miao, and Cong Ma. "Sequence-to-Sequence Labanotation Generation Based on Motion Capture Data." In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. http://dx.doi.org/10.1109/icassp40776.2020.9054302.

Liu, Junqiang, and Xiaoling Guan. "Complex event processing for sequence data and domain knowledge." In 2010 International Conference on Mechanic Automation and Control Engineering (MACE). IEEE, 2010. http://dx.doi.org/10.1109/mace.2010.5536086.

Wang, Chih-Li, Qian Zhong, Szu-Ying Wang, and Vwani Roychowdhury. "Data-Driven Chord-Sequence Representations of Songs and Applications." In Signal and Image Processing. Calgary, AB, Canada: ACTA Press, 2012. http://dx.doi.org/10.2316/p.2012.759-109.

Guo, Demi, Yoon Kim, and Alexander Rush. "Sequence-Level Mixed Sample Data Augmentation." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.emnlp-main.447.

Liu, Mo, Ming Li, Denis Golovnya, Elke A. Rundensteiner, and Kajal Claypool. "Sequence Pattern Query Processing over Out-of-Order Event Streams." In 2009 IEEE 25th International Conference on Data Engineering (ICDE). IEEE, 2009. http://dx.doi.org/10.1109/icde.2009.95.

Tanaka, Yuichi, Madoka Hasegawa, and Shigeo Kato. "Generalized selective data pruning for video sequence." In 2011 18th IEEE International Conference on Image Processing (ICIP 2011). IEEE, 2011. http://dx.doi.org/10.1109/icip.2011.6116027.
