Journal articles on the topic 'Data sequence processing'

Consult the top 50 journal articles for your research on the topic 'Data sequence processing.'

1

Wendl, M. C., I. Korf, A. T. Chinwalla, and L. W. Hillier. "Automated processing of raw DNA sequence data." IEEE Engineering in Medicine and Biology Magazine 20, no. 4 (2001): 41–48. http://dx.doi.org/10.1109/51.940044.

2

Song, Bosheng, Zimeng Li, Xuan Lin, Jianmin Wang, Tian Wang, and Xiangzheng Fu. "Pretraining model for biological sequence data." Briefings in Functional Genomics 20, no. 3 (May 2021): 181–95. http://dx.doi.org/10.1093/bfgp/elab025.

Abstract:
With the development of high-throughput sequencing technology, biological sequence data reflecting life information becomes increasingly accessible. Particularly on the background of the COVID-19 pandemic, biological sequence data play an important role in detecting diseases, analyzing the mechanism and discovering specific drugs. In recent years, pretraining models that have emerged in natural language processing have attracted widespread attention in many research fields not only to decrease training cost but also to improve performance on downstream tasks. Pretraining models are used for embedding biological sequence and extracting feature from large biological sequence corpus to comprehensively understand the biological sequence data. In this survey, we provide a broad review on pretraining models for biological sequence data. Moreover, we first introduce biological sequences and corresponding datasets, including brief description and accessible link. Subsequently, we systematically summarize popular pretraining models for biological sequences based on four categories: CNN, word2vec, LSTM and Transformer. Then, we present some applications with proposed pretraining models on downstream tasks to explain the role of pretraining models. Next, we provide a novel pretraining scheme for protein sequences and a multitask benchmark for protein pretraining models. Finally, we discuss the challenges and future directions in pretraining models for biological sequences.
3

Ma, Ling, Ke Zhu Song, Jun Feng Yang, and Ping Cao. "Hardware Implementation of a Real-Time Data Processing Algorithm in Marine Engineering Data Acquisition." Advanced Materials Research 268-270 (July 2011): 110–15. http://dx.doi.org/10.4028/www.scientific.net/amr.268-270.110.

Abstract:
According to the architecture characteristics of the mass data acquisition system in marine seismic exploration, this paper designed a real-time data processing algorithm which can convert the collected time-sequence data to channel-sequence data. A hardware implementation of the algorithm based on FPGA+DDR SDRAM is developed to complete the whole conversion process. Here, the FPGA is used for receiving, analyzing, and preliminary processing of time-sequence data and for the interface to DDR SDRAM. Two DDR SDRAMs are used in ping-pong mode to store time-sequence data and to cooperate with the FPGA in realizing time-to-channel sequence data conversion. Test results showed that, after applying the algorithm to the FCI in a high-precision marine seismic data acquisition and recording system, the algorithm could cache collected data without redundancy and convert data from time sequence to channel sequence without dead time. The algorithm also greatly improved the efficiency and reliability of data processing.
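In software terms, the time-to-channel conversion this abstract describes amounts to a demultiplexing (transpose) step. The following is a minimal Python sketch under stated assumptions, not the paper's FPGA design: the function name `time_to_channel` and the frame layout (one sample per channel per time step) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def time_to_channel(frames: Sequence[Sequence[int]]) -> List[List[int]]:
    """Regroup time-ordered frames into per-channel traces.

    Assumed layout: frames[t][c] is the sample of channel c at time step t.
    The result satisfies traces[c][t] == frames[t][c].
    """
    if not frames:
        return []
    n_channels = len(frames[0])
    # Every frame must carry exactly one sample per channel.
    if any(len(frame) != n_channels for frame in frames):
        raise ValueError("all frames must have the same number of channels")
    # zip(*frames) transposes the time-major layout into channel-major.
    return [list(trace) for trace in zip(*frames)]


frames = [
    [10, 20, 30],  # time step 0: channels 0, 1, 2
    [11, 21, 31],  # time step 1
    [12, 22, 32],  # time step 2
    [13, 23, 33],  # time step 3
]
traces = time_to_channel(frames)
print(traces[0])  # [10, 11, 12, 13]
```

The sketch buffers all frames before transposing. The ping-pong arrangement mentioned in the abstract would instead alternate between two buffers so that one can be filled while the other is read out.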
4

Baradello, Luca. "An improved processing sequence for uncorrelated Chirp sonar data." Marine Geophysical Research 35, no. 4 (March 22, 2014): 337–44. http://dx.doi.org/10.1007/s11001-014-9220-1.

5

Lin, Edgar Chia Han. "Research on Sequence Query Processing Techniques over Data Streams." Applied Mechanics and Materials 284-287 (January 2013): 3507–11. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.3507.

Abstract:
Due to the great progress of computer technology and mature development of network, more and more data are generated and distributed through the network, which is called data streams. During the last couple of years, a number of researchers have paid their attention to data stream management, which is different from the conventional database management. At present, the new type of data management system, called data stream management system (DSMS), has become one of the most popular research areas in data engineering field. Lots of research projects have made great progress in this area. Since the current DSMS does not support queries on sequence data, this project will study the issues related to two types of data. First, we will focus on the content filtering on single-attribute streams, such as sensor data. Second, we will focus on multi-attribute streams, such as video films. We will discuss the related issues such as how to build an efficient index for all queries of different streams and the corresponding query processing mechanisms.
6

Munková, Daša, Michal Munk, and Martin Vozár. "Data Pre-processing Evaluation for Text Mining: Transaction/Sequence Model." Procedia Computer Science 18 (2013): 1198–207. http://dx.doi.org/10.1016/j.procs.2013.05.286.

7

Mendizabal-Ruiz, Gerardo, Israel Román-Godínez, Sulema Torres-Ramos, Ricardo A. Salido-Ruiz, Hugo Vélez-Pérez, and J. Alejandro Morales. "Genomic signal processing for DNA sequence clustering." PeerJ 6 (January 24, 2018): e4264. http://dx.doi.org/10.7717/peerj.4264.

Abstract:
Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
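As an illustration of what converting DNA data to numerical values can look like, here is a small Python sketch. It assumes binary indicator sequences (one per nucleotide, often called the Voss representation) and a plain discrete Fourier transform; the abstract does not say which mapping the authors use, so this choice and the names `voss_indicators` and `power_at` are assumptions made for the example.

```python
import cmath
from typing import Dict, List

NUCLEOTIDES = "ACGT"


def voss_indicators(seq: str) -> Dict[str, List[int]]:
    """Map a DNA string to four binary indicator signals, one per nucleotide.

    Symbols other than A, C, G, T (for example N) become 0 in every signal.
    """
    seq = seq.upper()
    return {base: [1 if symbol == base else 0 for symbol in seq]
            for base in NUCLEOTIDES}


def power_at(signal: List[int], k: int) -> float:
    """Squared magnitude of the k-th DFT coefficient of a numeric signal."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


sequence = "ATGATGATGATG"          # toy sequence with a repeat of period 3
indicators = voss_indicators(sequence)
k = len(sequence) // 3              # DFT bin that corresponds to period 3
print(round(power_at(indicators["A"], k), 6))  # 16.0
```

Once sequences are numeric, feature vectors built from such spectra could be passed to a clustering routine such as K-means, which is the direction the abstract describes.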
8

Macnar, Joanna M., Natalia A. Szulc, Justyna D. Kryś, Aleksandra E. Badaczewska-Dawid, and Dominik Gront. "BioShell 3.0: Library for Processing Structural Biology Data." Biomolecules 10, no. 3 (March 16, 2020): 461. http://dx.doi.org/10.3390/biom10030461.

Abstract:
BioShell is an open-source package for processing biological data, particularly focused on structural applications. The package provides parsers, data structures and algorithms for handling and analyzing macromolecular sequences, structures and sequence profiles. The most frequently used routines are accessible by a set of easy-to-use command line utilities for a Linux environment. The full functionality of the package assumes knowledge of C++ or Python to assemble an application using this software library. Since the last publication that announced the version 2.0, the package has been greatly expanded and rewritten in C++ standard 11 (C++11) to improve its modularity and efficiency. A new testing platform has been implemented to continuously test the correctness and integrity of the package. More than two hundred test programs have been published to provide simple examples that can be used as templates. This makes BioShell an easy to use library that greatly speeds up development of bioinformatics applications and web services without compromising computational efficiency.
9

Yuan, Fei, Hoa Nguyen, and Dan Graur. "ProtParCon: A Framework for Processing Molecular Data and Identifying Parallel and Convergent Amino Acid Replacements." Genes 10, no. 3 (February 26, 2019): 181. http://dx.doi.org/10.3390/genes10030181.

Abstract:
Studying parallel and convergent amino acid replacements in protein evolution is frequently used to assess adaptive evolution at the molecular level. Identifying parallel and convergent replacements involves multiple steps and computational routines, such as multiple sequence alignment, phylogenetic tree inference, ancestral state reconstruction, topology tests, and simulation of sequence evolution. Here, we present ProtParCon, a Python 3 package that provides a common interface for users to process molecular data and identify parallel and convergent amino acid replacements in orthologous protein sequences. By integrating several widely used programs for computational biology, ProtParCon implements general functions for handling multiple sequence alignment, ancestral-state reconstruction, maximum-likelihood phylogenetic tree inference, and sequence simulation. ProtParCon also contains a built-in pipeline that automates all these sequential steps, and enables quick identification of observed and expected parallel and convergent amino acid replacements under different evolutionary assumptions. The most up-to-date version of ProtParCon, including scripts containing user tutorials, the full API reference and documentation are publicly and freely available under an open source MIT License via GitHub. The latest stable release is also available on PyPI (the Python Package Index).
10

Honarvar, Ali Reza, and Ashkan Sami. "Extracting Usage Patterns from Power Usage Data of Homes' Appliances in Smart Home using Big Data Platform." International Journal of Information Technology and Web Engineering 11, no. 2 (April 2016): 39–50. http://dx.doi.org/10.4018/ijitwe.2016040103.

Abstract:
Advances in sensing techniques and IoT have made it possible to gain precise information about devices in smart home and smart city environments. Data analysis for sensors and devices may help us develop friendlier systems for smart cities or smart homes. Sequence pattern mining extracts interesting sequence patterns from data. Electricity usage does follow a sequence of events. In this study the authors investigate this issue and extract valuable sequence patterns from a real appliances' power usage dataset using PrefixSpan. The experiments in this research are implemented on Spark, a novel distributed and parallel big data processing platform, on two different clusters, and interesting findings are obtained. These findings show the importance of extracting sequence patterns from power usage data for various applications, such as decreasing CO2 and greenhouse gas emissions by decreasing electricity usage. The findings also show the need to bring big data platforms to the processing of this kind of data, which is captured in smart homes and smart cities.
11

Molinari, Marco, and Laura Petrosini. "Is sequence-in/sequence-out a cerebellar mode of operation in cognition too?" Behavioral and Brain Sciences 20, no. 2 (June 1997): 259–60. http://dx.doi.org/10.1017/s0140525x97391432.

Abstract:
This commentary reinterprets our recent data on the cerebellar contribution to different cognitive functions in light of Braitenberg and coworkers's hypothesis about the sequence-in/sequence-out cerebellar mode of operation in the motor domain. Cerebellar involvement in spatial data processing, procedural learning, verbal fluency, application of grammatical rules, and writing is dependent on sequence processing.
12

WANG, Ting-Zhang, Gao SHAN, Jian-Hong XU, and Qing-Zhong XUE. "Genome-scale sequence data processing and epigenetic analysis of DNA methylation." Hereditas (Beijing) 35, no. 6 (September 29, 2013): 685–94. http://dx.doi.org/10.3724/sp.j.1005.2013.00685.

13

Lin, Edgar Chia Han. "Research on Multi-Attribute Sequence Query Processing Techniques over Data Streams." Applied Mechanics and Materials 513-517 (February 2014): 575–78. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.575.

Abstract:
Due to the great progress of computer technology and mature development of network, more and more data are generated and distributed through the network, which is called data streams. During the last couple of years, a number of researchers have paid their attention to data stream management, which is different from the conventional database management. At present, the new type of data management system, called data stream management system (DSMS), has become one of the most popular research areas in data engineering field. Lots of research projects have made great progress in this area. Since the current DSMS does not support queries on sequence data, in this paper we focus on multi-attribute streams, such as video films. We will discuss the related issues such as how to build an efficient index for all queries of different streams and the corresponding query processing mechanisms.
14

D'Heygère, François, Pierre Mariot, and Jean-Baptiste Renard. "QUATRAIN — A design support tool and a data processing sequence supervisor." Future Generation Computer Systems 9, no. 4 (December 1993): 321–28. http://dx.doi.org/10.1016/0167-739x(93)90034-m.

15

Tapinos, Avraam, Bede Constantinides, My V. T. Phan, Samaneh Kouchaki, Matthew Cotten, and David L. Robertson. "The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences." Viruses 11, no. 5 (April 26, 2019): 394. http://dx.doi.org/10.3390/v11050394.

Abstract:
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
16

Jun Fang and Hongbin Li. "Distributed Consensus With Quantized Data via Sequence Averaging." IEEE Transactions on Signal Processing 58, no. 2 (February 2010): 944–48. http://dx.doi.org/10.1109/tsp.2009.2032951.

17

Zhang, Shuang, and Shi Xiong Zhang. "Improved Top-k Query Processing on Uncertain Data." Applied Mechanics and Materials 380-384 (August 2013): 2837–40. http://dx.doi.org/10.4028/www.scientific.net/amm.380-384.2837.

Abstract:
The bottom-up algorithm, one of the two probabilistic Top-k query algorithms, was improved. The core of the bottom-up algorithm is the iteration over the three steps of bounding, pruning, and refining applied to the objects and instances. The main contribution is to change the iteration over instances of objects one by one into iterating over all the instances of objects from the superior to the inferior, and to transform the condition and sequence of pruning in order to make the pruning more effective. Theoretical analysis and experimental results show that the efficiency of the algorithm could be increased by about 20%.
18

El-Metwally, Sara, Taher Hamza, Magdi Zakaria, and Mohamed Helmy. "Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges." PLoS Computational Biology 9, no. 12 (December 12, 2013): e1003345. http://dx.doi.org/10.1371/journal.pcbi.1003345.

19

Diot, Christophe, and Francois Gagnon. "Impact of out-of-sequence processing on the performance of data transmission." Computer Networks 31, no. 5 (March 1999): 475–92. http://dx.doi.org/10.1016/s0169-7552(98)00287-6.

20

Armstrong, J. "Symbol synchronization using baud-rate sampling and data-sequence-dependent signal processing." IEEE Transactions on Communications 39, no. 1 (1991): 127–32. http://dx.doi.org/10.1109/26.68283.

21

Stewart, Robert D., and Mick Watson. "poRe GUIs for parallel and real-time processing of MinION sequence data." Bioinformatics 33, no. 14 (March 9, 2017): 2207–8. http://dx.doi.org/10.1093/bioinformatics/btx136.

22

Tajer, Ali, Venugopal V. Veeravalli, and H. Vincent Poor. "Outlying Sequence Detection in Large Data Sets: A data-driven approach." IEEE Signal Processing Magazine 31, no. 5 (September 2014): 44–56. http://dx.doi.org/10.1109/msp.2014.2329428.

23

Barrett, Maria, and Nora Hollenstein. "Sequence labelling and sequence classification with gaze: Novel uses of eye‐tracking data for Natural Language Processing." Language and Linguistics Compass 14, no. 11 (September 22, 2020): 1–16. http://dx.doi.org/10.1111/lnc3.12396.

24

Haines, Seth S., Antoine Guitton, and Biondo Biondi. "Seismoelectric data processing for surface surveys of shallow targets." GEOPHYSICS 72, no. 2 (March 2007): G1–G8. http://dx.doi.org/10.1190/1.2424542.

Abstract:
The utility of the seismoelectric method relies on the development of methods to extract the signal of interest from background and source-generated coherent noise that may be several orders-of-magnitude stronger. We compare data processing approaches to develop a sequence of preprocessing and signal/noise separation and to quantify the noise level from which we can extract signal events. Our preferred sequence begins with the removal of power line harmonic noise and the use of frequency filters to minimize random and source-generated noise. Mapping to the linear Radon domain with an inverse process incorporating a sparseness constraint provides good separation of signal from noise, though it is ineffective on noise that shows the same dip as the signal. Similarly, the seismoelectric signal and noise do not separate cleanly in the Fourier domain, so [Formula: see text]-[Formula: see text] filtering can not remove all of the source-generated noise and it also disrupts signal amplitude patterns. We find that prediction-error filters provide the most effective method to separate signal and noise, while also preserving amplitude information, assuming that adequate pattern models can be determined for the signal and noise. These Radon-domain and prediction-error-filter methods successfully separate signal from [Formula: see text] stronger noise in our test data.
25

Duan, Yun Peng, Chun Xi Zhao, and Ying Shi. "Application of Data Pre-Processing Method in Web Mining." Applied Mechanics and Materials 687-691 (November 2014): 1592–95. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.1592.

Abstract:
With the wide application of the WWW and the emergence of Web technology, data mining research has entered a new stage. Web log mining applies the ideas of data mining to the analysis of server logs. This paper addresses the early stage of data mining and puts forward a log data preprocessing method whose purpose is to divide the server logs into multiple unique user access sequences, and to give a good algorithm for doing so.
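Dividing server logs into per-user access sequences can be sketched as follows. This is a generic illustration, not the algorithm from the paper: it assumes each log entry has already been parsed into a (user id, timestamp, URL) tuple and that a new sequence starts after 30 minutes of inactivity, a common heuristic. The name `split_sessions` and the `timeout` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed parsed log entry: (user id, timestamp in seconds, requested URL).
LogEntry = Tuple[str, float, str]


def split_sessions(entries: Iterable[LogEntry],
                   timeout: float = 1800.0) -> Dict[str, List[List[str]]]:
    """Group log entries by user and split each user's hits into sequences.

    A new access sequence starts whenever the gap between two consecutive
    requests of the same user exceeds `timeout` seconds.
    """
    by_user: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for user, timestamp, url in entries:
        by_user[user].append((timestamp, url))

    sessions: Dict[str, List[List[str]]] = {}
    for user, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order each user's hits by time
        user_sessions: List[List[str]] = []
        last_timestamp = None
        for timestamp, url in hits:
            if last_timestamp is None or timestamp - last_timestamp > timeout:
                user_sessions.append([])   # open a new access sequence
            user_sessions[-1].append(url)
            last_timestamp = timestamp
        sessions[user] = user_sessions
    return sessions


log = [
    ("10.0.0.1", 0, "/"),
    ("10.0.0.2", 10, "/a"),
    ("10.0.0.1", 60, "/b"),
    ("10.0.0.1", 4000, "/c"),  # more than 30 minutes after the previous hit
]
print(split_sessions(log))
# {'10.0.0.1': [['/', '/b'], ['/c']], '10.0.0.2': [['/a']]}
```

Identifying users by IP address alone is a simplification; real preprocessing pipelines often combine it with the user agent or cookies.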
26

Ghoneimy, Samy, and Samir Abou El-Seoud. "A MapReduce Framework for DNA Sequencing Data Processing." International Journal of Recent Contributions from Engineering, Science & IT (iJES) 4, no. 4 (December 30, 2016): 11. http://dx.doi.org/10.3991/ijes.v4i4.6537.

Abstract:
Genomics and Next Generation Sequencers (NGS) like Illumina Hiseq produce data in the order of 200 billion base pairs in a single one-week run for a 60x human genome coverage, which requires modern high-throughput experimental technologies that can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on the implementation of the DNA sequencing as a set of MapReduce programs that will accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format) file, which has variants for a given DNA data set. In this paper MapReduce/Hadoop along with Burrows-Wheeler Aligner (BWA) and Sequence Alignment/Map (SAM) tools are fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed to be suited for a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster to utilize all of the Hadoop features like HDFS, program management and fault tolerance. The Map step performs multiple instances of the short read alignment algorithm (BoWTie) that run in parallel in Hadoop. The ordered list of the sequence reads is used as input tuples and the output tuples are the alignments of the short reads. In the Reduce step many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. Input tuples are sorted alignments for a partition and the output tuples are SNP calls. Results are stored via HDFS, and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. It also demonstrates that this method achieves comparable accuracy to alternative methods for sequencing data processing.
27

Wingett, Steven W., Philip Ewels, Mayra Furlan-Magaril, Takashi Nagano, Stefan Schoenfelder, Peter Fraser, and Simon Andrews. "HiCUP: pipeline for mapping and processing Hi-C data." F1000Research 4 (November 20, 2015): 1310. http://dx.doi.org/10.12688/f1000research.7334.1.

Abstract:
HiCUP is a pipeline for processing sequence data generated by Hi-C and Capture Hi-C (CHi-C) experiments, which are techniques used to investigate three-dimensional genomic organisation. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also produces an easy-to-interpret yet detailed quality control (QC) report that assists in refining experimental protocols for future studies. The software is freely available and has already been used for processing Hi-C and CHi-C data in several recently published peer-reviewed studies.
28

Cabarcas, Carlos, and Roger Slatt. "Sequence stratigraphic principles applied to the analysis of borehole microseismic data." Interpretation 2, no. 3 (August 1, 2014): SG15–SG23. http://dx.doi.org/10.1190/int-2013-0151.1.

Abstract:
Based on a sequence stratigraphic framework developed using gamma ray stacking patterns, we have identified brittle-ductile couplets, which allow us to better interpret the microseismic response recorded during a single-stage hydraulic fracture stimulation treatment monitored from three strategically located observation wells. We have analyzed and compared hydraulic fracturing results inferred by individual processing of microseismic data acquired from horizontal and vertical sensor arrays, as well as the results from simultaneously processing the signals recorded by all three sensors. Ultimately, we have decided in favor of the triple array simultaneous solution as the most useful data set to interpret the stimulation treatment due to the location of the microseismic events coupled with the theoretical expectation from our sequence stratigraphic framework. The final data set has not only allowed us to better interpret the hydraulic fracturing results, but also helped us improve recommendations in support of the field development campaign.
29

Gao, Miao, and Guo-You Shi. "Ship-Collision Avoidance Decision-Making Learning of Unmanned Surface Vehicles with Automatic Identification System Data Based on Encoder—Decoder Automatic-Response Neural Networks." Journal of Marine Science and Engineering 8, no. 10 (September 27, 2020): 754. http://dx.doi.org/10.3390/jmse8100754.

Abstract:
Intelligent unmanned surface vehicle (USV) collision avoidance is a complex inference problem based on current navigation status. This requires simultaneous processing of the input sequences and generation of the response sequences. The automatic identification system (AIS) encounter data mainly include the time-series data of two AIS sets, which exhibit a one-to-one mapping relation. Herein, an encoder–decoder automatic-response neural network is designed and implemented based on the sequence-to-sequence (Seq2Seq) structure to simultaneously process the two AIS encounter trajectory sequences. Furthermore, this model is combined with the bidirectional long short-term memory recurrent neural networks (Bi-LSTM RNN) to obtain a network framework for processing the time-series data to obtain ship-collision avoidance decisions based on big data. The encoder–decoder neural networks were trained based on the AIS data obtained in 2018 from Zhoushan Port to achieve ship collision avoidance decision-making learning. The results indicated that the encoder–decoder neural networks can be used to effectively formulate the sequence of the collision avoidance decision of the USV. Thus, this study significantly contributes to the increased efficiency and safety of maritime transportation. The proposed method can potentially be applied to the USV technology and intelligent collision-avoidance systems.
30

NEBEL, MARKUS E., SEBASTIAN WILD, MICHAEL HOLZHAUSER, LARS HÜTTENBERGER, RAPHAEL REITZIG, MATTHIAS SPERBER, and THORSTEN STOECK. "JAGUC — A SOFTWARE PACKAGE FOR ENVIRONMENTAL DIVERSITY ANALYSES." Journal of Bioinformatics and Computational Biology 09, no. 06 (December 2011): 749–73. http://dx.doi.org/10.1142/s0219720011005781.

Abstract:
Background: The study of microbial diversity and community structures heavily relies on the analyses of sequence data, predominantly taxonomic marker genes like the small subunit of the ribosomal RNA (SSU rRNA) amplified from environmental samples. Until recently, the "gold standard" for this strategy was the cloning and Sanger sequencing of amplified target genes, usually restricted to a few hundred sequences per sample due to relatively high costs and labor intensity. The recent introduction of massive parallel tag sequencing strategies like pyrosequencing (454 sequencing) has opened a new window into microbial biodiversity research. Due to its swift nature and relatively low expense, this strategy produces millions of environmental SSU rDNA sequences granting the opportunity to gain deep insights into the true diversity and complexity of microbial communities. The bottleneck, however, is the computational processing of these massive sequence data, without which, biologists are hardly able to exploit the full information included in these sequence data. Results: The freely available standalone software package JAGUC implements a broad regime of different functions, allowing for efficient and convenient processing of a huge number of sequence tags, including importing custom-made reference data bases for basic local alignment searches, user-defined quality and search filters for analyses of specific sets of sequences, pairwise alignment-based sequence similarity calculations and clustering as well as sampling saturation and rank abundance analyses. In initial applications, JAGUC successfully analyzed hundreds of thousands of sequence data (eukaryote SSU rRNA genes) from aquatic samples and also was applied for quality assessments of different pyrosequencing platforms. Conclusions: The new program package JAGUC is a tool that bridges the gap between computational and biological sciences. 
It enables biologists to process large sequence data sets in order to infer biological meaning from hundreds of thousands of raw sequence data. JAGUC offers advantages over available tools which are further discussed in this manuscript.
31

Phattarasukol, Somsak, Matthew C. Radey, Colin R. Lappala, Yasuhiro Oda, Hidetada Hirakawa, Mitchell J. Brittnacher, and Caroline S. Harwood. "Identification of a p-Coumarate Degradation Regulon in Rhodopseudomonas palustris by Xpression, an Integrated Tool for Prokaryotic RNA-Seq Data Processing." Applied and Environmental Microbiology 78, no. 19 (July 13, 2012): 6812–18. http://dx.doi.org/10.1128/aem.01418-12.

Abstract:
High-throughput sequencing of cDNA prepared from RNA, an approach known as RNA-seq, is coming into increasing use as a method for transcriptome analysis. Despite its many advantages, widespread adoption of the technique has been hampered by a lack of easy-to-use, integrated, open-source tools for analyzing the nucleotide sequence data that are generated. Here we describe Xpression, an integrated tool for processing prokaryotic RNA-seq data. The tool is easy to use and is fully automated. It performs all essential processing tasks, including nucleotide sequence extraction, alignment, quantification, normalization, and visualization. Importantly, Xpression processes multiplexed and strand-specific nucleotide sequence data. It extracts and trims specific sequences from files and separately quantifies sense and antisense reads in the final results. Outputs from the tool can also be conveniently used in downstream analysis. In this paper, we show the utility of Xpression to process strand-specific RNA-seq data to identify genes regulated by CouR, a transcription factor that controls p-coumarate degradation by the bacterium Rhodopseudomonas palustris.
32

LIU, Xiang-Yu, and Gang WU. "An Indexing and Query Processing Approach of RDF Data Based on Prüfer Sequence." Chinese Journal of Computers 34, no. 10 (October 28, 2011): 1997–2008. http://dx.doi.org/10.3724/sp.j.1016.2011.01997.

33

Weihan, Wang. "MAGAN: A masked autoencoder generative adversarial network for processing missing IoT sequence data." Pattern Recognition Letters 138 (October 2020): 211–16. http://dx.doi.org/10.1016/j.patrec.2020.07.025.

34

Yuan, Zhe, Yiming Zhang, Junxia Gao, Xuhong Wang, and Dakang Yuan. "The Superiority of M-sequence in MTEM Data Processing: a Field Contrast Experiment." Journal of Physics: Conference Series 1207 (April 2019): 012017. http://dx.doi.org/10.1088/1742-6596/1207/1/012017.

35

Huang, Chang Jun, Yuan Zhi Cao, Li Min Hu, and Qing Shan Zhou. "Discussing of Subsidence Monitor Data Processing Methods Based on Improved GM (1, 1)." Applied Mechanics and Materials 204-208 (October 2012): 2800–2805. http://dx.doi.org/10.4028/www.scientific.net/amm.204-208.2800.

Abstract:
Based on grey system theory, this paper analyzes the theoretical flaws of the GM (1,1) prediction model, takes into account both the linear trend and the exponential growth trend of settlement monitoring data series, uses a sequence operator to enhance the smoothness of the original sequence data, and combines the characteristics of GM (1,1) with linear regression to present a new and improved combination forecasting model. Compared with the traditional model, the accuracy of the improved method is greatly enhanced, and the improved method better meets practical requirements.
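For orientation, the baseline that this abstract sets out to improve can be sketched as follows. This is a minimal sketch of the classic GM (1,1) model only (accumulated generating operation, least-squares fit of the grey equation, time-response forecast); it is not the improved combination model of the paper, whose details the abstract does not give. The function name `gm11_forecast` and its parameters are illustrative assumptions.

```python
import math


def gm11_forecast(x0, steps=1):
    """Classic GM(1,1): return fitted values for x0 followed by `steps` forecasts.

    Hypothetical helper for illustration; not the paper's improved model.
    """
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted to at least four points")

    # Accumulated generating operation (1-AGO): running sum of the raw series.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)

    # Background values: means of consecutive accumulated values.
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]

    # Least-squares fit of the grey equation x0(k) = -a * z1(k) + b.
    m = n - 1
    mean_z = sum(z1) / m
    mean_y = sum(y) / m
    szz = sum((z - mean_z) ** 2 for z in z1)
    szy = sum((z - mean_z) * (v - mean_y) for z, v in zip(z1, y))
    slope = szy / szz
    a = -slope                    # development coefficient
    b = mean_y - slope * mean_z   # grey input

    def x1_hat(k):
        # Time-response function of the accumulated series.
        if abs(a) < 1e-12:
            return x0[0] + b * k  # limit of the formula below as a -> 0
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Undo the accumulation to get back to the original scale.
    fitted = [float(x0[0])]
    for k in range(1, n + steps):
        fitted.append(x1_hat(k) - x1_hat(k - 1))
    return fitted
```

Calling `gm11_forecast(series, steps=1)` on a short, roughly exponential series returns the fitted values plus one extrapolated point. A pure linear trend is exactly the case where the classic model fits poorly, which is the kind of weakness the abstract says the combination with linear regression targets.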
36

Qiu, Yongxiao, Guanghui Du, and Song Chai. "A Novel Algorithm for Distributed Data Stream Using Big Data Classification Model." International Journal of Information Technology and Web Engineering 15, no. 4 (October 2020): 1–17. http://dx.doi.org/10.4018/ijitwe.2020100101.

Abstract:
In order to solve the problem of real-time detection of power grid equipment anomalies, this paper proposes a data flow classification model based on distributed processing. In order to realize distributed processing of power grid data flow, a local node mining method and a global mining mode based on uneven data flow classification are designed. A data stream classification model based on distributed processing is constructed, then the corresponding data sequence is selected and formatted abstractly, and the local node mining method and global mining mode under this model are designed. In the local node miner, the block-to-block mining strategy is implemented by acquiring the current data blocks. At the same time, the expression and real-time maintenance of local mining patterns are completed by combining the clustering algorithm, thus improving the transmission rate of information between each node and ensuring the timeliness of the overall classification algorithm.
37

Najam, Maleeha, Raihan Ur Rasool, Hafiz Farooq Ahmad, Usman Ashraf, and Asad Waqar Malik. "Pattern Matching for DNA Sequencing Data Using Multiple Bloom Filters." BioMed Research International 2019 (April 14, 2019): 1–9. http://dx.doi.org/10.1155/2019/7074387.

Abstract:
Storing and processing large DNA sequences has always been a major problem because of the increasing volume of DNA sequence data. Although a number of solutions have been proposed, they require significant computation and memory. Therefore, an efficient storage and pattern matching solution is required for DNA sequencing data. Bloom filters (BFs) are an efficient data structure, mostly used in bioinformatics for the classification of DNA sequences. In this paper, we explore uses of BFs beyond classification. The proposed solution is based on Multiple Bloom Filters (MBFs) and finds all the locations and the number of repetitions of a specified pattern inside a DNA sequence. Both of these factors are extremely important in determining the type and intensity of any disease. This paper serves as a first effort towards optimizing the search for the location and frequency of substrings in DNA sequences using MBFs. We expect that further optimizations of the proposed solution can bring remarkable results, as this paper presents a proof-of-concept implementation for a given set of data using the proposed MBF technique. Performance evaluation shows improved accuracy and time efficiency of the proposed approach.
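To make the underlying data structure concrete, here is a minimal sketch of a single Bloom filter over the k-mers of a sequence, used as a cheap pre-check before an exact scan for pattern locations. It is an illustration under stated assumptions, not the paper's Multiple Bloom Filters scheme, which the abstract does not specify; `BloomFilter`, `build_kmer_filter`, `locate`, and the default sizes are hypothetical names and values.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over one SHA-256 digest."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_kmer_filter(sequence, k):
    """Insert every length-k substring of `sequence` into a new filter."""
    bloom = BloomFilter()
    for i in range(len(sequence) - k + 1):
        bloom.add(sequence[i:i + k])
    return bloom


def locate(sequence, pattern, bloom):
    """Return all start positions of `pattern`, skipping the scan when the
    filter rules it out. Assumes len(pattern) equals the k used to build
    `bloom`."""
    if pattern not in bloom:
        return []
    k = len(pattern)
    # Exact comparison removes any Bloom filter false positives.
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] == pattern]
```

For the sequence "ACGTACGTGACG" with a filter built for k = 3, `locate` returns positions 0, 4 and 9 for "ACG", so the repetition count is simply the length of the returned list. The filter never yields false negatives, so an empty result from the pre-check is always correct.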
38

Xie, Dong, Jie Xiao, Guangjun Guo, and Tong Jiang. "Processing Uncertain RFID Data in Traceability Supply Chains." Scientific World Journal 2014 (2014): 1–22. http://dx.doi.org/10.1155/2014/535690.

Abstract:
Radio Frequency Identification (RFID) is widely used to track and trace objects in traceability supply chains. However, massive uncertain data produced by RFID readers are not effective and efficient to be used in RFID application systems. Following the analysis of key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data, and supporting a variety of queries for tracking and tracing RFID objects. We adjust different smoothing windows according to different rates of uncertain data, employ different strategies to process uncertain readings, and distinguish ghost, missing, and incomplete data according to their apparent positions. We propose a comprehensive data model which is suitable for different application scenarios. In addition, a path coding scheme is proposed to significantly compress massive data by aggregating the path sequence, the position, and the time intervals. The scheme is suitable for cyclic or long paths. Moreover, we further propose a processing algorithm for group and independent objects. Experimental evaluations show that our approach is effective and efficient in terms of the compression and traceability queries.
39

Tian, Shengfeng, Shaomin Mu, and Chuanhuan Yin. "Length-weighted string kernels for sequence data classification." Pattern Recognition Letters 28, no. 13 (October 2007): 1651–56. http://dx.doi.org/10.1016/j.patrec.2007.04.008.

40

Oh, Seung-Joon, and Jae-Yearn Kim. "A hierarchical clustering algorithm for categorical sequence data." Information Processing Letters 91, no. 3 (August 2004): 135–40. http://dx.doi.org/10.1016/j.ipl.2004.04.002.

41

Wildes, Richard P., Michael J. Amabile, Ann-Marie Lanzillotto, and Tzong-Shyng Leu. "Recovering Estimates of Fluid Flow from Image Sequence Data." Computer Vision and Image Understanding 80, no. 2 (November 2000): 246–66. http://dx.doi.org/10.1006/cviu.2000.0874.

42

Guo, Shunan, Zhuochen Jin, David Gotz, Fan Du, Hongyuan Zha, and Nan Cao. "Visual Progression Analysis of Event Sequence Data." IEEE Transactions on Visualization and Computer Graphics 25, no. 1 (January 2019): 417–26. http://dx.doi.org/10.1109/tvcg.2018.2864885.

43

Musel, Benoit, Louise Kauffmann, Stephen Ramanoël, Coralie Giavarini, Nathalie Guyader, Alan Chauvin, and Carole Peyrin. "Coarse-to-fine Categorization of Visual Scenes in Scene-selective Cortex." Journal of Cognitive Neuroscience 26, no. 10 (October 2014): 2287–97. http://dx.doi.org/10.1162/jocn_a_00643.

Abstract:
Neurophysiological, behavioral, and computational data indicate that visual analysis may start with the parallel extraction of different elementary attributes at different spatial frequencies and follows a predominantly coarse-to-fine (CtF) processing sequence (low spatial frequencies [LSF] are extracted first, followed by high spatial frequencies [HSF]). Evidence for CtF processing within scene-selective cortical regions is, however, still lacking. In the present fMRI study, we tested whether such processing occurs in three scene-selective cortical regions: the parahippocampal place area (PPA), the retrosplenial cortex, and the occipital place area. Fourteen participants were subjected to functional scans during which they performed a categorization task of indoor versus outdoor scenes using dynamic scene stimuli. Dynamic scenes were composed of six filtered images of the same scene, from LSF to HSF or from HSF to LSF, allowing us to mimic a CtF or the reverse fine-to-coarse (FtC) sequence. Results showed that only the PPA was more activated for CtF than FtC sequences. Equivalent activations were observed for both sequences in the retrosplenial cortex and occipital place area. This study suggests for the first time that CtF sequence processing constitutes the predominant strategy for scene categorization in the PPA.
44

Gururaj T. and Siddesh G. M. "Hybrid Approach for Enhancing Performance of Genomic Data for Stream Matching." International Journal of Cognitive Informatics and Natural Intelligence 15, no. 4 (October 2021): 1–18. http://dx.doi.org/10.4018/ijcini.20211001.oa38.

Abstract:
In gene expression analysis, the expression levels of thousands of genes are analyzed, for example across separate stages of treatments or diseases. Identifying a particular gene sequence pattern is a challenging task with respect to performance. The proposed solution addresses the performance issues in genomic stream matching by involving assembly and sequencing. When counting k-mers for a given input value of k and performing DNA sequencing tasks, researchers need to concentrate on sequence matching. The proposed solution addresses performance metrics such as processing time for k-mer counting, number of operations for similarity matching, memory utilization during similarity search, and processing time for stream matching. It suggests an improved algorithm, Revised Rabin-Karp (RRK), for the basic operation and, to achieve more efficiency, a novel framework based on Hadoop MapReduce blended with Pig and Apache Tez. Measurements of memory utilization and processing time show that the proposed model is efficient compared with existing approaches.
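The two basic operations named in this abstract, k-mer counting and Rabin-Karp matching, can be sketched in their textbook forms. This is the standard Rabin-Karp rolling hash, not the Revised Rabin-Karp (RRK) variant or the Hadoop-based framework of the paper, neither of which the abstract describes; the function names and the `base` and `mod` defaults are illustrative assumptions.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring (k-mer) of `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all start positions of `pattern` in `text` (standard Rabin-Karp)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    hits = []
    for i in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

The rolling update keeps the per-position cost constant on average, which is why this family of algorithms is a natural starting point when matching time over long genomic streams is the metric of interest.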
45

Hoang Viet, Nguyen Khanh, Do Thi Huyen, Le Tung Lam, Phung Thi Lan, Phung Thu Nguyet, Nguyen Thuy Tien, and Truong Nam Hai. "Probe design for exploiting gene encoding pectinesterase from dna metagenome data of bacteria in goat rumen and co-expression of gpecs1 gen with chaperone pg-kje8 in Escherichia coli." TAP CHI SINH HOC 40, no. 1 (January 25, 2018): 84–91. http://dx.doi.org/10.15625/0866-7160/v40n1.10917.

Abstract:
This article introduces the steps of constructing and using a probe to exploit genes encoding pectinesterase from metagenomic DNA sequencing data produced by next-generation sequencing tools. The probe was used to find and select genes encoding pectinesterase from the metagenomic DNA sequences of bacteria in goat rumen, and thereby to select a sequence for expression in E. coli. According to the CAZy classification system, pectinesterase belongs to carbohydrate esterase family CE8; it is an enzyme with many applications in the food processing industry, environmental treatment, animal feed processing and medicine. As a result, 3 CE8 sequences were retrieved from the CAZy database and one probe was designed. The probe was 367 amino acids long and contained all the conserved amino acid residues: 200 residues conserved in all sequences, 72 residues similar in almost all sequences, and residues conserved in many sequences and homologous; the highest alkalinity index was chosen. Using the designed probe, we filtered four pectinesterase coding sequences from the metagenomic DNA sequencing data of bacteria in goat rumen. In spatial structure estimation with Phyre2, only one sequence (code 46301) showed 100% sequence identity and 90% query coverage with pectinesterase. An artificial gene was synthesized and inserted into the vector pET22b (+) at the NcoI and XhoI sites to co-express with chaperone pG-KJE8 in E. coli. The recombinant pectinesterase enzyme was expressed in soluble form and has pectin substrate biodegradation activity. The results demonstrate that using a probe for gene extraction is feasible.
46

Engesser, Sabrina, Amanda R. Ridley, and Simon W. Townsend. "Meaningful call combinations and compositional processing in the southern pied babbler." Proceedings of the National Academy of Sciences 113, no. 21 (May 6, 2016): 5976–81. http://dx.doi.org/10.1073/pnas.1600970113.

Abstract:
Language’s expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a “mobbing sequence,” potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought.
47

Davies, R. J. "A new batch-processing data-reduction application for X-ray diffraction data." Journal of Applied Crystallography 39, no. 2 (March 12, 2006): 267–72. http://dx.doi.org/10.1107/s0021889806008697.

Abstract:
Modern synchrotron radiation facility beamlines offer high-brilliance beams and sensitive area detectors. Consequently, experiments such as scanning X-ray microdiffraction can generate large data sets within relatively short time periods. In these specialist fields there are currently very few automated data-treatment solutions to tackle the large data sets produced. Where there is existing software, it is either insufficiently specialized or cannot be operated in a batch-wise processing mode. As a result, a large gap exists between the rate at which X-ray diffraction data can be generated and the rate at which they can be realistically analysed. This article describes a new software application to perform batch-wise data reduction. It is designed to operate in combination with the commonly used Fit2D program. Through the use of intuitive file selection, numerous processing lists and a generic operation sequence, it is capable of the batch-wise reduction of up to 60 000 diffraction patterns during each treatment session. It can perform automated intensity corrections to large data series, perform advanced background-subtraction operations and automatically organize results. Integration limits can be set graphically on-screen, uniquely derived from existing peak positions or globally calculated from user-supplied values. The software represents a working solution to a hitherto unsolved problem.
48

Wu, Ba Teer, Er Gen Gao, and Ye Wu. "The Application and Research of the Double Difference Method about the 2004 Ludian Earthquake Sequence." Advanced Materials Research 726-731 (August 2013): 3123–27. http://dx.doi.org/10.4028/www.scientific.net/amr.726-731.3123.

Abstract:
This paper uses the double-difference method to relocate the Ludian earthquake sequence in the north-south earthquake belt and to obtain detailed earthquake parameters for the sequence. At the same time, we have established a complete set of methods for data processing and result analysis. This makes it possible to use real-time data from the strengthened monitoring of the north-south earthquake belt to accurately relocate medium-intensity earthquake sequences in the belt. Thus, we can produce more accurate earthquake parameters and provide more conclusive evidence for warning-area division and earthquake forecasting.
49

Laycock, Paul. "Data Preparation for NA62." EPJ Web of Conferences 214 (2019): 02017. http://dx.doi.org/10.1051/epjconf/201921402017.

Abstract:
In 2017, NA62 recorded over a petabyte of raw data, collecting around a billion events per day of running. Data are collected in bursts of 3-5 seconds, producing output files of a few gigabytes. A typical run, a sequence of bursts with the same detector configuration and similar experimental conditions, contains 1500 bursts and constitutes the basic unit for offline data processing. A sample of 100 random bursts is used to make timing calibrations of all detectors, after which every burst in the run is reconstructed. Finally the reconstructed events are filtered by physics channel with an average reduction factor of 20, and data quality metrics are calculated. Initially a bespoke data processing solution was implemented using a simple finite state machine with limited production system functionality. In 2017, the ATLAS Tier-0 team offered the use of their production system, together with the necessary support. Data processing workflows were rewritten with better error-handling and I/O operations were minimised, the reconstruction software was improved and conditions data handling was changed to follow best practices suggested by the HEP Software Foundation conditions database working group. This contribution describes the experience gained in using these tools and methods for data-processing on a petabyte scale experiment.
50

Hryniów, Krzysztof, and Andrzej Dzieliński. "Probabilistic Sequence Mining – Evaluation and Extension of ProMFS Algorithm for Real-Time Problems." International Journal of Electronics and Telecommunications 58, no. 4 (December 1, 2012): 323–26. http://dx.doi.org/10.2478/v10177-012-0044-0.

Abstract:
Sequential pattern mining is an extensively studied method for data mining. One of the new and less documented approaches is the estimation of statistical characteristics of sequences for creating model sequences that can be used to speed up the process of sequence mining. This paper proposes extensive modifications to one such algorithm, ProMFS (probabilistic algorithm for mining frequent sequences), which notably increase the algorithm’s processing speed by significantly reducing its computational complexity. The new version of the algorithm is evaluated on real-life and artificial data sets and shown to be useful in real-time applications and problems.