To see the other types of publications on this topic, follow the link: High Throughput Data Storage.

Journal articles on the topic 'High Throughput Data Storage'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'High Throughput Data Storage.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Amin, A., B. Bockelman, J. Letts, T. Levshina, T. Martin, H. Pi, I. Sfiligoi, M. Thomas, and F. Würthwein. "High Throughput WAN Data Transfer with Hadoop-based Storage." Journal of Physics: Conference Series 331, no. 5 (December 23, 2011): 052016. http://dx.doi.org/10.1088/1742-6596/331/5/052016.

2

Jararweh, Yaser, Ola Al-Sharqawi, Nawaf Abdulla, Lo'ai Tawalbeh, and Mohammad Alhammouri. "High-Throughput Encryption for Cloud Computing Storage System." International Journal of Cloud Applications and Computing 4, no. 2 (April 2014): 1–14. http://dx.doi.org/10.4018/ijcac.2014040101.

Abstract:
In recent years, Cloud computing has become the infrastructure that small and medium-sized businesses are increasingly adopting for their IT and computational needs. It provides a platform for high-performance, throughput-oriented computing and massive data storage. Consequently, novel tools and technologies are needed to handle this new infrastructure. One of the biggest challenges in this evolving field is Cloud storage security, and accordingly we propose new optimized techniques based on an encryption process to achieve better storage system security. This paper proposes a symmetric block algorithm (CHiS-256) to encrypt Cloud data in an efficient manner. It also presents a novel, partially encrypted, metadata-based data storage scheme. The CHiS-256 cipher is implemented as part of the Cloud data storage service to offer a secure, high-performance, high-throughput Cloud storage system. The results of our proposed algorithm are promising and show the methods to be advantageous in Cloud massive data storage and access applications.
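The CHiS-256 cipher itself is specific to this paper, so the following is only a minimal sketch of the general pattern the abstract describes: client-side encryption of the payload before it reaches cloud storage, with metadata kept readable but cryptographically bound to the data. AES-256-GCM from the Python cryptography package stands in for CHiS-256, and all names are illustrative.

```python
# Sketch: client-side encryption before cloud upload, with partially
# unencrypted metadata. AES-256-GCM stands in for the paper's CHiS-256
# cipher, which is not publicly available.
import json
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_object(plaintext: bytes, metadata: dict, key: bytes) -> dict:
    """Encrypt the payload; keep searchable metadata in the clear but
    authenticated, so tampering with it is detected on decryption."""
    aad = json.dumps(metadata, sort_keys=True).encode()
    nonce = os.urandom(12)                       # 96-bit nonce for GCM
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, aad)
    return {"metadata": metadata,                # plaintext, but bound to the data
            "nonce": nonce.hex(),
            "ciphertext": ciphertext.hex()}

def decrypt_object(obj: dict, key: bytes) -> bytes:
    aad = json.dumps(obj["metadata"], sort_keys=True).encode()
    return AESGCM(key).decrypt(bytes.fromhex(obj["nonce"]),
                               bytes.fromhex(obj["ciphertext"]), aad)

key = AESGCM.generate_key(bit_length=256)
blob = encrypt_object(b"sensor readings ...", {"owner": "alice", "type": "csv"}, key)
assert decrypt_object(blob, key) == b"sensor readings ..."
```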
3

Sardaraz, Muhammad, Muhammad Tahir, and Ataul Aziz Ikram. "Advances in high throughput DNA sequence data compression." Journal of Bioinformatics and Computational Biology 14, no. 03 (June 2016): 1630002. http://dx.doi.org/10.1142/s0219720016300021.

Abstract:
Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
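As a toy illustration of the referential category surveyed here (not any specific tool from the review), a read can be stored as a position in a reference sequence plus its mismatches rather than as raw bases:

```python
# Toy referential compression: encode a read as (position, length, mismatches)
# against a reference sequence instead of storing its raw bases.
def encode_read(read: str, reference: str) -> tuple:
    pos = reference.find(read)                 # exact hit: no mismatches to store
    if pos >= 0:
        return (pos, len(read), [])
    # fall back to best ungapped alignment (naive scan, for illustration only)
    best = None
    for p in range(len(reference) - len(read) + 1):
        mism = [(i, read[i]) for i in range(len(read)) if read[i] != reference[p + i]]
        if best is None or len(mism) < len(best[2]):
            best = (p, len(read), mism)
    return best

def decode_read(pos: int, length: int, mismatches: list, reference: str) -> str:
    bases = list(reference[pos:pos + length])
    for i, base in mismatches:
        bases[i] = base
    return "".join(bases)

ref = "ACGTACGTTTGACCAGT"
enc = encode_read("ACGTTAGAC", ref)            # stored as position + 1 mismatch
assert decode_read(*enc, ref) == "ACGTTAGAC"
```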
4

Rice, William J., Anchi Cheng, Sargis Dallakyan, Swapnil Bhatkar, Shaker Krit, Edward T. Eng, Bridget Carragher, and Clinton S. Potter. "Strategies for Data Flow and Storage for High Throughput, High Resolution Cryo-EM Data Collection." Microscopy and Microanalysis 25, S2 (August 2019): 1394–95. http://dx.doi.org/10.1017/s1431927619007700.

5

Albayrak, Levent, Kamil Khanipov, George Golovko, and Yuriy Fofanov. "Broom: application for non-redundant storage of high throughput sequencing data." Bioinformatics 35, no. 1 (July 13, 2018): 143–45. http://dx.doi.org/10.1093/bioinformatics/bty580.

6

Hsi-Yang Fritz, M., R. Leinonen, G. Cochrane, and E. Birney. "Efficient storage of high throughput DNA sequencing data using reference-based compression." Genome Research 21, no. 5 (January 18, 2011): 734–40. http://dx.doi.org/10.1101/gr.114819.110.

7

Caspart, René, Max Fischer, Manuel Giffels, Ralf Florian von Cube, Christoph Heidecker, Eileen Kuehn, Günter Quast, Andreas Heiss, and Andreas Petzold. "Setup and commissioning of a high-throughput analysis cluster." EPJ Web of Conferences 245 (2020): 07007. http://dx.doi.org/10.1051/epjconf/202024507007.

Abstract:
Current and future end-user analyses and workflows in High Energy Physics demand the processing of growing amounts of data. This plays a major role in the context of the High-Luminosity LHC. In order to keep processing times and turn-around cycles as low as possible, analysis clusters optimized for these demands can be used. Since hyper-converged servers offer a good combination of compute power and local storage, they form the ideal basis for such clusters. In this contribution we report on the setup and commissioning of a dedicated analysis cluster at Karlsruhe Institute of Technology. This cluster was designed for use cases demanding high data throughput. Based on hyper-converged servers, it offers 500 job slots and 1 PB of local storage. Combined with the 100 Gb network connection between the servers and a 200 Gb uplink to the Tier-1 storage, the cluster can sustain a data throughput of 1 PB per day. In addition, the local storage provided by the hyper-converged worker nodes can be used as cache space. This allows caching approaches to be employed on the cluster, thereby enabling more efficient usage of the disk space. In previous contributions this concept has been shown to lead to an expected speedup of 2 to 4 compared to conventional setups.
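A quick back-of-the-envelope check of the quoted figures: sustaining 1 PB per day corresponds to roughly 93 Gbit/s on average, which is consistent with the 100 Gb inter-server network and the 200 Gb Tier-1 uplink mentioned above.

```python
# Back-of-the-envelope check of the quoted 1 PB/day figure.
petabyte_bits = 1e15 * 8          # 1 PB in bits (decimal prefixes)
seconds_per_day = 24 * 3600
avg_rate_gbps = petabyte_bits / seconds_per_day / 1e9
print(f"1 PB/day ~= {avg_rate_gbps:.1f} Gbit/s sustained")   # ~92.6 Gbit/s
```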
8

Zhang, Qi, Yan-yun Han, Zhong-bin Su, Jun-long Fang, Zhong-qiang Liu, and Kai-yi Wang. "A storage architecture for high-throughput crop breeding data based on improved blockchain technology." Computers and Electronics in Agriculture 173 (June 2020): 105395. http://dx.doi.org/10.1016/j.compag.2020.105395.

9

Venkatesh, Pruthvi Raj, et al. "Integrated Geo Cloud Solution for Seismic Data Processing." Information Technology in Industry 9, no. 2 (March 28, 2021): 589–604. http://dx.doi.org/10.17762/itii.v9i2.392.

Abstract:
Oil industries generate an enormous volume of digitized data (e.g., seismic data) as part of their seismic studies and move it to the cloud for downstream applications. Moving massive data into the cloud can pose many challenges, especially for commercial off-the-shelf geoscience applications, as they require very high compute and disk throughput. This paper proposes a digital transformation framework for efficient seismic data processing and storage comprising: (a) novel data storage options, (b) a cloud-based HPC framework for efficient seismic data processing, and (c) MD5 hash calculation using the MapReduce pattern with Hadoop clusters. The Azure cloud platform is used to validate the proposed framework and compare it with the existing process. Experimental results show a significant improvement in execution time, throughput, efficiency, and cost. The proposed framework can be used in any domain that deals with extensive data requiring high compute and throughput.
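The abstract does not spell out the Hadoop-based MD5 pipeline, so the following is only a hedged sketch of the map/reduce pattern it names: per-chunk MD5 digests computed in parallel (the map), then combined into a single fingerprint (the reduce). Note that such a composite digest is not equal to an MD5 of the whole file.

```python
# Sketch of a map/reduce-style MD5 over a large file: hash fixed-size chunks
# in parallel, then reduce the per-chunk digests to one fingerprint.
# (A composite digest like this is not equal to md5(whole file).)
import hashlib
from concurrent.futures import ProcessPoolExecutor

CHUNK = 64 * 1024 * 1024  # 64 MiB

def md5_of_chunk(args):
    path, offset = args
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, hashlib.md5(f.read(CHUNK)).digest()

def composite_md5(path, size):
    offsets = [(path, off) for off in range(0, size, CHUNK)]
    with ProcessPoolExecutor() as pool:
        digests = sorted(pool.map(md5_of_chunk, offsets))   # keep chunk order
    reducer = hashlib.md5()
    for _, d in digests:
        reducer.update(d)
    return reducer.hexdigest()

if __name__ == "__main__":
    import os, sys
    print(composite_md5(sys.argv[1], os.path.getsize(sys.argv[1])))
```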
10

Andrian, Kim, and Ju. "A Distributed File-Based Storage System for Improving High Availability of Space Weather Data." Applied Sciences 9, no. 23 (November 21, 2019): 5024. http://dx.doi.org/10.3390/app9235024.

Abstract:
In space science research, the Indonesia National Institute of Aeronautics and Space (LAPAN) is concerned with the development of a system that provides current information and predictions, called the Space Weather Information and Forecast Services (SWIFtS). SWIFtS is supported by a data storage system that serves data using a centralized storage model. This has some problems that impact researchers as the primary users: the single point of failure and the delay in data updating on the server are significant issues when researchers need the latest data but the server is unable to provide it. To overcome these problems, we proposed a new system that utilizes a decentralized model for storing data, leveraging the InterPlanetary File System (IPFS). Our proposed method focuses on an automated background process, and its scheme increases data availability and throughput by spreading data across nodes through peer-to-peer connections. Moreover, we also included system monitoring of real-time data flow from each node and of node status, combining active and passive approaches. For system evaluation, an experiment was performed to determine the performance of the proposed system compared to the existing system by calculating the mean replication time and the mean throughput of a node. As expected, the performance evaluation showed that our proposed scheme had faster file replication time and supported high throughput.
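The implementation details are not given in the abstract, so as a hedged sketch of the decentralised pattern it describes, the snippet below publishes a data product on a local IPFS node and asks peer nodes to pin the resulting CID. The ipfs command-line client is assumed to be installed and initialised on each host, and the peer API addresses are placeholders.

```python
# Sketch: publish a file on a local IPFS node, then ask peer nodes to pin the
# CID so it stays replicated. Assumes the `ipfs` CLI is available and each
# peer exposes its HTTP API on port 5001 (example addresses below).
import subprocess

PEER_APIS = ["/ip4/10.0.0.2/tcp/5001", "/ip4/10.0.0.3/tcp/5001"]  # placeholders

def publish(path: str) -> str:
    # -Q prints only the final root CID of the added file
    cid = subprocess.run(["ipfs", "add", "-Q", path],
                         check=True, capture_output=True, text=True).stdout.strip()
    return cid

def replicate(cid: str) -> None:
    for api in PEER_APIS:
        # pin the CID on each peer via its HTTP API endpoint
        subprocess.run(["ipfs", "--api", api, "pin", "add", cid], check=True)

if __name__ == "__main__":
    cid = publish("space_weather_product.nc")   # hypothetical file name
    replicate(cid)
    print("replicated", cid)
```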
11

Abed Abud, Adam, Matias Bonaventura, Edoardo Farina, and Fabrice Le Goff. "Design of a Resilient, High-Throughput, Persistent Storage System for the ATLAS Phase-II DAQ System." EPJ Web of Conferences 251 (2021): 04014. http://dx.doi.org/10.1051/epjconf/202125104014.

Abstract:
The ATLAS experiment will undergo a major upgrade to take advantage of the new conditions provided by the upgraded High-Luminosity LHC. The Trigger and Data Acquisition system (TDAQ) will record data at unprecedented rates: the detectors will be read out at 1 MHz, generating around 5 TB/s of data. The Dataflow system (DF), a component of TDAQ, introduces a novel design: readout data are buffered on persistent storage while the event filtering system analyses them to select 10,000 events per second, for a total recorded throughput of around 60 GB/s. This approach allows the detector activity to be decoupled from the event selection process. New challenges then arise for DF: to design and implement a distributed, reliable, persistent storage system supporting several TB/s of aggregated throughput while providing tens of PB of capacity. In this paper we first describe some of the challenges that DF is facing: data safety with persistent storage limitations, indexing of data at high granularity in a highly distributed system, and high-performance management of storage capacity. Then the ongoing R&D to address each of them is presented and the performance achieved with a working prototype is shown.
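A rough consistency check of the quoted rates: 5 TB/s at a 1 MHz readout implies raw events of about 5 MB, and keeping 10,000 events per second at a similar size gives the quoted order of 60 GB/s recorded throughput.

```python
# Rough consistency check of the quoted ATLAS Phase-II DAQ figures.
readout_rate_hz = 1e6          # detector readout
readout_bw = 5e12              # 5 TB/s into the DAQ
selected_rate_hz = 1e4         # events kept by the event filter
recorded_bw = 60e9             # 60 GB/s to permanent storage

event_size_in = readout_bw / readout_rate_hz          # ~5 MB per raw event
event_size_out = recorded_bw / selected_rate_hz       # ~6 MB per recorded event
print(f"~{event_size_in/1e6:.0f} MB/event read out, "
      f"~{event_size_out/1e6:.0f} MB/event recorded")
```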
12

Huo, Dao An, and Qiang Cao. "Traffic Throttling to Improve TCP Incast Throughput." Advanced Materials Research 424-425 (January 2012): 227–31. http://dx.doi.org/10.4028/www.scientific.net/amr.424-425.227.

Abstract:
Cluster-based storage systems are widely used by many large data centers because of their manageability, low cost, and other advantages. However, typical cluster-based storage systems rely on standard TCP/IP Ethernet for clients to access data. Clients can experience a collapse of TCP effective throughput (called goodput), which is termed the Incast problem. The main cause of Incast is TCP retransmission timeouts caused by unfairness between the competing flows. In this paper, we analyze TCP Incast dynamics from the fairness perspective and propose a method to make the competing flows share network resources fairly by throttling the traffic. The simulation results show that our method is able to improve TCP Incast goodput by 10% compared to the high-timer-resolution approach.
13

Zhang, Pan, Brian D. Lehmann, Yu Shyr, and Yan Guo. "The Utilization of Formalin Fixed-Paraffin-Embedded Specimens in High Throughput Genomic Studies." International Journal of Genomics 2017 (2017): 1–9. http://dx.doi.org/10.1155/2017/1926304.

Abstract:
High throughput genomic assays empower us to study the entire human genome in short time with reasonable cost. Formalin fixed-paraffin-embedded (FFPE) tissue processing remains the most economical approach for longitudinal tissue specimen storage. Therefore, the ability to apply high throughput genomic applications to FFPE specimens can expand clinical assays and discovery. Many studies have measured the accuracy and repeatability of data generated from FFPE specimens using high throughput genomic assays. Together, these studies demonstrate feasibility and provide crucial guidance for future studies using FFPE specimens. Here, we summarize the findings of these studies and discuss the limitations of high throughput data generated from FFPE specimens across several platforms that include microarray, high throughput sequencing, and NanoString.
14

Park, Han-Sang, Hillel Price, Silvia Ceballos, Jen-Tsan Chi, and Adam Wax. "Single Cell Analysis of Stored Red Blood Cells Using Ultra-High Throughput Holographic Cytometry." Cells 10, no. 9 (September 17, 2021): 2455. http://dx.doi.org/10.3390/cells10092455.

Abstract:
Holographic cytometry is introduced as an ultra-high-throughput implementation of quantitative phase imaging of single cells flowing through parallel microfluidic channels. Here, the approach was applied to characterizing the morphology of individual red blood cells during storage under regular blood bank conditions. Samples from five blood donors were examined at three time points, with over 100,000 cells examined for each. The approach allows high-throughput phase imaging of a large number of cells, greatly extending our ability to study cellular phenotypes using individual cell images. Holographic cytometry images can provide measurements of multiple physical traits of the cells, including optical volume and area, which are observed to consistently change over the storage time. In addition, the large volume of cell imaging data can serve as training data for machine-learning algorithms. For the study here, logistic regression was used to classify the cells according to the storage time points. The analysis showed that at least 5,000 cells are needed to ensure accuracy of the classifiers. Overall, the results showed the potential of holographic cytometry as a diagnostic tool.
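The study's classifier is described only at a high level; the following is a minimal scikit-learn sketch of that kind of logistic-regression classification of cells by storage time point, using synthetic morphological features (optical volume and area) rather than the study's data.

```python
# Minimal sketch: logistic-regression classification of cells by storage
# time point from per-cell morphological features (synthetic data, not the
# study's measurements).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class = 5000                      # abstract suggests ~5,000 cells per class
time_points = [0, 1, 2]                 # e.g. early / mid / late storage

# two features per cell: optical volume and projected area, drifting with storage
X = np.vstack([rng.normal(loc=[90 - 5 * t, 55 - 2 * t], scale=[8, 5],
                          size=(n_per_class, 2)) for t in time_points])
y = np.repeat(time_points, n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```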
15

Sarkar, D., Mahesh P., Padmini S., N. Chouhan, C. Borwankar, A. K. Bhattacharya, A. K. Tickoo, and R. C. Rannot. "Comparison of data storage and analysis throughput in the light of high energy physics experiment MACE." Astronomy and Computing 33 (October 2020): 100409. http://dx.doi.org/10.1016/j.ascom.2020.100409.

16

Al-Aswad, Muthanna Mohammad, and Khalil Alwajeh. "Performance Evaluation of Storage Area Network (SAN) with Internet Small Computer System Interface (iSCSI) for Local System PC." Algerian Journal of Signals and Systems 5, no. 3 (September 15, 2020): 167–78. http://dx.doi.org/10.51485/ajss.v5i3.113.

Abstract:
iSCSI is an emerging protocol that implements Storage Area Network (SAN) technology over TCP/IP, enabling access over IP networks to remote data held on attached storage disks (Direct Attached Storage, DAS). It brings economy and convenience, but it also raises performance and reliability issues. This paper investigates the possibility of using SAN storage technology and the iSCSI-SAN protocol on a local PC to improve access to attached storage disks, using iSCSI-SANs as virtual storage rather than DAS storage on the local PC. The experimental procedure shows that the throughput of iSCSI-SANs was better than that of locally attached storage disks (DAS). This means that iSCSI-SANs can be used on a local PC in place of DAS attached storage disks, at low cost, with high performance and easy control.
17

Kim, Minsu, Chaewon Lee, Subin Hong, Song Lim Kim, JeongHo Baek, and Kyung-Hwan Kim. "High-Throughput Phenotyping Methods for Breeding Drought-Tolerant Crops." International Journal of Molecular Sciences 22, no. 15 (July 31, 2021): 8266. http://dx.doi.org/10.3390/ijms22158266.

Abstract:
Drought is a main factor limiting crop yields. Modern agricultural technologies such as irrigation systems, ground mulching, and rainwater storage can prevent drought, but these are only temporary solutions. Understanding the physiological, biochemical, and molecular reactions of plants to drought stress is therefore urgent. The recent rapid development of genomics tools has led to an increasing interest in phenomics, i.e., the study of phenotypic plant traits. Among phenomic strategies, high-throughput phenotyping (HTP) is attracting increasing attention as a way to address the bottlenecks of genomic and phenomic studies. HTP provides researchers a non-destructive and non-invasive method yet accurate in analyzing large-scale phenotypic data. This review describes plant responses to drought stress and introduces HTP methods that can detect changes in plant phenotypes in response to drought.
18

Xiao, Chuqiao, Yefeng Xia, Qian Zhang, Xueqing Gong, and Liyan Zhu. "CBase-EC: Achieving Optimal Throughput-Storage Efficiency Trade-Off Using Erasure Codes." Electronics 10, no. 2 (January 8, 2021): 126. http://dx.doi.org/10.3390/electronics10020126.

Abstract:
Many distributed database systems that guarantee high concurrency and scalability adopt a read-write separation architecture. At the same time, these systems need to store massive amounts of data daily, requiring different mechanisms for storing and accessing data, such as hot and cold data access strategies. Unlike distributed storage systems, a distributed database splits a table into sub-tables or shards, and the request frequency of each sub-table is not the same within a given period. Therefore, it is necessary to design not only hot-to-cold approaches to reduce storage overhead, but also cold-to-hot methods to ensure high concurrency in those systems. We present a new redundancy strategy named CBase-EC, which uses erasure codes to trade off transaction-processing performance against storage efficiency for CBase, a database system developed for bank financial scenarios. Two algorithms are proposed: a hot-cold tablet (shard) recognition algorithm and a hot-cold dynamic conversion algorithm. We then adopt two optimization approaches to improve CBase-EC performance. In the experiment, we compare CBase-EC with the three-replica scheme in CBase. The experimental results show that although transaction-processing performance declined by no more than 6%, storage efficiency increased by 18.4%.
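The storage-efficiency argument for moving cold tablets from three-way replication to erasure coding comes down to simple overhead arithmetic; the sketch below compares 3 replicas with a generic RS(k, m) layout (the parameters are illustrative; the abstract does not state the ones CBase-EC uses).

```python
# Raw-storage overhead: 3-way replication vs. an RS(k, m) erasure code.
# Parameters are illustrative; the abstract does not give CBase-EC's k and m.
def replication_overhead(copies: int) -> float:
    return copies                      # bytes stored per byte of user data

def erasure_overhead(k: int, m: int) -> float:
    return (k + m) / k                 # k data blocks + m parity blocks

for k, m in [(4, 2), (6, 3), (10, 4)]:
    print(f"3 replicas: {replication_overhead(3):.2f}x   "
          f"RS({k},{m}): {erasure_overhead(k, m):.2f}x")
```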
20

Barisits, Martin, Mikhail Borodin, Alessandro Di Girolamo, Johannes Elmsheuser, Dmitry Golubkov, Alexei Klimentov, Mario Lassnig, Tadashi Maeno, Rodney Walker, and Xin Zhao. "ATLAS Data Carousel." EPJ Web of Conferences 245 (2020): 04035. http://dx.doi.org/10.1051/epjconf/202024504035.

Abstract:
The ATLAS experiment at CERN’s LHC stores detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide, currently about 200 PB on disk and 250 PB on tape in total. Data have different access characteristics due to various computational workflows, and can be accessed from different media, such as remote I/O, disk cache on hard disk drives, or SSDs. Also, larger data centers provide the majority of offline storage capability via tape systems. For the High-Luminosity LHC (HL-LHC), the estimated data storage requirements are several factors bigger than the present forecast of available resources, based on a flat-budget assumption. On the computing side, ATLAS Distributed Computing has been very successful in recent years with high-performance and high-throughput computing integration and in using opportunistic computing resources for Monte Carlo simulation. On the other hand, equivalent opportunistic storage does not exist. ATLAS started the Data Carousel project to increase the usage of less expensive storage, i.e. tape or even commercial storage, so it is not limited to tape technologies exclusively. Data Carousel orchestrates data processing between workload management, data management, and storage services, with the bulk data resident on offline storage. The processing is executed by staging and promptly processing a sliding window of inputs onto faster buffer storage, such that only a small percentage of input data are available at any one time. With this project, we aim to demonstrate that this is the natural way to dramatically reduce our storage cost. The first phase of the project was started in the fall of 2018 and was related to I/O tests of the sites’ archiving systems. Phase II now requires a tight integration of the workload and data management systems. Additionally, the Data Carousel studies the feasibility of running multiple computing workflows from tape. The project is progressing very well and the results presented in this document will be used before LHC Run 3.
21

Tang, Yun-Ching, Hong-Ren Wang, Hongchin Lin, and Jun-Zhe Huang. "DESIGN OF AN AREA-EFFICIENT HIGH-THROUGHPUT SHIFT-BASED LDPC DECODER." Journal of Circuits, Systems and Computers 22, no. 06 (July 2013): 1350039. http://dx.doi.org/10.1142/s0218126613500394.

Abstract:
An area-efficient high-throughput shift-based LDPC decoder architecture is proposed. The specially designed (512, 1,024) parity-check matrix is effective for partial parallel decoding by the min-sum algorithm (MSA). To increase throughput during decoding, two data frames are fed into the decoder to minimize idle time of the check node unit (CNU) and the variable node unit (VNU). Thus, the throughput is increased to almost two-fold. Unlike the conventional architecture, the message storage unit contains shift registers instead of de-multiplexers and registers. Therefore, hardware costs are reduced. Routing congestion and critical path delay are also reduced, which increases energy efficiency. An implementation of the proposed decoder using TSMC 0.18 μm CMOS process achieves a decoding throughput of 1.725 Gbps, at a clock frequency of 56 MHz, a supply voltage of 1.8 V, and a core area of 5.18 mm2. The normalized area is smaller and the throughput per normalized power consumption is higher than those reported using the conventional architectures.
22

Joe, Vijesh, Jennifer S. Raj, and Smys S. "Towards Efficient Big Data Storage With MapReduce Deduplication System." International Journal of Information Technology and Web Engineering 16, no. 2 (April 2021): 45–57. http://dx.doi.org/10.4018/ijitwe.2021040103.

Abstract:
In the big data era, there are high requirements for data storage and processing. The conventional approach faces great challenges, and de-duplication is an excellent approach to reduce storage space and computational time. Many existing approaches take a long time to pinpoint similar data. A MapReduce de-duplication system is proposed to attain a high de-duplication ratio. MapReduce is a parallel processing approach that helps to process a large number of files in less time. The proposed system uses the two-threshold two-divisor with switch (TTTD-S) algorithm for chunking; the switch is the average parameter used by TTTD-S to minimize chunk-size variance. Hashing using SHA-3 and fractal tree indexing are used here; in a fractal index tree, reads and writes take place at the same time. Data size after de-duplication, de-duplication ratio, throughput, hash time, chunk time, and de-duplication time are the parameters used. The performance of the system is tested on the College Scorecard and ZCTA datasets. The experimental results show that the proposed system can reduce duplication and processing time.
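A much-simplified sketch of the de-duplication core described here: content-defined chunking (a basic rolling condition stands in for the paper's TTTD-S algorithm), SHA-3 fingerprints, and a store that keeps each unique chunk once.

```python
# Simplified de-duplication core: content-defined chunking (a toy rolling
# condition stands in for the paper's TTTD-S), SHA3-256 fingerprints, and a
# chunk store that keeps each unique chunk once.
import hashlib

MIN_CHUNK, AVG_MASK, MAX_CHUNK = 2048, 0x1FFF, 16384   # ~8 KiB average chunks

def chunks(data: bytes):
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF            # toy rolling hash
        if (i - start >= MIN_CHUNK and (h & AVG_MASK) == 0) or i - start >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def dedup_store(data: bytes, store: dict) -> list:
    """Insert chunks into `store` keyed by SHA3-256; return the file recipe."""
    recipe = []
    for chunk in chunks(data):
        key = hashlib.sha3_256(chunk).hexdigest()
        store.setdefault(key, chunk)                  # keep only unique chunks
        recipe.append(key)
    return recipe

store = {}
payload = b"example payload " * 10000
recipe = dedup_store(payload, store)
assert b"".join(store[k] for k in recipe) == payload  # lossless reconstruction
print(len(recipe), "chunks referenced,", len(store), "unique chunks stored")
```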
23

Chen, Yingying, Bo Liu, Hongbo Liu, and Yudong Yao. "VLC-based Data Transfer and Energy Harvesting Mobile System." Journal of Ubiquitous Systems and Pervasive Networks 15, no. 01 (March 1, 2021): 01–09. http://dx.doi.org/10.5383/juspn.15.01.001.

Abstract:
This paper explores a low-cost, portable visible light communication (VLC) system to support the increasing needs of lightweight mobile applications. VLC has grown rapidly over the past decade for many applications (e.g., indoor data transmission, human sensing, and visual MIMO) due to its RF interference immunity and inherently high security. However, most existing VLC systems rely heavily on fixed infrastructure, with limited adaptability to emerging lightweight mobile applications. This work proposes Light Storage, a portable VLC system that uses a commercial smartphone flashlight as the transmitter and a solar panel equipped with both data reception and energy harvesting modules as the receiver. Light Storage can achieve concurrent data transmission and energy harvesting from the visible light signals. It develops multi-level light-intensity data modulation to increase data throughput and integrates noise-reduction functionality to allow portability under various lighting conditions. The system supports synchronization together with adaptive error correction to overcome both the linear and non-linear signal offsets caused by the limited timing control of commercial smartphones. Finally, the energy harvesting capability in Light Storage provides sufficient energy to support efficient short-range communication. Light Storage is validated in both indoor and outdoor environments and achieves over 98% data decoding accuracy, demonstrating its potential as an important alternative for low-cost, portable short-range communication.
24

Mkrtchyan, Tigran, Krishnaveni Chitrapu, Vincent Garonne, Dmitry Litvintsev, Svenja Meyer, Paul Millar, Lea Morschel, Albert Rossi, and Marina Sahakyan. "dCache: Inter-disciplinary storage system." EPJ Web of Conferences 251 (2021): 02010. http://dx.doi.org/10.1051/epjconf/202125102010.

Abstract:
The dCache project provides open-source software deployed internationally to satisfy ever more demanding storage requirements. Its multifaceted approach provides an integrated way of supporting different use-cases with the same storage, from high throughput data ingest, data sharing over wide area networks, efficient access from HPC clusters and long term data persistence on a tertiary storage. Though it was originally developed for the HEP experiments, today it is used by various scientific communities, including astrophysics, biomed, life science, which have their specific requirements. In this paper we describe some of the new requirements as well as demonstrate how dCache developers are addressing them.
25

Nobrega, R. Paul, Michael Brown, Cody Williams, Chris Sumner, Patricia Estep, Isabelle Caffry, Yao Yu, et al. "Database-Centric Method for Automated High-Throughput Deconvolution and Analysis of Kinetic Antibody Screening Data." SLAS TECHNOLOGY: Translating Life Sciences Innovation 22, no. 5 (April 21, 2017): 547–56. http://dx.doi.org/10.1177/2472630317705611.

Abstract:
The state-of-the-art industrial drug discovery approach is the empirical interrogation of a library of drug candidates against a target molecule. The advantage of high-throughput kinetic measurements over equilibrium assessments is the ability to measure each of the kinetic components of binding affinity. Although high-throughput capabilities have improved with advances in instrument hardware, three bottlenecks in data processing remain: (1) intrinsic molecular properties that lead to poor biophysical quality in vitro are not accounted for in commercially available analysis models, (2) processing data through a user interface is time-consuming and not amenable to parallelized data collection, and (3) a commercial solution that includes historical kinetic data in the analysis of kinetic competition data does not exist. Herein, we describe a generally applicable method for the automated analysis, storage, and retrieval of kinetic binding data. This analysis can deconvolve poor quality data on-the-fly and store and organize historical data in a queryable format for use in future analyses. Such database-centric strategies afford greater insight into the molecular mechanisms of kinetic competition, allowing for the rapid identification of allosteric effectors and the presentation of kinetic competition data in absolute terms of percent bound to antigen on the biosensor.
26

Correia, Damien, Olivia Doppelt-Azeroual, Jean-Baptiste Denis, Mathias Vandenbogaert, and Valérie Caro. "MetaGenSense: A web application for analysis and visualization of high throughput sequencing metagenomic data." F1000Research 4 (April 2, 2015): 86. http://dx.doi.org/10.12688/f1000research.6139.1.

Abstract:
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results to fellow team members.
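Interaction with Galaxy through BioBlend, as described above, follows a simple upload-and-invoke pattern; a minimal sketch is shown below (the server URL, API key, workflow name, and input mapping are placeholders, not values from the MetaGenSense deployment).

```python
# Minimal BioBlend sketch: upload a sample file to Galaxy and invoke a
# metagenomics workflow. URL, API key, and workflow name are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

history = gi.histories.create_history(name="sample-123 metagenomics")
upload = gi.tools.upload_file("sample-123.fastq.gz", history["id"])
dataset_id = upload["outputs"][0]["id"]

workflow = gi.workflows.get_workflows(name="pathogen-detection")[0]
inputs = {"0": {"src": "hda", "id": dataset_id}}      # map workflow step 0 to the upload
invocation = gi.workflows.invoke_workflow(workflow["id"], inputs=inputs,
                                          history_id=history["id"])
print("invocation id:", invocation["id"])
```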
27

Correia, Damien, Olivia Doppelt-Azeroual, Jean-Baptiste Denis, Mathias Vandenbogaert, and Valérie Caro. "MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data." F1000Research 4 (August 22, 2016): 86. http://dx.doi.org/10.12688/f1000research.6139.2.

Abstract:
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results to fellow team members.
28

Correia, Damien, Olivia Doppelt-Azeroual, Jean-Baptiste Denis, Mathias Vandenbogaert, and Valérie Caro. "MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data." F1000Research 4 (December 1, 2016): 86. http://dx.doi.org/10.12688/f1000research.6139.3.

Abstract:
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results to fellow team members.
29

Lambert, Christophe, Cassandra Braxton, Robert Charlebois, Avisek Deyati, Paul Duncan, Fabio La Neve, Heather Malicki, et al. "Considerations for Optimization of High-Throughput Sequencing Bioinformatics Pipelines for Virus Detection." Viruses 10, no. 10 (September 27, 2018): 528. http://dx.doi.org/10.3390/v10100528.

Abstract:
High-throughput sequencing (HTS) has demonstrated capabilities for broad virus detection based upon discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological. An important goal for HTS applications in biologics is to establish parameter settings that can afford adequate sensitivity at an acceptable computational cost (computation time, computer memory, storage, expense or/and efficiency), at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on the availability of a complete and curated viral database for obtaining accurate results; selection of sequence alignment programs and their configuration, that retains specificity for broad virus detection with reduced false-positive signals; removal of host sequences without loss of endogenous viral sequences of interest; and use of a meaningful reporting format, which can retain critical information of the analysis for presentation of readily interpretable data and actionable results. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper aid toward optimization of data analysis pipelines for virus detection by HTS.
30

Gomez-Sanchez, Ruben, Stephen Besley, Julie Quayle, Jasmine Green, Natalie Warren-Godkin, Irene Areri, and Zoe Zeliku. "Maintaining a High-Quality Screening Collection: The GSK Experience." SLAS DISCOVERY: Advancing the Science of Drug Discovery 26, no. 8 (June 18, 2021): 1065–70. http://dx.doi.org/10.1177/24725552211017526.

Abstract:
The storage of screening collections in DMSO is commonplace in the pharmaceutical industry. To ensure a high-quality screening collection, and hence effective and efficient high-throughput screening, all compounds entering the GlaxoSmithKline (GSK) screening collection undergo a liquid chromatography–mass spectrometry (LC-MS) quality control (QC). It is generally accepted that even under optimal conditions, a small percentage of these compounds are unstable after prolonged storage in DMSO. This article presents how these QC data can be mined using a data-driven clustering algorithm to identify chemical substructures likely to cause degradation in DMSO. This knowledge provides new structural filters for use in excluding compounds with these undesirable substructures from the collection. This information also suggests an efficient, targeted approach to compound collection clean-up initiatives. Stability studies are also designed to maintain a high-quality screening collection. To define the best practice for the storage and handling of solution samples, GSK has undertaken stability experiments for two decades, initially to support the implementation of new automated liquid stores and, subsequently, to enhance storage and use of compounds in solution through an understanding of compound degradation under storage and assay conditions.
31

Sun, Yangzesheng, Robert F. DeJaco, Zhao Li, Dai Tang, Stephan Glante, David S. Sholl, Coray M. Colina, et al. "Fingerprinting diverse nanoporous materials for optimal hydrogen storage conditions using meta-learning." Science Advances 7, no. 30 (July 2021): eabg3983. http://dx.doi.org/10.1126/sciadv.abg3983.

Abstract:
Adsorptive hydrogen storage is a desirable technology for fuel cell vehicles, and efficiently identifying the optimal storage temperature requires modeling hydrogen loading as a continuous function of pressure and temperature. Using data obtained from high-throughput Monte Carlo simulations for zeolites, metal-organic frameworks, and hyper–cross-linked polymers, we develop a meta-learning model that jointly predicts the adsorption loading for multiple materials over wide ranges of pressure and temperature. Meta-learning gives higher accuracy and improved generalization compared to fitting a model separately to each material and allows us to identify the optimal hydrogen storage temperature with the highest working capacity for a given pressure difference. Materials with high optimal temperatures are found in close proximity in the fingerprint space and exhibit high isosteric heats of adsorption. Our method and results provide new guidelines toward the design of hydrogen storage materials and a new route to incorporate machine learning into high-throughput materials discovery.
32

Reppe, Sjur, Catherine Joan Jackson, Håkon Ringstad, Kim Alexander Tønseth, Hege Bakke, Jon Roger Eidet, and Tor Paaske Utheim. "High Throughput Screening of Additives Using Factorial Design to Promote Survival of Stored Cultured Epithelial Sheets." Stem Cells International 2018 (November 18, 2018): 1–9. http://dx.doi.org/10.1155/2018/6545876.

Abstract:
There is a need to optimize storage conditions to preserve cell characteristics during transport of cultured cell sheets from specialized culture units to distant hospitals. In this study, we aimed to explore a method to identify additives that diminish the decrease in the viability of stored undifferentiated epidermal cells using multifactorial design and an automated screening procedure. The cultured cells were stored for 7–11 days at 12°C in media supplemented with various additives. Effects were evaluated by calcein staining of live cells as well as morphology. Twenty-six additives were tested using (1) a two-level factorial design in which 10 additives were added or omitted in 64 different combinations and (2) a mixture design with 5 additives at 5 different concentrations in a total of 64 different mixtures. Automated microscopy and cell counting with Fiji enabled efficient processing of data. Significant regression models were identified by Design-Expert software. A calculated maximum increase of live cells to 37 ± 6% was achieved upon storage of cell sheets for 11 days in the presence of 6% glycerol. The beneficial effect of glycerol was shown for epidermal cell sheets from three different donors in two different storage media and with two different factorial designs. We have thus developed a high throughput screening system enabling robust assessment of live cells and identified glycerol as a beneficial additive that has a positive effect on epidermal cell sheet upon storage at 12°C. We believe this method could be of use in other cell culture optimization strategies where a large number of conditions are compared for their effect on cell viability or other quantifiable dependent variables.
33

Yang, Tianming, Jing Zhang, and Ningbo Hao. "Improving Read Performance with BP-DAGs for Storage-Efficient File Backup." Open Electrical & Electronic Engineering Journal 7, no. 1 (October 18, 2013): 90–97. http://dx.doi.org/10.2174/1874129001307010090.

Abstract:
The continued growth of data and the high continuity requirements of applications have raised a critical and mounting demand for storage-efficient, high-performance data protection. New technologies, especially D2D (Disk-to-Disk) deduplication storage, have therefore received wide attention in both academia and industry in recent years. Existing deduplication systems mainly rely on duplicate locality inside the backup workload to achieve high throughput, but suffer from degraded read performance under conditions of poor duplicate locality. This paper presents the design and performance evaluation of a D2D-based de-duplication file backup system, which employs caching techniques to improve write throughput while encoding files as graphs called BP-DAGs (Bi-pointer-based Directed Acyclic Graphs). BP-DAGs not only satisfy the ‘unique’ chunk-storing policy of de-duplication, but also help improve file read performance for workloads with poor duplicate locality. Evaluation results show that the system achieves read performance comparable to non-de-duplication backup systems such as Bacula under representative workloads, and the metadata storage overhead for BP-DAGs is reasonably low.
34

Koppad, Saraswati, Annappa B, Georgios V. Gkoutos, and Animesh Acharjee. "Cloud Computing Enabled Big Multi-Omics Data Analytics." Bioinformatics and Biology Insights 15 (January 2021): 117793222110359. http://dx.doi.org/10.1177/11779322211035921.

Abstract:
High-throughput experiments enable researchers to explore complex multifactorial diseases through large-scale analysis of omics data. Challenges for such high-dimensional data sets include storage, analyses, and sharing. Recent innovations in computational technologies and approaches, especially in cloud computing, offer a promising, low-cost, and highly flexible solution in the bioinformatics domain. Cloud computing is rapidly proving increasingly useful in molecular modeling, omics data analytics (eg, RNA sequencing, metabolomics, or proteomics data sets), and for the integration, analysis, and interpretation of phenotypic data. We review the adoption of advanced cloud-based and big data technologies for processing and analyzing omics data and provide insights into state-of-the-art cloud bioinformatics applications.
35

Caspart, Rene, Max Fischer, Manuel Giffels, Christoph Heidecker, Eileen Kühn, Günter Quast, Martin Sauter, Matthias J. Schnepf, and R. Florian von Cube. "Advancing throughput of HEP analysis work-flows using caching concepts." EPJ Web of Conferences 214 (2019): 04007. http://dx.doi.org/10.1051/epjconf/201921404007.

Abstract:
High throughput and short turnaround cycles are core requirements for efficient processing of data-intensive end-user analyses in High Energy Physics (HEP). Together with the tremendously increasing amount of data to be processed, this leads to enormous challenges for HEP storage systems, networks, and the distribution of data to computing resources for end-user analyses. Bringing data close to the computing resource is a very promising approach to solve throughput limitations and improve the overall performance. However, achieving data locality by placing multiple conventional caches inside a distributed computing infrastructure leads to redundant data placement and inefficient usage of the limited cache volume. The solution is a coordinated placement of critical data on computing resources, which enables matching each process of an analysis work-flow to its most suitable worker node in terms of data locality and, thus, reduces the overall processing time. This coordinated distributed caching concept was realized at KIT by developing the coordination service NaviX, which connects an XRootD cache proxy infrastructure with an HTCondor batch system. We give an overview of the coordinated distributed caching concept and of experience collected on a prototype system based on NaviX.
36

Zhao, Wen Zhe, Kai Zhao, Qiu Bo Chen, Min Jie Lv, and Zuo Xun Hou. "A Novel Bit-Flipping LDPC Decoder for Solid-State Data Storage." Applied Mechanics and Materials 513-517 (February 2014): 2094–98. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.2094.

Abstract:
This paper concerns the design of a high-speed, low-cost LDPC bit-flipping decoder. Due to its inferior error-correction strength, bit-flipping decoding has received very little attention compared with message-passing decoding. Nevertheless, emerging flash-based solid-state data storage systems inherently favor a hybrid bit-flipping/message-passing decoding strategy, due to the significant dynamics and variation of NAND flash memory raw storage reliability. Therefore, for the first time, a highly efficient silicon implementation of a bit-flipping decoder becomes a practically relevant topic. To address the drawbacks caused by the global search operation in conventional bit-flipping decoding, this paper presents a novel bit-flipping decoder design. Decoding simulations and ASIC design show that the proposed design can achieve up to 80% higher decoding throughput while consuming up to 50% less silicon area, and maintaining almost the same error-correction strength.
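As a software illustration of the bit-flipping family of decoders discussed here (not the paper's hardware architecture), Gallager-style hard-decision bit flipping can be written in a few lines:

```python
# Gallager-style hard-decision bit-flipping decoding (software illustration,
# not the paper's hardware architecture).
import numpy as np

def bit_flip_decode(H: np.ndarray, r: np.ndarray, max_iters: int = 50) -> np.ndarray:
    """H: (m, n) parity-check matrix over GF(2); r: hard-decision received bits."""
    x = r.copy()
    for _ in range(max_iters):
        syndrome = H @ x % 2                   # which checks are unsatisfied
        if not syndrome.any():
            break                              # valid codeword found
        # count, for every bit, how many unsatisfied checks it participates in
        votes = H.T @ syndrome
        x[votes == votes.max()] ^= 1           # flip the most-suspicious bit(s)
    return x

# tiny example: (7,4) Hamming code parity-check matrix, one injected bit error
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
codeword = np.array([1, 0, 1, 1, 0, 1, 0])     # satisfies H @ c % 2 == 0
received = codeword.copy(); received[2] ^= 1   # flip one bit
print(bit_flip_decode(H, received))            # recovers the codeword
```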
37

Sirisha, N., and K. V.D. Kiran. "Authorization of Data In Hadoop Using Apache Sentry." International Journal of Engineering & Technology 7, no. 3.6 (July 4, 2018): 234. http://dx.doi.org/10.14419/ijet.v7i3.6.14978.

Abstract:
Big Data has become popular because it can provide on-demand, reliable, and flexible services to users, such as storage and processing. Data security has become a major issue in Big Data. The open-source HDFS software is used to store huge amounts of data with high throughput and fault tolerance, and MapReduce is used for computation and processing. However, Hadoop is a significant attack target, and because it was not designed with a security model, security became a major drawback of the Hadoop software. In terms of storage, metadata security, sensitive data, and data security in general are serious issues in HDFS. With the importance of Hadoop in today's enterprises, there is also an increasing demand for strong security features. Over recent years, only limited security mechanisms have been provided for Hadoop, such as Kerberos, Transparent Data Encryption (TDE), encryption techniques, and hash techniques. This paper describes efforts to address Hadoop authorization security issues using Apache Sentry in HDFS.
38

Lin, Ying-Chih, Chin-Sheng Yu, and Yen-Jen Lin. "Enabling Large-Scale Biomedical Analysis in the Cloud." BioMed Research International 2013 (2013): 1–6. http://dx.doi.org/10.1155/2013/185679.

Abstract:
Recent progress in high-throughput instrumentations has led to an astonishing growth in both volume and complexity of biomedical data collected from various sources. The planet-size data brings serious challenges to the storage and computing technologies. Cloud computing is an alternative to crack the nut because it gives concurrent consideration to enable storage and high-performance computing on large-scale data. This work briefly introduces the data intensive computing system and summarizes existing cloud-based resources in bioinformatics. These developments and applications would facilitate biomedical research to make the vast amount of diversification data meaningful and usable.
39

Motz, Gary, Alexander Zimmerman, Kimberly Cook, and Alyssa Bancroft. "Collections Management and High-Throughput Digitization using Distributed Cyberinfrastructure Resources." Biodiversity Information Science and Standards 2 (July 5, 2018): e25643. http://dx.doi.org/10.3897/biss.2.25643.

Abstract:
Collections digitization relies increasingly upon computational and data management resources that occasionally exceed the capacity of natural history collections and their managers and curators. Digitization of many tens of thousands of micropaleontological specimen slides, as evidenced by the effort presented here by the Indiana University Paleontology Collection, has been a concerted effort in adherence to the recommended practices of multifaceted aspects of collections management for both physical and digital collections resources. This presentation highlights the contributions of distributed cyberinfrastructure from the National Science Foundation-supported Extreme Science and Engineering Discovery Environment (XSEDE) for web-hosting of collections management system resources and distributed processing of millions of digital images and metadata records of specimens from our collections. The Indiana University Center for Biological Research Collections is currently hosting its instance of the Specify collections management system (CMS) on a virtual server hosted on Jetstream, the cloud service for on-demand computational resources as provisioned by XSEDE. This web-service allows the CMS to be flexibly hosted on the cloud with additional services that can be provisioned on an as-needed basis for generating and integrating digitized collections objects in both web-friendly and digital preservation contexts. On-demand computing resources can be used for the manipulation of digital images for automated file I/O, scripted renaming of files for adherence to file naming conventions, derivative generation, and backup to our local tape archive for digital disaster preparedness and long-term storage. Here, we will present our strategies for facilitating reproducible workflows for general collections digitization of the IUPC nomenclatorial types and figured specimens in addition to the gigapixel resolution photographs of our large collection of microfossils using our GIGAmacro system (e.g., this slide of conodonts). We aim to demonstrate the flexibility and nimbleness of cloud computing resources for replicating this, and other, workflows to enhance the findability, accessibility, interoperability, and reproducibility of the data and metadata contained within our collections.
APA, Harvard, Vancouver, ISO, and other styles
40

Cheng, Yinyi, Kefa Zhou, Jinlin Wang, Philippe De Maeyer, Tim Van de Voorde, Jining Yan, and Shichao Cui. "A Comprehensive Study of Geochemical Data Storage Performance Based on Different Management Methods." Remote Sensing 13, no. 16 (August 13, 2021): 3208. http://dx.doi.org/10.3390/rs13163208.

Full text
Abstract:
The spatial calculation of vector data is crucial for geochemical analysis in geological big data. However, large volumes of geochemical data make management inefficient. This study therefore proposed a shapefile storage method based on MongoDB in GeoJSON form (SSMG) and a shapefile storage method based on PostgreSQL with open location code (OLC) geocoding (SSPOG) to address the low efficiency of electronic form management. The SSMG method consists of a JSONification tier and a cloud storage tier, while the SSPOG method consists of a geocoding tier, an extension tier, and a storage tier. Using MongoDB and PostgreSQL as databases, this study implemented two different high-throughput, high-efficiency methods for geochemical data storage and retrieval. Xinjiang, the largest province in China, was selected as the study area in which to test the proposed methods. Using geochemical data from shapefiles as the data source, several experiments were performed to improve storage efficiency and achieve efficient retrieval. Evaluated by time consumed and data compression ratio (DCR), the SSMG and SSPOG methods improve geochemical data storage under different architectures and enable efficient organization and management of geochemical data in support of geological big data. The aim of this study was to build a storage method that increases the speed of geochemical data insertion and retrieval using big data technology, helping to solve the problem of geochemical data preprocessing and providing support for geochemical analysis.
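The SSMG idea of a JSONification tier feeding a cloud storage tier can be illustrated with a short sketch: read shapefile features as GeoJSON-like records and insert them into MongoDB with a spatial index. This is not the authors' code; it assumes the fiona and pymongo packages, covers only the SSMG (MongoDB) path, and the database, collection, and file names are illustrative.

```python
import fiona                      # reads shapefile features as GeoJSON-like dicts
from pymongo import MongoClient, GEOSPHERE

def load_shapefile_to_mongo(shp_path: str,
                            mongo_uri: str = "mongodb://localhost:27017") -> int:
    """Convert each shapefile feature to a GeoJSON document and bulk-insert it."""
    client = MongoClient(mongo_uri)
    collection = client["geochem"]["samples"]          # illustrative names

    documents = []
    with fiona.open(shp_path) as source:
        for feature in source:
            documents.append({
                "type": "Feature",
                "geometry": dict(feature["geometry"]),      # GeoJSON geometry
                "properties": dict(feature["properties"]),  # element concentrations, IDs, ...
            })

    if documents:
        collection.insert_many(documents)
    # A 2dsphere index enables the spatial queries used in geochemical analysis.
    collection.create_index([("geometry", GEOSPHERE)])
    return len(documents)

if __name__ == "__main__":
    count = load_shapefile_to_mongo("xinjiang_geochem.shp")   # hypothetical file
    print(f"inserted {count} features")
```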
APA, Harvard, Vancouver, ISO, and other styles
41

Groeneveld, E., and C. V. C. Truong. "A database for efficient storage and management of multi panel SNP data." Archives Animal Breeding 56, no. 1 (November 20, 2013): 1023–27. http://dx.doi.org/10.7482/0003-9438-56-103.

Full text
Abstract:
The fast development of high-throughput genotyping has opened up new possibilities in genetics while at the same time producing immense data handling issues. A system design and a proof-of-concept implementation are presented which provide efficient storage and manipulation of single nucleotide polymorphism (SNP) genotypes in a relational database. A new strategy using SNP and individual selection vectors allows us to view SNP data as matrices or sets. These genotype sets provide an easy way to handle original and derived data, the latter at essentially no storage cost. Due to its vector-based database storage, data imports and exports are much faster than those of other SNP databases. In the proof-of-concept implementation, the compressed storage scheme reduces disk space requirements by a factor of around 300. Furthermore, the design scales linearly with the number of individuals and SNPs involved. The procedure supports panels of different sizes, which allows straightforward management of different panel sizes in the same population, as occurs in animal breeding programs when higher-density panels replace previous lower-density versions.
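The abstract does not give the actual storage layout, but the general idea of compact, vector-style genotype storage can be illustrated by packing biallelic SNP calls at two bits each (four calls per byte). The sketch below, using numpy, is a generic illustration rather than the authors' schema; its 4:1 packing against one byte per call is far smaller than the roughly 300-fold reduction the paper reports against its own baseline.

```python
import numpy as np

# Genotype codes: 0, 1, 2 copies of the alternate allele; 3 = missing.
_CODES = {"AA": 0, "AB": 1, "BB": 2, "NA": 3}

def pack_genotypes(calls: list[str]) -> bytes:
    """Pack genotype calls at 2 bits each (4 calls per byte) for compact storage."""
    codes = np.array([_CODES[c] for c in calls], dtype=np.uint8)
    padded = np.resize(codes, ((len(codes) + 3) // 4) * 4)  # pad to a multiple of 4
    padded[len(codes):] = 3                                  # mark padding as missing
    shaped = padded.reshape(-1, 4)
    packed = (shaped[:, 0]
              | (shaped[:, 1] << 2)
              | (shaped[:, 2] << 4)
              | (shaped[:, 3] << 6)).astype(np.uint8)
    return packed.tobytes()

def unpack_genotypes(blob: bytes, n_calls: int) -> np.ndarray:
    """Recover the 2-bit codes from the packed byte blob."""
    packed = np.frombuffer(blob, dtype=np.uint8)
    codes = np.empty(packed.size * 4, dtype=np.uint8)
    for shift in range(4):
        codes[shift::4] = (packed >> (2 * shift)) & 0b11
    return codes[:n_calls]

if __name__ == "__main__":
    calls = ["AA", "AB", "BB", "NA", "AA", "AA"]
    blob = pack_genotypes(calls)
    assert list(unpack_genotypes(blob, len(calls))) == [0, 1, 2, 3, 0, 0]
    print(f"{len(calls)} calls stored in {len(blob)} bytes")
```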
APA, Harvard, Vancouver, ISO, and other styles
42

Mkrtchyan, Tigran, Olufemi Adeyemi, Vincent Garonne, Dmitry Litvintsev, Paul Millar, Lea Morschel, Albert Rossi, Marina Sahakyan, Jürgen Starek, and Sibel Yasar. "dCache - Keeping up With the Evolution of Science." EPJ Web of Conferences 245 (2020): 04039. http://dx.doi.org/10.1051/epjconf/202024504039.

Full text
Abstract:
The dCache project provides open-source software deployed internationally to satisfy ever more demanding storage requirements of various scientific communities. Its multifaceted approach provides an integrated way of supporting different use-cases with the same storage, from high throughput data ingest, through wide access and easy integration with existing systems, including event driven workflow management. With this presentation, we will show some of the recent developments that optimize data management and access to maximise the gain from stored data.
APA, Harvard, Vancouver, ISO, and other styles
43

Tanaka, Kenji, Manabu Oumi, Takashi Niwa, Susumu Ichihara, Yasuyuki Mitsuoka, Kunio Nakajima, Toshifumi Ohkubo, Hiroshi Hosaka, and Kiyoshi Itao. "High Spatial Resolution and Throughput Potential of an Optical Head with a Triangular Aperture for Near-Field Optical Data Storage." Japanese Journal of Applied Physics 42, Part 1, No. 2B (February 28, 2003): 1113–17. http://dx.doi.org/10.1143/jjap.42.1113.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Liu, Yong, Bing Li, Yan Zhang, and Xia Zhao. "A Huffman-Based Joint Compression and Encryption Scheme for Secure Data Storage Using Physical Unclonable Functions." Electronics 10, no. 11 (May 25, 2021): 1267. http://dx.doi.org/10.3390/electronics10111267.

Full text
Abstract:
With the development of Internet of Things (IoT) and cloud computing technologies, cloud servers need to store huge volumes of IoT data with high throughput and robust security. Joint Compression and Encryption (JCAE) schemes based on the Huffman algorithm have been regarded as a promising technology for enhancing data storage. Existing JCAE schemes still have the following limitations: (1) the keys in the JCAE can be cracked by physical and cloning attacks; (2) rebuilding the Huffman tree reduces operational efficiency; (3) the compression ratio should be further improved. In this paper, a Huffman-based JCAE scheme using Physical Unclonable Functions (PUFs) is proposed. It provides physically secure keys with PUFs, efficient Huffman tree mutation without rebuilding, and a practical compression ratio by combining the Lempel-Ziv-Welch (LZW) algorithm. The performance of the instanced PUFs and the derived keys was evaluated. Moreover, the scheme was demonstrated in a file protection system with an average throughput of 473 Mbps and an average compression ratio of 0.5586. Finally, the security analysis shows that the scheme resists physical and cloning attacks as well as several classic attacks, thus improving the security level of existing data protection methods.
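The paper's exact JCAE construction (Huffman tree mutation keyed by PUF responses) is not reproduced in the abstract, so the sketch below only illustrates the general compress-then-encrypt pattern with a key derived from a simulated PUF response. It assumes the third-party cryptography package, substitutes plain LZW for the authors' Huffman/LZW hybrid, and the "PUF" here is a software stand-in, not a hardware primitive.

```python
import hashlib
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes  # pip install cryptography

def lzw_compress(data: bytes) -> bytes:
    """Plain LZW emitting fixed 3-byte codes (simple, not the paper's hybrid coder)."""
    dictionary = {bytes([i]): i for i in range(256)}
    w, codes = b"", []
    for value in data:
        wc = w + bytes([value])
        if wc in dictionary:
            w = wc
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)
            w = bytes([value])
    if w:
        codes.append(dictionary[w])
    return b"".join(code.to_bytes(3, "big") for code in codes)

def simulated_puf_response(challenge: bytes) -> bytes:
    """Stand-in for a hardware PUF: deterministic per 'device', never stored as a key."""
    device_secret = b"per-device-variation"   # in hardware this comes from process variation
    return hashlib.sha256(device_secret + challenge).digest()

def compress_then_encrypt(plaintext: bytes, challenge: bytes) -> tuple[bytes, bytes]:
    """Compress, derive an AES-256 key from the PUF response, encrypt with AES-CTR."""
    key = hashlib.sha256(simulated_puf_response(challenge)).digest()
    nonce = os.urandom(16)
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ciphertext = encryptor.update(lzw_compress(plaintext)) + encryptor.finalize()
    return nonce, ciphertext

if __name__ == "__main__":
    nonce, ct = compress_then_encrypt(b"sensor reading 42, " * 100, challenge=b"c1")
    print(f"ciphertext of {len(ct)} bytes (original 1900 bytes)")
```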
APA, Harvard, Vancouver, ISO, and other styles
45

Jayaraj, V., and S. Alonshia. "Quality based drip drag match data collection in wireless sensor network." International Journal of Engineering & Technology 7, no. 1.1 (December 21, 2017): 426. http://dx.doi.org/10.14419/ijet.v7i1.1.9948.

Full text
Abstract:
Although data collection has received much attention, with schemes that effectively minimize delay and computational complexity while increasing the total data transmitted, the transience of sensor nodes during multiple collections of sensed data in a wireless sensor network (WSN) makes quality of service a great challenge. To circumvent the transience of sensor nodes during multiple data collection, a Quality based Drip-Drag-Match Data Collection (QDDM-DC) scheme has been proposed. In the Drip-Drag-Match data collection scheme, data is first dripped to the sink over an Equidistant-based Optimum Communication Path from the sensor nodes, which reduces data loss. The drag operation then pulls the required sensed data from multiple locations using a Neighbourhood-based model, reducing the delay for storage. Finally, the matching operation compares the sensed data received by the drag operation with that of the corresponding sender sensor node (drip stage) and stores the sensed data accurately, which in turn improves the throughput and quality of data collection. The QDDM-DC scheme was simulated under multiple scenarios (size of data, number of sinks, storage capacity) in WSNs with both random and deterministic models. Simulation results show that QDDM-DC outperforms other data collection schemes, providing high throughput with minimum delay and data loss for effective multiple collection of sensed data in WSN.
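The abstract describes the drip, drag, and match operations only at a high level. As a purely illustrative reading of the match stage, the sketch below compares a digest of the data pulled by the drag operation against the digest recorded at the drip stage before committing it to sink storage; all class, method, and field names are hypothetical and not taken from the paper.

```python
import hashlib

def digest(payload: bytes) -> str:
    """Fingerprint used to compare dripped and dragged copies of a reading."""
    return hashlib.sha256(payload).hexdigest()

class Sink:
    """Toy sink: 'drip' records fingerprints, 'drag' supplies data, 'match' stores it."""

    def __init__(self) -> None:
        self.dripped: dict[str, str] = {}    # node_id -> fingerprint from the drip stage
        self.storage: dict[str, bytes] = {}

    def drip(self, node_id: str, payload: bytes) -> None:
        self.dripped[node_id] = digest(payload)

    def match_and_store(self, node_id: str, dragged_payload: bytes) -> bool:
        """Store the dragged data only if it matches what the sender dripped."""
        if self.dripped.get(node_id) == digest(dragged_payload):
            self.storage[node_id] = dragged_payload
            return True
        return False   # mismatch: treat as loss/corruption and re-request

if __name__ == "__main__":
    sink = Sink()
    sink.drip("node-7", b"temp=21.5")
    assert sink.match_and_store("node-7", b"temp=21.5")
    assert not sink.match_and_store("node-7", b"temp=99.9")
```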
APA, Harvard, Vancouver, ISO, and other styles
46

Mkrtchyan, Tigran, Olufemi Adeyemi, Patrick Fuhrmann, Vincent Garonne, Dmitry Litvintsev, Paul Millar, Albert Rossi, Marina Sahakyan, Jürgen Starek, and Sibel Yasar. "dCache - storage for advanced scientific use cases and beyond." EPJ Web of Conferences 214 (2019): 04042. http://dx.doi.org/10.1051/epjconf/201921404042.

Full text
Abstract:
The dCache project provides open source storage software deployed internationally to satisfy ever more demanding scientific storage requirements. Its multifaceted approach provides an integrated way of supporting different use cases with the same storage, from high throughput data ingest, through wide access and easy integration with existing systems. In supporting new communities, such as photon science and microbiology, dCache is evolving to provide new features and access to new technologies. In this paper, we describe some of these recent features that facilitate the use of storage to maximise the gain from stored data, including quality-of-service management, support for distributed and federated systems, and improvements with support for parallel NFS (pNFS).
APA, Harvard, Vancouver, ISO, and other styles
47

PAN, ZIQIANG, LIN LI, ZHIHUA SHEN, YAN CHEN, and MEI LI. "Characterization of the Microbiota in Air- or Vacuum-Packed Crisp Grass Carp (Ctenopharyngodon idella C. et V.) Fillets by 16S rRNA PCR–Denaturing Gradient Gel Electrophoresis and High-Throughput Sequencing." Journal of Food Protection 81, no. 6 (May 15, 2018): 1022–29. http://dx.doi.org/10.4315/0362-028x.jfp-17-498.

Full text
Abstract:
The microbial communities in air- and vacuum-packed crisp grass carp (Ctenopharyngodon idella C. et V.) fillets have not been characterized during chilled storage. High-throughput sequencing of bacterial 16S rRNA has now revealed that the bacterial community in fresh fillets is diverse and distinct from that in spoiled samples. The predominant phylum was Proteobacteria, and 66 genera were identified. In fresh fillets, the most abundant genera were Acinetobacter (53.3%), Wautersiella (6.3%), unclassified Alcaligenaceae (4.4%), Stenotrophomonas (3.8%), unclassified Enterobacteriaceae (3.8%), and Enhydrobacter (3.6%). These genera diminished during chilled storage and sometimes disappeared. At the end of storage, Aeromonas and Pseudomonas were the most abundant. Similar results were obtained by PCR–denaturing gradient gel electrophoresis. These data provide detailed insight into the evolving bacterial communities in air- and vacuum-packed crisp grass carp fillets during storage, revealing Aeromonas and Pseudomonas as major spoilage organisms. These data may be useful for improvement of crisp grass carp quality and shelf life during chilled storage.
APA, Harvard, Vancouver, ISO, and other styles
48

Xiao, Shun Wen, Xian Zhi Dai, Huai Bing Qi, Han Kui Liu, Qian Shu Zhang, and Yun Xiu Wang. "High-Speed Parallel Implementation of AES Key Expansion Algorithm Based on FPGA." Applied Mechanics and Materials 719-720 (January 2015): 712–16. http://dx.doi.org/10.4028/www.scientific.net/amm.719-720.712.

Full text
Abstract:
Based on the traditional AES algorithm, we present an optimized scheme for implementing the AES key expansion algorithm. In this scheme the key expansion algorithm is expressed as a matrix and then converted to look-up tables, and an FPGA, with its abundant look-up-table and storage resources, is used to implement the algorithm in parallel. The scheme reduces the complexity of the algorithm. As the experimental results show, the system's data processing speed and data throughput can be changed in real time, according to the needs of the encryption system, by changing the system clock.
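For readers unfamiliar with how key expansion maps onto look-up tables, the following software sketch precomputes the AES S-box (the table an FPGA design would hold in LUT or block-RAM resources) and expands an AES-128 key into its round keys. It is a plain reference implementation for illustration, not the authors' FPGA design.

```python
def _xtime(a: int) -> int:
    """Multiply by x (i.e. 0x02) in GF(2^8) with the AES modulus 0x11b."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def _gf_mul(a: int, b: int) -> int:
    """Schoolbook GF(2^8) multiplication; only used to build the table."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = _xtime(a)
        b >>= 1
    return result

def _build_sbox() -> list[int]:
    """S-box = multiplicative inverse in GF(2^8) followed by the AES affine map."""
    sbox = [0] * 256
    for x in range(256):
        inv = next((c for c in range(1, 256) if _gf_mul(x, c) == 1), 0) if x else 0
        s = inv
        for shift in (1, 2, 3, 4):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox

SBOX = _build_sbox()  # in hardware this table is the precomputed look-up resource

def expand_key_128(key: bytes) -> list[bytes]:
    """Expand a 16-byte AES-128 key into the 11 round keys (44 words)."""
    assert len(key) == 16
    words = [key[i:i + 4] for i in range(0, 16, 4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]                  # RotWord
            temp = bytes(SBOX[b] for b in temp)         # SubWord (table look-up)
            temp = bytes([temp[0] ^ rcon]) + temp[1:]   # XOR round constant
            rcon = _xtime(rcon)
        words.append(bytes(a ^ b for a, b in zip(words[i - 4], temp)))
    return [b"".join(words[r:r + 4]) for r in range(0, 44, 4)]

if __name__ == "__main__":
    round_keys = expand_key_128(bytes(16))   # all-zero key as a simple test vector
    print(round_keys[1].hex())               # expected: 62636363 repeated four times
```

In software the SubWord step is a byte-wise table look-up into SBOX; the FPGA scheme described in the abstract exploits the same property by keeping such tables in on-chip look-up-table and storage resources.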
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Bao Yi, Chao Luo, and Shao Min Zhang. "Research on Application of Cloud Storage in Smart Distribution and Consumption Integrated Management System." Applied Mechanics and Materials 494-495 (February 2014): 1687–90. http://dx.doi.org/10.4028/www.scientific.net/amm.494-495.1687.

Full text
Abstract:
At present, each kind of distribution and consumption management system operates independently over long periods and lacks effective integration, which obstructs inter-departmental coordination of business operations. Following IEC 61968, this paper proposes a smart distribution and consumption integrated management system that integrates the existing systems on the distribution and consumption side and thereby solves the problem of information islands. As the scope of power system data acquisition continues to expand, mass data storage and analysis capabilities will be needed that exceed those of the existing storage and information-processing platforms of the power system; cloud storage based on Hadoop can solve this problem. Experiments show that Hadoop-based cloud storage clusters offer high throughput and high transmission rates, which can meet the mass data storage and access requirements of the smart distribution and consumption integrated management system.
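As a small illustration of the kind of Hadoop-backed storage access such a system would perform, the sketch below writes and reads meter readings through WebHDFS. It assumes the third-party HdfsCLI (hdfs) package and a reachable NameNode; the endpoint, user, and paths are placeholders, and this is not code from the paper.

```python
from hdfs import InsecureClient  # pip install hdfs (WebHDFS client)

# Placeholder NameNode WebHDFS endpoint and user.
client = InsecureClient("http://namenode.example:9870", user="hadoop")

def store_measurements(csv_lines: list[str],
                       path: str = "/power/consumption/2014-02.csv") -> None:
    """Write a batch of meter readings to HDFS (overwrite keeps the example simple)."""
    payload = "\n".join(csv_lines) + "\n"
    with client.write(path, encoding="utf-8", overwrite=True) as writer:
        writer.write(payload)

def read_measurements(path: str = "/power/consumption/2014-02.csv") -> list[str]:
    """Read the stored readings back for analysis."""
    with client.read(path, encoding="utf-8") as reader:
        return reader.read().splitlines()

if __name__ == "__main__":
    store_measurements(["meter_id,kwh,timestamp", "M001,3.2,2014-02-01T00:15"])
    print(read_measurements()[:2])
```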
APA, Harvard, Vancouver, ISO, and other styles
50

Ow, T. J., K. Upadhyay, T. J. Belbin, M. B. Prystowsky, H. Ostrer, and R. V. Smith. "Bioinformatics in otolaryngology research. Part one: concepts in DNA sequencing and gene expression analysis." Journal of Laryngology & Otology 128, no. 10 (September 16, 2014): 848–58. http://dx.doi.org/10.1017/s002221511400200x.

Full text
Abstract:
Background: Advances in high-throughput molecular biology, genomics and epigenetics, coupled with exponential increases in computing power and data storage, have led to a new era in biological research and information. Bioinformatics, the discipline devoted to storing, analysing and interpreting large volumes of biological data, has become a crucial component of modern biomedical research. Research in otolaryngology has evolved along with these advances. Objectives: This review highlights several modern high-throughput research methods, and focuses on the bioinformatics principles necessary to carry out such studies. Several examples from recent literature pertinent to otolaryngology are provided. The review is divided into two parts; this first part discusses the bioinformatics approaches applied in nucleotide sequencing and gene expression analysis. Conclusion: This paper demonstrates how high-throughput nucleotide sequencing and transcriptomics are changing biology and medicine, and describes how these changes are affecting otorhinolaryngology. Sound bioinformatics approaches are required to obtain useful information from the vast new sources of data.
APA, Harvard, Vancouver, ISO, and other styles