Academic literature on the topic 'High Throughput Data Storage'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'High Throughput Data Storage.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "High Throughput Data Storage"

1

Amin, A., B. Bockelman, J. Letts, T. Levshina, T. Martin, H. Pi, I. Sfiligoi, M. Thomas, and F. Würthwein. "High Throughput WAN Data Transfer with Hadoop-based Storage." Journal of Physics: Conference Series 331, no. 5 (December 23, 2011): 052016. http://dx.doi.org/10.1088/1742-6596/331/5/052016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Jararweh, Yaser, Ola Al-Sharqawi, Nawaf Abdulla, Lo'ai Tawalbeh, and Mohammad Alhammouri. "High-Throughput Encryption for Cloud Computing Storage System." International Journal of Cloud Applications and Computing 4, no. 2 (April 2014): 1–14. http://dx.doi.org/10.4018/ijcac.2014040101.

Full text
Abstract:
In recent years Cloud computing has become the infrastructure which small and medium-sized businesses are increasingly adopting for their IT and computational needs. It provides a platform for high performance and throughput oriented computing, and massive data storage. Subsequently, novel tools and technologies are needed to handle this new infrastructure. One of the biggest challenges in this evolving field is Cloud storage security, and accordingly we propose new optimized techniques based on encryption process to achieve better storage system security. This paper proposes a symmetric block algorithm (CHiS-256) to encrypt Cloud data in efficient manner. Also, this paper presents a novel partially encrypted metadata-based data storage. The (CHiS-256) cipher is implemented as part of the Cloud data storage service to offer a secure, high-performance and throughput Cloud storage system. The results of our proposed algorithm are promising and show the methods to be advantageous in Cloud massive data storage and access applications.
APA, Harvard, Vancouver, ISO, and other styles
3

Sardaraz, Muhammad, Muhammad Tahir, and Ataul Aziz Ikram. "Advances in high throughput DNA sequence data compression." Journal of Bioinformatics and Computational Biology 14, no. 03 (June 2016): 1630002. http://dx.doi.org/10.1142/s0219720016300021.

Full text
Abstract:
Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
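The referential approach surveyed in this article can be illustrated with a minimal sketch (the helper names are hypothetical, and reads are assumed to be already aligned to the reference): each read is stored as an alignment position plus its mismatches, rather than as full sequence text.

```python
# Toy referential (reference-based) read compression: a read aligned at
# `pos` is encoded as (pos, length, mismatches) against a shared reference.
def compress_read(reference: str, read: str, pos: int):
    mismatches = [(i, b) for i, b in enumerate(read)
                  if reference[pos + i] != b]
    return (pos, len(read), mismatches)

def decompress_read(reference: str, record):
    pos, length, mismatches = record
    bases = list(reference[pos:pos + length])
    for i, b in mismatches:          # re-apply the stored differences
        bases[i] = b
    return "".join(bases)

reference = "ACGTACGTACGTACGT"
read = "ACGAACGT"                    # one mismatch vs. reference offset 3
record = compress_read(reference, read, 0)
assert record == (0, 8, [(3, "A")])  # position, length, single mismatch
assert decompress_read(reference, record) == read
```

For real data the mismatch lists are far smaller than the reads themselves, which is the source of the compression gain; reference-free methods instead exploit redundancy among the reads.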
APA, Harvard, Vancouver, ISO, and other styles
4

Rice, William J., Anchi Cheng, Sargis Dallakyan, Swapnil Bhatkar, Shaker Krit, Edward T. Eng, Bridget Carragher, and Clinton S. Potter. "Strategies for Data Flow and Storage for High Throughput, High Resolution Cryo-EM Data Collection." Microscopy and Microanalysis 25, S2 (August 2019): 1394–95. http://dx.doi.org/10.1017/s1431927619007700.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Albayrak, Levent, Kamil Khanipov, George Golovko, and Yuriy Fofanov. "Broom: application for non-redundant storage of high throughput sequencing data." Bioinformatics 35, no. 1 (July 13, 2018): 143–45. http://dx.doi.org/10.1093/bioinformatics/bty580.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Hsi-Yang Fritz, M., R. Leinonen, G. Cochrane, and E. Birney. "Efficient storage of high throughput DNA sequencing data using reference-based compression." Genome Research 21, no. 5 (January 18, 2011): 734–40. http://dx.doi.org/10.1101/gr.114819.110.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Caspart, René, Max Fischer, Manuel Giffels, Ralf Florian von Cube, Christoph Heidecker, Eileen Kuehn, Günter Quast, Andreas Heiss, and Andreas Petzold. "Setup and commissioning of a high-throughput analysis cluster." EPJ Web of Conferences 245 (2020): 07007. http://dx.doi.org/10.1051/epjconf/202024507007.

Full text
Abstract:
Current and future end-user analyses and workflows in High Energy Physics demand the processing of growing amounts of data. This plays a major role when looking at the demands in the context of the High-Luminosity-LHC. In order to keep the processing time and turn-around cycles as low as possible analysis clusters optimized with respect to these demands can be used. Since hyper converged servers offer a good combination of compute power and local storage, they form the ideal basis for these clusters. In this contribution we report on the setup and commissioning of a dedicated analysis cluster setup at Karlsruhe Institute of Technology. This cluster was designed for use cases demanding high data-throughput. Based on hyper converged servers this cluster offers 500 job slots and 1 PB of local storage. Combined with the 100 Gb network connection between the servers and a 200 Gb uplink to the Tier-1 storage, the cluster can sustain a data-throughput of 1 PB per day. In addition, the local storage provided by the hyper converged worker nodes can be used as cache space. This allows employing of caching approaches on the cluster, thereby enabling a more efficient usage of the disk space. In previous contributions this concept has been shown to lead to an expected speedup of 2 to 4 compared to conventional setups.
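As a sanity check on the figures quoted above (an illustrative calculation, not taken from the paper), one can verify that a 100 Gb/s server network is just sufficient to sustain 1 PB per day:

```python
# Back-of-the-envelope check: sustained rate needed for 1 PB/day,
# assuming decimal units (1 PB = 10^15 bytes).
PETABYTE_BITS = 1e15 * 8
SECONDS_PER_DAY = 86_400

required_gbps = PETABYTE_BITS / SECONDS_PER_DAY / 1e9
print(f"1 PB/day requires a sustained {required_gbps:.1f} Gb/s")
assert required_gbps < 100    # fits within the 100 Gb/s interconnect
```

The required rate is roughly 92.6 Gb/s, so the quoted 1 PB/day essentially saturates the 100 Gb links, which is why the local cache space matters for repeated accesses.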
APA, Harvard, Vancouver, ISO, and other styles
8

Zhang, Qi, Yan-yun Han, Zhong-bin Su, Jun-long Fang, Zhong-qiang Liu, and Kai-yi Wang. "A storage architecture for high-throughput crop breeding data based on improved blockchain technology." Computers and Electronics in Agriculture 173 (June 2020): 105395. http://dx.doi.org/10.1016/j.compag.2020.105395.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Venkatesh, Pruthvi Raj, et al. "Integrated Geo Cloud Solution for Seismic Data Processing." INFORMATION TECHNOLOGY IN INDUSTRY 9, no. 2 (March 28, 2021): 589–604. http://dx.doi.org/10.17762/itii.v9i2.392.

Full text
Abstract:
Oil industries generate an enormous volume of digitized data (e.g., seismic data) as a part of their seismic study and move it to the cloud for downstream applications. Moving massive data into the cloud can pose many challenges, especially to Commercial-off-the-shelf geoscience applications as they require very high compute and disk throughput. This paper proposes a digital transformation framework for efficient seismic data processing and storage comprising of: (a) Novel Data storage options, (b) Cloud-based HPC framework for efficient seismic data processing, and (c) MD5 hash calculation using the MapReduce pattern with Hadoop clusters. Azure cloud platform is used to validate the proposed framework and compare it with the existing process. Experimental results show a significant improvement in execution time, throughput, efficiency, and cost. The proposed framework can be used in any domain which deals with extensive data requiring high compute and throughput.
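The MapReduce-pattern checksumming mentioned in the abstract can be sketched as follows; the per-chunk digests and the hash-of-hashes manifest are illustrative choices, not necessarily the paper's exact Hadoop scheme:

```python
# Per-chunk MD5 digests computed in parallel ("map"), then combined
# into one manifest digest ("reduce").
import hashlib
from concurrent.futures import ThreadPoolExecutor

def chunk_md5(chunk: bytes) -> str:
    """Map step: digest a single fixed-size chunk."""
    return hashlib.md5(chunk).hexdigest()

def manifest_md5(digests) -> str:
    """Reduce step: combine chunk digests into one manifest checksum."""
    return hashlib.md5("".join(digests).encode()).hexdigest()

data = b"seismic trace data " * 1000            # stand-in for a survey file
chunks = [data[i:i + 4096] for i in range(0, len(data), 4096)]
with ThreadPoolExecutor() as pool:              # parallel map phase
    digests = list(pool.map(chunk_md5, chunks))
checksum = manifest_md5(digests)
```

Because each chunk is digested independently, the map phase scales across workers, which is what makes the pattern attractive for verifying very large uploads.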
APA, Harvard, Vancouver, ISO, and other styles
10

Andrian, Kim, and Ju. "A Distributed File-Based Storage System for Improving High Availability of Space Weather Data." Applied Sciences 9, no. 23 (November 21, 2019): 5024. http://dx.doi.org/10.3390/app9235024.

Full text
Abstract:
In space science research, the Indonesia National Institute of Aeronautics and Space (LAPAN) is concerned with the development of a system that provides actual information and predictions called the Space Weather Information and Forecast Services (SWIFtS). SWIFtS is supported by a data storage system that serves data, implementing a centralized storage model. This has some problems that impact to researchers as the primary users. The single point of failure and also the delay in data updating on the server is a significant issue when researchers need the latest data, but the server is unable to provide it. To overcome these problems, we proposed a new system that utilized a decentralized model for storing data, leveraging the InterPlanetary File System (IPFS) file system. Our proposed method focused on the automated background process, and its scheme would increase the data availability and throughput by spreading it into nodes through a peer-to-peer connection. Moreover, we also included system monitoring for real-time data flow from each node and information of node status that combines active and passive approaches. For system evaluation, the experiment was performed to determine the performance of the proposed system compared to the existing system by calculating mean replication time and the mean throughput of a node. As expected, performance evaluations showed that our proposed scheme had faster file replication time and supported high throughput.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "High Throughput Data Storage"

1

Roguski, Łukasz 1987. "High-throughput sequencing data compression." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/565775.

Full text
Abstract:
Thanks to advances in sequencing technologies, biomedical research has experienced a revolution over recent years, resulting in an explosion in the amount of genomic data being generated worldwide. The typical space requirement for storing sequencing data produced by a medium-scale experiment lies in the range of tens to hundreds of gigabytes, with multiple files in different formats being produced by each experiment. The current de facto standard file formats used to represent genomic data are text-based. For practical reasons, these are stored in compressed form. In most cases, such storage methods rely on general-purpose text compressors, such as gzip. Unfortunately, however, these methods are unable to exploit the information models specific to sequencing data, and as a result they usually provide limited functionality and insufficient savings in storage space. This explains why relatively basic operations such as processing, storage, and transfer of genomic data have become a typical bottleneck of current analysis setups. Therefore, this thesis focuses on methods to efficiently store and compress the data generated from sequencing experiments. First, we propose a novel general purpose FASTQ files compressor. Compared to gzip, it achieves a significant reduction in the size of the resulting archive, while also offering high data processing speed. Next, we present compression methods that exploit the high sequence redundancy present in sequencing data. These methods achieve the best compression ratio among current state-of-the-art FASTQ compressors, without using any external reference sequence. We also demonstrate different lossy compression approaches to store auxiliary sequencing data, which allow for further reductions in size. Finally, we propose a flexible framework and data format, which allows one to semi-automatically generate compression solutions which are not tied to any specific genomic file format. 
To facilitate data management needed by complex pipelines, multiple genomic datasets having heterogeneous formats can be stored together in configurable containers, with an option to perform custom queries over the stored data. Moreover, we show that simple solutions based on our framework can achieve results comparable to those of state-of-the-art format-specific compressors. Overall, the solutions developed and described in this thesis can easily be incorporated into current pipelines for the analysis of genomic data. Taken together, they provide grounds for the development of integrated approaches towards efficient storage and management of such data.
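The stream-separation idea behind FASTQ-specific compressors such as the ones described above can be sketched minimally (this toy splitter is illustrative, not the thesis's actual format): grouping headers, sequences, and quality strings into homogeneous streams lets a coder exploit the strong redundancy within each stream independently.

```python
# Split 4-line FASTQ records into homogeneous streams, then compress
# each stream on its own.
import gzip

def split_streams(fastq_text: str):
    """Return (headers, sequences, qualities) streams from FASTQ text."""
    lines = fastq_text.strip().split("\n")
    headers = "\n".join(lines[0::4])
    sequences = "\n".join(lines[1::4])
    qualities = "\n".join(lines[3::4])
    return headers, sequences, qualities

fastq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGA\n+\nIIIIIIIH\n"
headers, sequences, qualities = split_streams(fastq)
# Each stream can now be encoded separately, e.g. with gzip:
compressed = [gzip.compress(s.encode(), 9)
              for s in (headers, sequences, qualities)]
```

On realistic inputs the sequence stream (4-letter alphabet) and the quality stream (narrow, correlated alphabet) compress far better in isolation than interleaved records do.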
APA, Harvard, Vancouver, ISO, and other styles
2

Kalathur, Ravi Kiran Reddy. "An integrated systematic approach for storage, analysis and visualization of gene expression data from neuronal tissues acquired through high-throughput techniques." Université Louis Pasteur (Strasbourg) (1971-2008), 2008. https://publication-theses.unistra.fr/public/theses_doctorat/2008/KALATHUR_Ravi_Kiran_Reddy_2008.pdf.

Full text
Abstract:
The work presented in this manuscript concerns different aspects of gene expression data analysis, encompassing statistical methods and storage and visualization systems used to exploit and mine pertinent information from large volumes of data. During my thesis I had the opportunity to work on these various aspects: first, by contributing to the testing, through the design of biological applications, of new clustering and meta-analysis approaches developed in our laboratory; and second, by developing RETINOBASE (http://alnitak.u-strasbg.fr/RetinoBase/), a relational database for the storage and efficient querying of transcriptomic data, which represents my major project.
APA, Harvard, Vancouver, ISO, and other styles
3

Nicolae, Bogdan. "BlobSeer : towards efficient data storage management for large-scale, distributed systems." Phd thesis, Université Rennes 1, 2010. http://tel.archives-ouvertes.fr/tel-00552271.

Full text
Abstract:
With data volumes increasing at a high rate and the emergence of highly scalable infrastructures (cloud computing, petascale computing), distributed management of data becomes a crucial issue that faces many challenges. This thesis brings several contributions in order to address such challenges. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, it highlights the potentially large benefits of using versioning in this context. Second, based on these principles, it introduces a series of distributed data and metadata management algorithms that enable a high throughput under concurrency. Third, it shows how to efficiently implement these algorithms in practice, dealing with key issues such as high-performance parallel transfers, efficient maintainance of distributed data structures, fault tolerance, etc. These results are used to build BlobSeer, an experimental prototype that is used to demonstrate both the theoretical benefits of the approach in synthetic benchmarks, as well as the practical benefits in real-life, applicative scenarios: as a storage backend for MapReduce applications, as a storage backend for deployment and snapshotting of virtual machine images in clouds, as a quality-of-service enabled data storage service for cloud applications. Extensive experimentations on the Grid'5000 testbed show that BlobSeer remains scalable and sustains a high throughput even under heavy access concurrency, outperforming by a large margin several state-of-art approaches.
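The versioning principle highlighted in this abstract can be illustrated with a toy snapshot store (a hypothetical sketch, not BlobSeer's actual data structures): each write publishes a new immutable version, so concurrent readers never block on writers and always see a consistent snapshot.

```python
# Toy versioned blob: writes create new immutable snapshots; readers
# pin a version id and are never affected by later writes.
class VersionedBlob:
    def __init__(self):
        self.versions = [b""]            # immutable snapshots

    def write(self, offset: int, data: bytes) -> int:
        cur = self.versions[-1]
        if len(cur) < offset:            # pad a sparse write with zeros
            cur = cur + b"\x00" * (offset - len(cur))
        new = cur[:offset] + data + cur[offset + len(data):]
        self.versions.append(new)        # publish the new version
        return len(self.versions) - 1    # version id for readers

    def read(self, version: int) -> bytes:
        return self.versions[version]    # snapshot stays stable

blob = VersionedBlob()
v1 = blob.write(0, b"hello")
v2 = blob.write(5, b" world")
assert blob.read(v1) == b"hello"         # old snapshot unchanged
assert blob.read(v2) == b"hello world"
```

A production system would of course share unchanged pages between versions rather than copy whole blobs, but the read/write decoupling under concurrency is the same.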
APA, Harvard, Vancouver, ISO, and other styles
4

Ljung, Patric. "Visualization of Particle In Cell Simulations." Thesis, Linköping University, Department of Science and Technology, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2340.

Full text
Abstract:
A numerical simulation case involving space plasma and the evolution of instabilities that generate very fast electrons, i.e. at approximately half the speed of light, is used as a test bed for scientific visualisation techniques. A visualisation system was developed to provide interactive real-time animation and visualisation of the simulation results. The work focuses on two themes and their integration. The first theme is the storage and management of the large data sets produced. The second theme deals with how the Visualisation System and Visual Objects are tailored to efficiently visualise the data at hand. The integration of the two themes has resulted in an interactive real-time animation and visualisation system which constitutes a very powerful tool for analysis and understanding of plasma physics processes. The visualisations contained in this work have spawned many new possible research projects and provided insight into previously not fully understood plasma physics phenomena.
APA, Harvard, Vancouver, ISO, and other styles
5

Carpen-Amarie, Alexandra. "BlobSeer as a data-storage facility for clouds : self-Adaptation, integration, evaluation." Thesis, Cachan, Ecole normale supérieure, 2011. http://www.theses.fr/2011DENS0066/document.

Full text
Abstract:
The emergence of Cloud computing brings forward many challenges that may limit the adoption rate of the Cloud paradigm. As data volumes processed by Cloud applications increase exponentially, designing efficient and secure solutions for data management emerges as a crucial requirement. The goal of this thesis is to enhance a distributed data-management system with self-management capabilities, so that it can meet the requirements of the Cloud storage services in terms of scalability, data availability, reliability and security. Furthermore, we aim at building a Cloud data service both compatible with state-of-the-art Cloud interfaces and able to deliver high-throughput data storage. To meet these goals, we proposed generic self-awareness, self-protection and self-configuration components targeted at distributed data-management systems. We validated them on top of BlobSeer, a large-scale data-management system designed to optimize highly-concurrent data accesses. Next, we devised and implemented a BlobSeer-based file system optimized to efficiently serve as a storage backend for Cloud services. We then integrated it within a real-world Cloud environment, the Nimbus platform. The benefits and drawbacks of using Cloud storage for real-life applications have been emphasized in evaluations that involved data-intensive MapReduce applications and tightly-coupled, high-performance computing applications
APA, Harvard, Vancouver, ISO, and other styles
6

Kalathur, Ravi Kiran Reddy Poch Olivier. "Approche systématique et intégrative pour le stockage, l'analyse et la visualisation des données d'expression génique acquises par des techniques à haut débit, dans des tissus neuronaux An integrated systematic approach for storage, analysis and visualization of gene expression data from neuronal tissues acquired through high-throughput techniques /." Strasbourg : Université Louis Pasteur, 2008. http://eprints-scd-ulp.u-strasbg.fr:8080/920/01/KALATHUR_R_2007.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Jin, Shuangshuang. "Integrated data modeling in high-throughput proteomics." Online access for everyone, 2007. http://www.dissertations.wsu.edu/Dissertations/Fall2007/S_Jin_111907.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Capparuccini, Maria. "Inferential Methods for High-Throughput Methylation Data." VCU Scholars Compass, 2010. http://scholarscompass.vcu.edu/etd/156.

Full text
Abstract:
The role of abnormal DNA methylation in the progression of disease is a growing area of research that relies upon the establishment of sound statistical methods. The common method for declaring that there is differential methylation between two groups at a given CpG site, as summarized by the difference between proportions methylated, Δβ = β₁ − β₂, has been the Filtered Two Sample t-test, using the recommended filter of 0.17 (Bibikova et al., 2006b). In this dissertation, we performed a re-analysis of the data used in recommending the threshold by fitting a mixed-effects ANOVA model. It was determined that the 0.17 filter is not accurate, and we conjectured that application of a Filtered Two Sample t-test likely leads to loss of power. Further, the Two Sample t-test assumes that data arise from an underlying distribution encompassing the entire real number line, whereas β₁ and β₂ are constrained on the interval [0, 1]. Additionally, the imposition of a filter at a level signifying the minimum detectable difference likely reduces the power of a Two Sample t-test for smaller but truly differentially methylated CpG sites. Therefore, we compared the Two Sample t-test and the Filtered Two Sample t-test, which are widely used but largely untested with respect to their performance, against three proposed methods: a Beta distribution test, a Likelihood ratio test, and a Bootstrap test, each designed to address distributional concerns present in the current testing methods. It was ultimately shown through simulations comparing Type I and Type II error rates that the (unfiltered) Two Sample t-test and the Beta distribution test performed comparatively well.
APA, Harvard, Vancouver, ISO, and other styles
9

Durif, Ghislain. "Multivariate analysis of high-throughput sequencing data." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1334/document.

Full text
Abstract:
The statistical analysis of Next-Generation Sequencing data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension reduction methods that rely on both compression (representation of the data into a lower dimensional space) and variable selection. Developments are made concerning: the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose will be to focus on the reconstruction and visualization of the data. First, we will present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. For instance, such a method will be used for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in such framework is to account for the response to discard irrelevant variables. We will highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices, that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization and clustering will be illustrated by simulation experiments and by preliminary results on single-cell data analysis. All proposed methods were implemented into two R-packages "plsgenomics" and "CMF" based on high performance computing
APA, Harvard, Vancouver, ISO, and other styles
10

Zhang, Xuekui. "Mixture models for analysing high throughput sequencing data." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/35982.

Full text
Abstract:
The goal of my thesis is to develop methods and software for analysing high-throughput sequencing data, emphasizing sonicated ChIP-Seq. For this goal, we developed several variants of mixture models for genome-wide profiling of transcription factor binding sites and nucleosome positions. Our methods have been implemented as Bioconductor packages, which are freely available to other researchers. For profiling transcription factor binding sites, we developed a method, PICS, and implemented it in a Bioconductor package. We used a simulation study to confirm that PICS compares favourably to rival methods such as MACS, QuEST, CisGenome, and USeq. Using published GABP and FOXA1 data from human cell lines, we then show that PICS-predicted binding sites were more consistent with computationally predicted binding motifs than those of the alternative methods. For motif discovery using transcription factor binding sites, we combined PICS with two other existing packages to create the first complete set of Bioconductor tools for peak calling and binding-motif analysis of ChIP-Seq and ChIP-chip data. We demonstrate the effectiveness of our pipeline on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, detecting co-occurring motifs that were consistent with the literature but not detected by other methods. For nucleosome positioning, we modified PICS into a method called PING. PING can handle MNase-Seq and MNase- or sonicated-ChIP-Seq data. It compares favourably to NPS and TemplateFilter in scalability, accuracy and robustness to low read density. To demonstrate that PING predictions from sonicated data can have sufficient spatial resolution to be biologically meaningful, we use H3K4me1 data to detect nucleosome shifts, discriminate functional and non-functional transcription factor binding sites, and confirm that Foxa2 associates with the accessible major groove of nucleosomal DNA. All of the above uses single-end sequencing data. At the end of the thesis, we briefly discuss the issue of processing paired-end data, which we are currently investigating.
APA, Harvard, Vancouver, ISO, and other styles
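The mixture-model idea the abstract above builds on can be shown with a generic sketch: synthetic read positions around two hypothetical binding events are fitted as a two-component 1D Gaussian mixture by EM. This is an illustration of the general technique only, not the PICS model; all positions, parameters, and names are invented.

```python
import math
import random

def em_two_gaussians(xs, iters=50):
    """Fit a two-component, equal-weight Gaussian mixture to xs by EM."""
    mu = [min(xs), max(xs)]  # crude initialization at the data extremes
    sigma = [1.0, 1.0]
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point.
        resp0 = []
        for x in xs:
            p = [math.exp(-((x - mu[k]) ** 2) / (2 * sigma[k] ** 2)) / sigma[k]
                 for k in (0, 1)]
            resp0.append(p[0] / (p[0] + p[1]))
        resp1 = [1 - r for r in resp0]
        # M-step: weighted mean and standard deviation per component.
        for k, resp in ((0, resp0), (1, resp1)):
            total = sum(resp)
            mu[k] = sum(r * x for r, x in zip(resp, xs)) / total
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / total
            sigma[k] = max(math.sqrt(var), 1e-6)
    return mu

# Synthetic reads around two "binding events" at positions 100 and 300.
rng = random.Random(0)
reads = [rng.gauss(100, 5) for _ in range(200)] + \
        [rng.gauss(300, 5) for _ in range(200)]
centers = sorted(em_two_gaussians(reads))
print(centers)  # the two fitted means land near 100 and 300
```

A real peak caller layers much more on top of this (read orientation, fragment-length modeling, background noise), but the E-step/M-step loop is the core machinery.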
More sources

Books on the topic "High Throughput Data Storage"

1

Rodríguez-Ezpeleta, Naiara, Michael Hackenberg, and Ana M. Aransay. Bioinformatics for high throughput sequencing. New York, NY: Springer, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Geurts, Werner, Francky Catthoor, Serge Vernalde, and Hugo de Man. Accelerator Data-Path Synthesis for High-Throughput Signal Processing Applications. Boston, MA: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4419-8720-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Wiley Online Library, ed. Systems biology in psychiatric research: From high-throughput data to mathematical modeling. Weinheim: Wiley-VCH, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Introduction to clustering large and high-dimensional data. Cambridge: Cambridge University Press, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Yang, Po-sŏk. Twaeji yujŏnch'e taeryang yŏmgi sŏyŏl punsŏk mit yuyong yujŏnja palgul =: High-throughput DNA sequence analysis and identification of trait genes in pigs. [Kyŏnggi-do Suwŏn-si]: Nongch'on Chinhŭngch'ŏng, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Franklin, Michael J. Client Data Caching: A Foundation for High Performance Object Database Systems. Boston, MA: Springer US, 1996.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Rishe, Naphtali. Storage and visualization of spatial data in a high-performance semantic database system: Technical report #95-15. [Washington, DC]: National Aeronautics and Space Administration, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Quinn, James. High-tech handicapping in the information age: An information management approach to the thoroughbreds. New York: W. Morrow, 1986.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Data-intensive computing: Architectures, algorithms, and applications. Cambridge: Cambridge University Press, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Policy Research Project on Improving Postsecondary Education and Labor Market Transitions for Central Texas High School Students, ed. Beyond the numbers: Improving postsecondary success through a central Texas high school data center. Austin, TX: Lyndon B. Johnson School of Public Affairs, University of Texas at Austin, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "High Throughput Data Storage"

1

Nicolae, Bogdan. "High Throughput Data-Compression for Cloud Storage." In Data Management in Grid and Peer-to-Peer Systems, 1–12. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-15108-8_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Zheng, Liang, Changting Li, Zongbin Liu, Lingchen Zhang, and Cunqing Ma. "Implementation of High Throughput XTS-SM4 Module for Data Storage Devices." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 271–90. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01704-0_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Habyarimana, Ephrem, and Sofia Michailidou. "Genomics Data." In Big Data in Bioeconomy, 69–76. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-71069-9_6.

Full text
Abstract:
In silico prediction of plant performance is gaining increasing attention from breeders. Several statistical, mathematical and machine learning methodologies for the analysis of phenotypic, omics and environmental data typically use individual or a few data layers. Genomic selection is one application in which heterogeneous data, such as those from omics technologies, are handled, accommodating several genetic models of inheritance. Many new high-throughput Next Generation Sequencing (NGS) platforms on the market produce whole-genome data at low cost. Hence, large-scale genomic data can be produced and analyzed, enabling intercrosses and fast-paced recurrent selection. The properties of offspring can be predicted instead of manually evaluated in the field. Breeders have a short time window to make decisions by the time they receive data, which is one of the major challenges in commercial breeding. To implement genomic selection routinely as part of breeding programs, data management systems and analytics capacity must therefore be in place. Traditional relational database management systems (RDBMS), which are designed to store, manage and analyze large-scale data, offer appealing characteristics, particularly when upgraded with capabilities for working with binary large objects. In addition, NoSQL systems have proven to be effective tools for managing high-dimensional genomic data. The MongoDB system, a document-based NoSQL database, has been used effectively to develop web-based tools for visualizing and exploring genotypic information. The Hierarchical Data Format (HDF5), a member of the high-performance distributed file systems family, has demonstrated superior performance with high-dimensional and highly structured data such as genomic sequencing data.
APA, Harvard, Vancouver, ISO, and other styles
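The document model the abstract above describes for genotypic data (MongoDB-style) can be sketched without any database dependency: each genotype record is a self-contained JSON document, so the set of markers may vary between samples without a fixed relational schema. Everything here (the class, sample IDs, and marker names) is hypothetical and is not an API from the chapter.

```python
import json

class DocumentStore:
    """A minimal in-memory, MongoDB-like collection of JSON documents."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # Round-trip through JSON to guarantee the document is serializable.
        self._docs.append(json.loads(json.dumps(doc)))

    def find(self, **criteria):
        # Return documents whose top-level fields match all criteria.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in criteria.items())]

genotypes = DocumentStore()
genotypes.insert({"sample": "pig_001", "breed": "Duroc",
                  "markers": {"SNP_1": "AA", "SNP_2": "AG"}})
genotypes.insert({"sample": "pig_002", "breed": "Landrace",
                  "markers": {"SNP_1": "AG"}})  # fewer markers is fine

hits = genotypes.find(breed="Duroc")
print(len(hits), hits[0]["markers"]["SNP_2"])  # prints "1 AG"
```

The schema flexibility shown in the second insert is exactly what makes document stores attractive for genotyping panels that evolve over a breeding program.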
4

Mostolizadeh, Reihaneh, Andreas Dräger, and Neema Jamshidi. "Insights into Dynamic Network States Using Metabolomic Data." In High-Throughput Metabolomics, 243–58. New York, NY: Springer New York, 2019. http://dx.doi.org/10.1007/978-1-4939-9236-2_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Reinhold, Dominik, Harrison Pielke-Lombardo, Sean Jacobson, Debashis Ghosh, and Katerina Kechris. "Pre-analytic Considerations for Mass Spectrometry-Based Untargeted Metabolomics Data." In High-Throughput Metabolomics, 323–40. New York, NY: Springer New York, 2019. http://dx.doi.org/10.1007/978-1-4939-9236-2_20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Yao, Linxing, Amy M. Sheflin, Corey D. Broeckling, and Jessica E. Prenni. "Data Processing for GC-MS- and LC-MS-Based Untargeted Metabolomics." In High-Throughput Metabolomics, 287–99. New York, NY: Springer New York, 2019. http://dx.doi.org/10.1007/978-1-4939-9236-2_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Farrusseng, D., L. Baumes, and C. Mirodatos. "Data Management for Combinatorial Heterogeneous Catalysis: Methodology and Development of Advanced Tools." In High-Throughput Analysis, 551–79. Boston, MA: Springer US, 2003. http://dx.doi.org/10.1007/978-1-4419-8989-5_25.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gubler, Hanspeter. "High-Throughput Screening Data Analysis." In Nonclinical Statistics for Pharmaceutical and Biotechnology Industries, 83–139. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-23558-5_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Agrawal, Shubhra, Sahil Kumar, Raghav Sehgal, Sabu George, Rishabh Gupta, Surbhi Poddar, Abhishek Jha, and Swetabh Pathak. "El-MAVEN: A Fast, Robust, and User-Friendly Mass Spectrometry Data Processing Engine for Metabolomics." In High-Throughput Metabolomics, 301–21. New York, NY: Springer New York, 2019. http://dx.doi.org/10.1007/978-1-4939-9236-2_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Biondi, Sherri A., Jeffrey A. Wolk, and Anne R. Kopf-Sill. "High-Density Reagent Storage Arrays for High-Throughput Screening." In Micro Total Analysis Systems 2000, 459–62. Dordrecht: Springer Netherlands, 2000. http://dx.doi.org/10.1007/978-94-017-2264-3_107.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "High Throughput Data Storage"

1

Olkkonen, Juuso, Kari Kataja, Janne Aikio, and Dennis G. Howe. "Study of high throughput aperture for near field optical data storage." In Optical Data Storage. Washington, D.C.: OSA, 2003. http://dx.doi.org/10.1364/ods.2003.tud3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Endo, Kousuke, Masaru Takai, Kazuma Kurihara, and Kenya Goto. "Readout Measurement with High Throughput GaP Probe Array for Two-dimensional Optical Data Storage Head." In Optical Data Storage. Washington, D.C.: OSA, 2003. http://dx.doi.org/10.1364/ods.2003.tue40.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kim, Eun-Kyoung, Sung-Q. Lee, Sang-Choon Ko, and Kang-Ho Park. "Cantilever with High Throughput Multiaperture for Near-Field Optical Data Storage." In International Symposium on Optical Memory and Optical Data Storage. Washington, D.C.: OSA, 2005. http://dx.doi.org/10.1364/isom_ods.2005.wp24.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Koets, Michael A., Larry T. McDaniel, Miles R. Darnell, and Jennifer L. Alvarez. "Data access architectures for high throughput, high capacity flash memory storage systems." In 2017 IEEE Aerospace Conference. IEEE, 2017. http://dx.doi.org/10.1109/aero.2017.7943824.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kalim, Umar, Mark Gardner, Eric Brown, and Wu-chun Feng. "Abstract: Cascaded TCP: BIG Throughput for BIG DATA Applications in Distributed HPC." In 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC). IEEE, 2012. http://dx.doi.org/10.1109/sc.companion.2012.229.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kalim, Umar, Mark Gardner, Eric Brown, and Wu-chun Feng. "Poster: Cascaded TCP: BIG Throughput for BIG DATA Applications in Distributed HPC." In 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC). IEEE, 2012. http://dx.doi.org/10.1109/sc.companion.2012.230.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Sarood, Osman, Akhil Langer, Abhishek Gupta, and Laxmikant Kale. "Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget." In SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2014. http://dx.doi.org/10.1109/sc.2014.71.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Malensek, M., S. L. Pallickara, and S. Pallickara. "Galileo: A Framework for Distributed Storage of High-Throughput Data Streams." In 2011 IEEE 4th International Conference on Utility and Cloud Computing (UCC 2011). IEEE, 2011. http://dx.doi.org/10.1109/ucc.2011.13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Huo, Zhisheng, Limin Xiao, Qiaoling Zhong, Shupan Li, Ang Li, Li Ruan, Kelong Liu, Yuanyuan Zang, Pei Wang, and Zheqi Lu. "Hybrid Storage Throughput Allocation Among Multiple Clients in Heterogeneous Data Center." In 2015 IEEE 17th International Conference on High-Performance Computing and Communications; 2015 IEEE 7th International Symposium on Cyberspace Safety and Security; and 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 2015. http://dx.doi.org/10.1109/hpcc-css-icess.2015.49.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Afonso, Nuno, Manuel Bravo, and Luis Rodrigues. "Combining High Throughput and Low Migration Latency for Consistent Data Storage on the Edge." In 2020 29th International Conference on Computer Communications and Networks (ICCCN). IEEE, 2020. http://dx.doi.org/10.1109/icccn49398.2020.9209720.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "High Throughput Data Storage"

1

Matthews, W. Achieving High Data Throughput in Research Networks. Office of Scientific and Technical Information (OSTI), September 2004. http://dx.doi.org/10.2172/833103.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bulaevskaya, V., and A. P. Sales. Adaptive Sampling for High Throughput Data Using Similarity Measures. Office of Scientific and Technical Information (OSTI), May 2015. http://dx.doi.org/10.2172/1184186.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Langston, Michael A. Scalable Computational Methods for the Analysis of High-Throughput Biological Data. Office of Scientific and Technical Information (OSTI), September 2012. http://dx.doi.org/10.2172/1050046.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Neifeld, Mark A., and Richard W. Ziolkowski. Optically Addressed Nanostructures for High Density Data Storage. Fort Belvoir, VA: Defense Technical Information Center, October 2005. http://dx.doi.org/10.21236/ada440105.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Anderson, Ken. Low-Latency Ultra-High Capacity Holographic Data Storage Archive Library. Office of Scientific and Technical Information (OSTI), December 2014. http://dx.doi.org/10.2172/1164637.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Rishe, Naphtali, David Barton, and Mario Sanchez. Storage and Visualization of Spatial Data in a High-Performance Semantic Database System. Fort Belvoir, VA: Defense Technical Information Center, January 1995. http://dx.doi.org/10.21236/ada308598.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Tin Aye. High Capacity High Speed Optical Data Storage System Based on Diffraction-Free Nanobeam. Final Report, 09-02-98 to 03-17-99. Office of Scientific and Technical Information (OSTI), June 1999. http://dx.doi.org/10.2172/755982.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Rangaswami, Raju. Department of Energy Project ER25739 Final Report QoS-Enabled, High-performance Storage Systems for Data-Intensive Scientific Computing. Office of Scientific and Technical Information (OSTI), May 2009. http://dx.doi.org/10.2172/1046919.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Idakwo, Gabriel, Sundar Thangapandian, Joseph Luttrell, Zhaoxian Zhou, Chaoyang Zhang, and Ping Gong. Deep learning-based structure-activity relationship modeling for multi-category toxicity classification : a case study of 10K Tox21 chemicals with high-throughput cell-based androgen receptor bioassay data. Engineer Research and Development Center (U.S.), July 2021. http://dx.doi.org/10.21079/11681/41302.

Full text
Abstract:
Deep learning (DL) has attracted the attention of computational toxicologists because it offers potentially greater power for in silico predictive toxicology than existing shallow learning algorithms. However, contradictory reports have been documented. To further explore the advantages of DL over shallow learning, we conducted this case study using two cell-based androgen receptor (AR) activity datasets with 10K chemicals generated from the Tox21 program. A nested double-loop cross-validation approach was adopted, along with a stratified sampling strategy for partitioning chemicals of multiple AR activity classes (i.e., agonist, antagonist, inactive, and inconclusive) at the same distribution rates amongst the training, validation and test subsets. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22–27% on four metrics (precision, recall, F-measure, and AUPRC) and by 11% on another (AUROC). Further in-depth analyses of chemical scaffolding provided insights into structural alerts for AR agonists/antagonists and inactive/inconclusive compounds, which may aid future drug discovery and improve toxicity prediction modeling.
APA, Harvard, Vancouver, ISO, and other styles
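The stratified sampling strategy the abstract above mentions, partitioning chemicals of each AR activity class at the same rates across the training, validation and test subsets, can be sketched in a few lines of standard-library Python; the class labels and split fractions below are illustrative, not the report's actual settings.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, valid, test) index lists preserving per-class rates."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    splits = ([], [], [])
    for indices in by_class.values():
        rng.shuffle(indices)          # randomize within each class
        n = len(indices)
        cut1 = int(n * fractions[0])
        cut2 = cut1 + int(n * fractions[1])
        splits[0].extend(indices[:cut1])
        splits[1].extend(indices[cut1:cut2])
        splits[2].extend(indices[cut2:])
    return splits

# Hypothetical class counts; the real Tox21 datasets are far larger.
labels = (["agonist"] * 50 + ["antagonist"] * 50
          + ["inactive"] * 200 + ["inconclusive"] * 100)
train, valid, test = stratified_split(labels)
print(len(train), len(valid), len(test))  # → 240 80 80
```

Because each class is shuffled and cut independently, every subset keeps the 50/50/200/100 class ratio, which is the property the study relied on for fair multi-class evaluation.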
10

Gougar, Hans. Use and Storage of Test and Operations Data from the High Temperature Test Reactor Acquired by the US Government from the Japan Atomic Energy Agency. Office of Scientific and Technical Information (OSTI), February 2010. http://dx.doi.org/10.2172/974765.

Full text
APA, Harvard, Vancouver, ISO, and other styles