
Journal articles on the topic 'Petascale data'


Consult the top 31 journal articles for your research on the topic 'Petascale data.'


1

Bethel, E. W., C. Johnson, S. Ahern, J. Bell, P.-T. Bremer, H. Childs, E. Cormier-Michel, et al. "Occam's razor and petascale visual data analysis." Journal of Physics: Conference Series 180 (July 1, 2009): 012084. http://dx.doi.org/10.1088/1742-6596/180/1/012084.

2

Buren, G. Van, L. Didenko, J. Lauret, E. Oldag, and L. Ray. "Automated QA framework for PetaScale data challenges." Journal of Physics: Conference Series 331, no. 4 (December 23, 2011): 042026. http://dx.doi.org/10.1088/1742-6596/331/4/042026.

3

Abbasi, Hasan, Matthew Wolf, Greg Eisenhauer, Scott Klasky, Karsten Schwan, and Fang Zheng. "DataStager: scalable data staging services for petascale applications." Cluster Computing 13, no. 3 (June 15, 2010): 277–90. http://dx.doi.org/10.1007/s10586-010-0135-6.

4

Ahern, Sean. "Petascale visual data analysis in a production computing environment." Journal of Physics: Conference Series 78 (July 1, 2007): 012002. http://dx.doi.org/10.1088/1742-6596/78/1/012002.

5

Saksena, Radhika S., Marco D. Mazzeo, Stefan J. Zasada, and Peter V. Coveney. "Petascale lattice-Boltzmann studies of amphiphilic cubic liquid crystalline materials in a globally distributed high-performance computing and visualization environment." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368, no. 1925 (August 28, 2010): 3983–99. http://dx.doi.org/10.1098/rsta.2010.0160.

Abstract:
We present very large-scale rheological studies of self-assembled cubic gyroid liquid crystalline phases in ternary mixtures of oil, water and amphiphilic species performed on petascale supercomputers using the lattice-Boltzmann method. These nanomaterials have found diverse applications in materials science and biotechnology, for example, in photovoltaic devices and protein crystallization. They are increasingly gaining importance as delivery vehicles for active agents in pharmaceuticals, personal care products and food technology. In many of these applications, the self-assembled structures are subject to flows of varying strengths and we endeavour to understand their rheological response with the objective of eventually predicting it under given flow conditions. Computationally, our lattice-Boltzmann simulations of ternary fluids are inherently memory- and data-intensive. Furthermore, our interest in dynamical processes necessitates remote visualization and analysis as well as the associated transfer and storage of terabytes of time-dependent data. These simulations are distributed on a high-performance grid infrastructure using the application hosting environment; we employ a novel parallel in situ visualization approach which is particularly suited for such computations on petascale resources. We present computational and I/O performance benchmarks of our application on three different petascale systems.
6

Juric, Mario, and Tony Tyson. "LSST Data Management: Entering the Era of Petascale Optical Astronomy." Proceedings of the International Astronomical Union 10, H16 (August 2012): 675–76. http://dx.doi.org/10.1017/s174392131401285x.

Abstract:
The Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008, http://lsst.org) is a planned, large-aperture, wide-field, ground-based telescope that will survey half the sky every few nights in six optical bands from 320 to 1050 nm. It will explore a wide range of astrophysical questions, ranging from discovering killer asteroids to examining the nature of dark energy. LSST will produce on average 15 terabytes of data per night, yielding an (uncompressed) data set of 200 petabytes at the end of its 10-year mission. Dedicated HPC facilities (with a total of 320 TFLOPS at start, scaling up to 1.7 PFLOPS by the end) will process the image data in near real time, with full-dataset reprocessing on an annual scale. The nature, quality, and volume of LSST data will be unprecedented, so the data system design requires petascale storage, terascale computing, and gigascale communications.
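A quick back-of-envelope check of the volumes quoted in this abstract. The per-night rate and mission length are taken from the abstract; observing every night is a simplifying assumption made here for the arithmetic:

```python
# Rough sanity check of the LSST data volumes quoted in the abstract.
# Assumption (simplified): 15 TB of raw image data per night, every night,
# for a 10-year mission.
TB_PER_NIGHT = 15
NIGHTS_PER_YEAR = 365
MISSION_YEARS = 10

raw_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * MISSION_YEARS
raw_pb = raw_tb / 1000  # decimal petabytes

print(f"raw image accumulation: ~{raw_pb:.0f} PB")
# Raw images alone come to roughly 55 PB; the abstract's 200 PB figure is
# the full uncompressed data set, which presumably also counts calibrated
# and annually reprocessed data products.
```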
7

Beyer, J., M. Hadwiger, A. Al-Awami, Won-Ki Jeong, N. Kasthuri, J. W. Lichtman, and H. Pfister. "Exploring the Connectome: Petascale Volume Visualization of Microscopy Data Streams." IEEE Computer Graphics and Applications 33, no. 4 (July 2013): 50–61. http://dx.doi.org/10.1109/mcg.2013.55.

8

Huang, Huang, Li-Qian Zhou, YuTong Lu, Tong Xiao, Can Leng, Chuanying Li, and Zhe Quan. "An efficient real-time data collection framework on petascale systems." Neurocomputing 361 (October 2019): 100–109. http://dx.doi.org/10.1016/j.neucom.2019.06.039.

9

Baranovski, A., K. Beattie, S. Bharathi, J. Boverhof, J. Bresnahan, A. Chervenak, I. Foster, et al. "Enabling petascale science: data management, troubleshooting, and scalable science services." Journal of Physics: Conference Series 125 (July 1, 2008): 012068. http://dx.doi.org/10.1088/1742-6596/125/1/012068.

10

Kosar, Tevfik, Mehmet Balman, Esma Yildirim, Sivakumar Kulasekaran, and Brandon Ross. "Stork data scheduler: mitigating the data bottleneck in e-Science." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 369, no. 1949 (August 28, 2011): 3254–67. http://dx.doi.org/10.1098/rsta.2011.0148.

Abstract:
In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
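One feature named in this abstract, aggregation of data transfer jobs by their source and destination addresses, can be sketched as follows. This is an illustrative Python sketch, not Stork's actual implementation; the URLs and the helper name are hypothetical:

```python
from collections import defaultdict
from urllib.parse import urlparse

# Illustrative sketch of transfer-job aggregation: jobs that share a
# source host and destination host are batched, so per-connection setup
# cost is paid once per host pair rather than once per file.
def aggregate_jobs(jobs):
    """Group (src_url, dst_url) transfer jobs by their host pair."""
    batches = defaultdict(list)
    for src, dst in jobs:
        key = (urlparse(src).hostname, urlparse(dst).hostname)
        batches[key].append((src, dst))
    return dict(batches)

jobs = [
    ("gsiftp://a.edu/data/f1", "gsiftp://b.org/scratch/f1"),
    ("gsiftp://a.edu/data/f2", "gsiftp://b.org/scratch/f2"),
    ("gsiftp://c.net/run/f3",  "gsiftp://b.org/scratch/f3"),
]
batches = aggregate_jobs(jobs)
# Two batches: ('a.edu', 'b.org') carries two jobs, ('c.net', 'b.org') one.
```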
11

Leung, A. W., M. Shao, T. Bisson, S. Pasupathy, and E. L. Miller. "High-performance metadata indexing and search in petascale data storage systems." Journal of Physics: Conference Series 125 (July 1, 2008): 012069. http://dx.doi.org/10.1088/1742-6596/125/1/012069.

12

Vohl, D., C. J. Fluke, and G. Vernardos. "Data compression in the petascale astronomy era: A GERLUMPH case study." Astronomy and Computing 12 (September 2015): 200–211. http://dx.doi.org/10.1016/j.ascom.2015.05.003.

13

Hassan, Amr, and Christopher J. Fluke. "Scientific Visualization in Astronomy: Towards the Petascale Astronomy Era." Publications of the Astronomical Society of Australia 28, no. 2 (2011): 150–70. http://dx.doi.org/10.1071/as10031.

Abstract:
Astronomy is entering a new era of discovery, coincident with the establishment of new facilities for observation and simulation that will routinely generate petabytes of data. While an increasing reliance on automated data analysis is anticipated, a critical role will remain for visualization-based knowledge discovery. We have investigated scientific visualization applications in astronomy through an examination of the literature published during the last two decades. We identify the two most active fields for progress, visualization of large-N particle data and of spectral data cubes, discuss open areas of research, and introduce a mapping between astronomical sources of data and data representations used in general-purpose visualization tools. We discuss contributions using high-performance computing architectures (e.g. distributed processing and GPUs), collaborative astronomy visualization, the use of workflow systems to store metadata about visualization parameters, and the use of advanced interaction devices. We examine a number of issues that may be limiting the spread of scientific visualization research in astronomy and identify six grand challenges for scientific visualization research in the Petascale Astronomy Era.
14

Schuchardt, K. L., B. J. Palmer, J. A. Daily, T. O. Elsethagen, and A. S. Koontz. "IO strategies and data services for petascale data sets from a global cloud resolving model." Journal of Physics: Conference Series 78 (July 1, 2007): 012089. http://dx.doi.org/10.1088/1742-6596/78/1/012089.

15

Williams, Dean N., Bryan N. Lawrence, Michael Lautenschlager, Don Middleton, and V. Balaji. "The Earth System Grid Federation: Delivering globally accessible petascale data for CMIP5." Proceedings of the Asia-Pacific Advanced Network 32 (December 13, 2011): 121. http://dx.doi.org/10.7125/apan.32.15.

16

Sbalzarini, Ivo F. "Abstractions and Middleware for Petascale Computing and Beyond." International Journal of Distributed Systems and Technologies 1, no. 2 (April 2010): 40–56. http://dx.doi.org/10.4018/jdst.2010040103.

Abstract:
As high-performance computing moves to the petascale and beyond, a number of algorithmic and software challenges need to be addressed. This paper reviews the main performance-limiting factors in today’s high-performance computing software and outlines a possible new programming paradigm to address them. The proposed paradigm is based on abstract parallel data structures and operations that encapsulate much of the complexity of an application, but still make communication overhead explicit. The authors argue that all numerical simulations can be formulated in terms of the presented abstractions, which thus define an abstract semantic specification language for parallel numerical simulations. Simulations defined in this language can automatically be translated to source code containing the appropriate calls to a middleware that implements the underlying abstractions. Finally, the structure and functionality of such a middleware are outlined while demonstrating its feasibility on the example of the parallel particle-mesh library (PPM).
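The paradigm this abstract outlines, abstract parallel data structures whose local operations hide complexity while communication stays explicit, might be sketched in a single process like this. This is a hypothetical toy, not the PPM library's API:

```python
# Toy sketch of the proposed paradigm: data live in per-"process"
# subdomains, local operations are implicit, and communication appears
# only as one explicit ghost-cell exchange step.
def ghost_exchange(subdomains):
    """Explicit communication: each subdomain receives its neighbours'
    boundary values as (left_ghost, right_ghost); edges are clamped."""
    n = len(subdomains)
    ghosts = []
    for i, sub in enumerate(subdomains):
        left = subdomains[i - 1][-1] if i > 0 else sub[0]
        right = subdomains[i + 1][0] if i < n - 1 else sub[-1]
        ghosts.append((left, right))
    return ghosts

def smooth(subdomains):
    """Local operation: 3-point average using the exchanged ghosts."""
    ghosts = ghost_exchange(subdomains)  # the only communication point
    out = []
    for sub, (lg, rg) in zip(subdomains, ghosts):
        padded = [lg] + sub + [rg]
        out.append([(padded[j - 1] + padded[j] + padded[j + 1]) / 3
                    for j in range(1, len(padded) - 1)])
    return out

field = [[0.0, 0.0], [3.0, 3.0], [0.0, 0.0]]  # three "process" subdomains
print(smooth(field))
```

The point of the structure is that everything except `ghost_exchange` is embarrassingly local, so the communication overhead the abstract insists on keeping visible is confined to one named operation.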
17

Yu, Xiaoshan, Huaxi Gu, Kun Wang, and Shangqi Ma. "Petascale: A Scalable Buffer-Less All-Optical Network for Cloud Computing Data Center." IEEE Access 7 (2019): 42596–608. http://dx.doi.org/10.1109/access.2019.2905354.

18

Miyoshi, Takemasa, Guo-Yuan Lien, Shinsuke Satoh, Tomoo Ushio, Kotaro Bessho, Hirofumi Tomita, Seiya Nishizawa, et al. "“Big Data Assimilation” Toward Post-Petascale Severe Weather Prediction: An Overview and Progress." Proceedings of the IEEE 104, no. 11 (November 2016): 2155–79. http://dx.doi.org/10.1109/jproc.2016.2602560.

19

Hadwiger, M., J. Beyer, Won-Ki Jeong, and H. Pfister. "Interactive Volume Exploration of Petascale Microscopy Data Streams Using a Visualization-Driven Virtual Memory Approach." IEEE Transactions on Visualization and Computer Graphics 18, no. 12 (December 2012): 2285–94. http://dx.doi.org/10.1109/tvcg.2012.240.

20

Cicalese, Danilo, Grzegorz Jereczek, Fabrice Le Goff, Giovanna Lehmann Miotto, Jeremy Love, Maciej Maciejewski, Remigius K. Mommsen, Jakub Radtke, Jakub Schmiegel, and Malgorzata Szychowska. "The design of a distributed key-value store for petascale hot storage in data acquisition systems." EPJ Web of Conferences 214 (2019): 01014. http://dx.doi.org/10.1051/epjconf/201921401014.

Abstract:
Data acquisition systems for high energy physics experiments read out terabytes of data per second from a large number of electronic components. They are thus inherently distributed systems and require fast online data selection; otherwise, the requirements for permanent storage would be enormous. Still, incoming data need to be buffered while waiting for this selection to happen: each minute of an experiment can produce hundreds of terabytes that cannot be lost before a selection decision is made. In this context, we present the design of DAQDB (Data Acquisition Database), a distributed key-value store for high-bandwidth, generic data storage in event-driven systems. DAQDB offers not only a high-capacity, low-latency buffer for fast data selection, but also opens a new approach in high-bandwidth data acquisition by decoupling the lifetime of the data analysis processes from the changing event rate due to the duty cycle of the data source. This is achievable by the option to extend its capacity up to hundreds of petabytes to store hours of an experiment's data. Our initial performance evaluation shows that DAQDB is a promising alternative to generic database solutions for the high luminosity upgrades of the LHC at CERN.
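The buffering pattern this abstract describes, readout nodes putting event fragments into a key-value store for later selection, can be sketched as follows. This is a hypothetical in-memory toy with made-up method names, not DAQDB itself:

```python
# Illustrative sketch of the hot-storage pattern: fragments are keyed by
# (event_id, fragment_id); selection processes later assemble or drop
# whole events at their own pace, decoupled from the readout rate.
class FragmentStore:
    def __init__(self):
        self._kv = {}

    def put(self, event_id, fragment_id, payload):
        self._kv[(event_id, fragment_id)] = payload

    def get_event(self, event_id):
        """Assemble all buffered fragments of one event, ordered by id."""
        frags = {f: p for (e, f), p in self._kv.items() if e == event_id}
        return [frags[f] for f in sorted(frags)]

    def drop_event(self, event_id):
        """Event rejected by the selection: free its buffer space."""
        for key in [k for k in self._kv if k[0] == event_id]:
            del self._kv[key]

store = FragmentStore()
store.put(42, 0, b"calo")
store.put(42, 1, b"tracker")
store.put(43, 0, b"calo")
store.drop_event(43)           # selection rejects event 43
```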
21

Abed Abud, Adam, Danilo Cicalese, Grzegorz Jereczek, Fabrice Le Goff, Giovanna Lehmann Miotto, Jeremy Love, Maciej Maciejewski, et al. "Let’s get our hands dirty: a comprehensive evaluation of DAQDB, key-value store for petascale hot storage." EPJ Web of Conferences 245 (2020): 10004. http://dx.doi.org/10.1051/epjconf/202024510004.

Abstract:
Data acquisition (DAQ) systems are a key component for successful data taking in any experiment. The DAQ is a complex distributed computing system that coordinates all operations, from the selection of interesting events to the storage elements. For the High Luminosity upgrade of the Large Hadron Collider, the experiments at CERN need to meet challenging requirements to record data with a much higher occupancy in the detectors. The DAQ system will receive and deliver data with a significantly increased trigger rate, one million events per second, and capacity, terabytes of data per second. An effective way to meet these requirements is to decouple real-time data acquisition from event selection: data fragments can be temporarily stored in a large distributed key-value store, and fragments belonging to the same event can then be queried on demand by the data selection processes. Implementing such a model relies on a proper combination of emerging technologies, such as persistent memory, NVMe SSDs, scalable networking, and data structures, as well as high-performance, scalable software. In this paper, we present DAQDB (Data Acquisition Database), an open-source implementation of this design that was presented earlier, with an extensive evaluation of this approach, from single-node to distributed performance. Furthermore, we complement our study with a description of the challenges faced and the lessons learned while integrating DAQDB with the existing software framework of the ATLAS experiment.
22

Davis, Andrew, Aleksander Dubas, and Ruben Otin. "Enabling validated exascale nuclear science." EPJ Web of Conferences 245 (2020): 09001. http://dx.doi.org/10.1051/epjconf/202024509001.

Abstract:
The field of fusion energy is about to enter the ITER era: for the first time, we will have access to a device capable of producing 500 MW of fusion power, with plasmas lasting more than 300 seconds and core temperatures in excess of 100-200 million K. Engineering simulation for fusion sits in an awkward position, where a mixture of commercial and licensed tools is used, often with email-driven transfer of data. To address the engineering simulation challenges of the future, the community must approach simulation as a much more tightly coupled ecosystem, with a set of tools that can scale to take advantage of current petascale and upcoming exascale systems to address the design challenges of the ITER era.
23

Mahadevan, Vijay S., Elia Merzari, Timothy Tautges, Rajeev Jain, Aleksandr Obabko, Michael Smith, and Paul Fischer. "High-resolution coupled physics solvers for analysing fine-scale nuclear reactor design problems." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 372, no. 2021 (August 6, 2014): 20130381. http://dx.doi.org/10.1098/rsta.2013.0381.

Abstract:
An integrated multi-physics simulation capability for the design and analysis of current and future nuclear reactor models is being investigated, to tightly couple neutron transport and thermal-hydraulics physics under the SHARP framework. Over several years, high-fidelity, validated mono-physics solvers with proven scalability on petascale architectures have been developed independently. Based on a unified component-based architecture, these existing codes can be coupled with a mesh-data backplane and a flexible coupling-strategy-based driver suite to produce a viable tool for analysts. The goal of the SHARP framework is to perform fully resolved coupled physics analysis of a reactor on heterogeneous geometry, in order to reduce the overall numerical uncertainty while leveraging available computational resources. The coupling methodology and software interfaces of the framework are presented, along with verification studies on two representative fast sodium-cooled reactor demonstration problems to prove the usability of the SHARP framework.
24

Halsey, Thomas C. "Computational sciences in the upstream oil and gas industry." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, no. 2078 (October 13, 2016): 20150429. http://dx.doi.org/10.1098/rsta.2015.0429.

Abstract:
The predominant technical challenge of the upstream oil and gas industry has always been the fundamental uncertainty of the subsurface from which it produces hydrocarbon fluids. The subsurface can be detected remotely by, for example, seismic waves, or it can be penetrated and studied in the extremely limited vicinity of wells. Inevitably, a great deal of uncertainty remains. Computational sciences have been a key avenue to reduce and manage this uncertainty. In this review, we discuss at a relatively non-technical level the current state of three applications of computational sciences in the industry. The first of these is seismic imaging, which is currently being revolutionized by the emergence of full wavefield inversion, enabled by algorithmic advances and petascale computing. The second is reservoir simulation, also being advanced through the use of modern highly parallel computing architectures. Finally, we comment on the role of data analytics in the upstream industry. This article is part of the themed issue ‘Energy and the subsurface’.
25

Tayeb, Shahab, Neha Raste, Matin Pirouz, and Shahram Latifi. "A Cognitive Framework to Secure Smart Cities." MATEC Web of Conferences 208 (2018): 05001. http://dx.doi.org/10.1051/matecconf/201820805001.

Abstract:
The advancement in technology has transformed Cyber Physical Systems and their interface with IoT into a more sophisticated and challenging paradigm. As a result, vulnerabilities and potential attacks manifest themselves considerably more than before, forcing researchers to rethink the conventional strategies that are currently in place to secure such physical systems. This manuscript studies the complex interweaving of sensor networks and physical systems and suggests a foundational innovation in the field. In sharp contrast with the existing IDS and IPS solutions, in this paper, a preventive and proactive method is employed to stay ahead of attacks by constantly monitoring network data patterns and identifying threats that are imminent. Here, by capitalizing on the significant progress in processing power (e.g. petascale computing) and storage capacity of computer systems, we propose a deep learning approach to predict and identify various security breaches that are about to occur. The learning process takes place by collecting a large number of files of different types and running tests on them to classify them as benign or malicious. The prediction model obtained as such can then be used to identify attacks. Our project articulates a new framework for interactions between physical systems and sensor networks, where malicious packets are repeatedly learned over time while the system continually operates with respect to imperfect security mechanisms.
26

Davini, Paolo, Jost von Hardenberg, Susanna Corti, Hannah M. Christensen, Stephan Juricke, Aneesh Subramanian, Peter A. G. Watson, Antje Weisheimer, and Tim N. Palmer. "Climate SPHINX: evaluating the impact of resolution and stochastic physics parameterisations in the EC-Earth global climate model." Geoscientific Model Development 10, no. 3 (March 31, 2017): 1383–402. http://dx.doi.org/10.5194/gmd-10-1383-2017.

Abstract:
The Climate SPHINX (Stochastic Physics HIgh resolutioN eXperiments) project is a comprehensive set of ensemble simulations aimed at evaluating the sensitivity of present and future climate to model resolution and stochastic parameterisation. The EC-Earth Earth system model is used to explore the impact of stochastic physics in a large ensemble of 30-year climate integrations at five different atmospheric horizontal resolutions (from 125 up to 16 km). The project includes more than 120 simulations in both a historical scenario (1979–2008) and a climate change projection (2039–2068), together with coupled transient runs (1850–2100). A total of 20.4 million core hours have been used, made available from a single-year grant from PRACE (the Partnership for Advanced Computing in Europe), and close to 1.5 PB of output data have been produced on the SuperMUC IBM Petascale System at the Leibniz Supercomputing Centre (LRZ) in Garching, Germany. About 140 TB of post-processed data are stored on the CINECA supercomputing centre archives and are freely accessible to the community thanks to an EUDAT data pilot project. This paper presents the technical and scientific set-up of the experiments, including the details on the forcing used for the simulations performed, defining the SPHINX v1.0 protocol. In addition, an overview of preliminary results is given. An improvement in the simulation of Euro-Atlantic atmospheric blocking following resolution increase is observed. It is also shown that including stochastic parameterisation in the low-resolution runs helps to improve some aspects of the tropical climate – specifically the Madden–Julian Oscillation and the tropical rainfall variability. These findings show the importance of representing the impact of small-scale processes on the large-scale climate variability either explicitly (with high-resolution simulations) or stochastically (in low-resolution simulations).
27

Gasper, F., K. Goergen, P. Shrestha, M. Sulis, J. Rihani, M. Geimer, and S. Kollet. "Implementation and scaling of the fully coupled Terrestrial Systems Modeling Platform (TerrSysMP v1.0) in a massively parallel supercomputing environment – a case study on JUQUEEN (IBM Blue Gene/Q)." Geoscientific Model Development 7, no. 5 (October 29, 2014): 2531–43. http://dx.doi.org/10.5194/gmd-7-2531-2014.

Abstract:
Continental-scale hyper-resolution simulations constitute a grand challenge in characterizing nonlinear feedbacks of states and fluxes of the coupled water, energy, and biogeochemical cycles of terrestrial systems. Tackling this challenge requires advanced coupling and supercomputing technologies for earth system models that are discussed in this study, utilizing the example of the implementation of the newly developed Terrestrial Systems Modeling Platform (TerrSysMP v1.0) on JUQUEEN (IBM Blue Gene/Q) of the Jülich Supercomputing Centre, Germany. The applied coupling strategies rely on the Multiple Program Multiple Data (MPMD) paradigm using the OASIS suite of external couplers, and require memory and load balancing considerations in the exchange of the coupling fields between different component models and the allocation of computational resources, respectively. Using the advanced profiling and tracing tool Scalasca to determine an optimum load balancing leads to a 19% speedup. In massively parallel supercomputer environments, the coupler OASIS-MCT is recommended, which resolves memory limitations that may be significant in case of very large computational domains and exchange fields as they occur in these specific test cases and in many applications in terrestrial research. However, model I/O and initialization in the petascale range still require major attention, as they constitute true big data challenges in light of future exascale computing resources. Based on a factor-two speedup due to compiler optimizations, a refactored coupling interface using OASIS-MCT and an optimum load balancing, the problem size in a weak scaling study can be increased by a factor of 64 from 512 to 32 768 processes while maintaining parallel efficiencies above 80% for the component models.
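The weak-scaling figures quoted in this abstract rest on simple bookkeeping: with fixed work per process, the ideal runtime is constant, so efficiency at N processes is the baseline runtime divided by the runtime at N. A sketch with made-up timings (the abstract reports only the efficiency threshold, not the raw numbers):

```python
# Weak-scaling efficiency: work per process is fixed, so ideally runtime
# stays flat and efficiency = T(baseline) / T(N).
def weak_scaling_efficiency(timings, base=None):
    """timings: {process_count: runtime}; returns {process_count: efficiency}."""
    base = base if base is not None else min(timings)
    t0 = timings[base]
    return {p: t0 / t for p, t in timings.items()}

# Hypothetical runtimes in seconds, consistent with the >80% figure above.
runtimes = {512: 100.0, 4096: 110.0, 32768: 123.0}
eff = weak_scaling_efficiency(runtimes)
# 512 -> 1.0, 4096 -> ~0.91, 32768 -> ~0.81
```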
28

Nowicki, Marek, Łukasz Górski, and Piotr Bała. "PCJ Java library as a solution to integrate HPC, Big Data and Artificial Intelligence workloads." Journal of Big Data 8, no. 1 (April 26, 2021). http://dx.doi.org/10.1186/s40537-021-00454-6.

Abstract:
With the development of peta- and exascale computational systems, there is growing interest in running Big Data and Artificial Intelligence (AI) applications on them. Big Data and AI applications are implemented in Java, Scala, Python and other languages that are not widely used in High-Performance Computing (HPC), which is still dominated by C and Fortran. Moreover, they are based on dedicated environments such as Hadoop or Spark, which are difficult to integrate with traditional HPC management systems. We have developed the Parallel Computing in Java (PCJ) library, a tool for scalable high-performance computing and Big Data processing in Java. In this paper, we present the basic functionality of the PCJ library with examples of highly scalable applications running on large resources. Performance results are presented for different classes of applications, including traditional computationally intensive (HPC) workloads (e.g. stencil), as well as communication-intensive algorithms such as the Fast Fourier Transform (FFT). We present implementation details and performance results for Big Data type processing running on petascale-size systems. Examples of large-scale AI workloads parallelized using PCJ are presented.
29

Groen, D., H. Arabnejad, V. Jancauskas, W. N. Edeling, F. Jansson, R. A. Richardson, J. Lakhlili, et al. "VECMAtk: a scalable verification, validation and uncertainty quantification toolkit for scientific simulations." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 379, no. 2197 (March 29, 2021). http://dx.doi.org/10.1098/rsta.2020.0221.

Abstract:
We present the VECMA toolkit (VECMAtk), a flexible software environment for single and multiscale simulations that introduces directly applicable and reusable procedures for verification and validation (V&V), sensitivity analysis (SA) and uncertainty quantification (UQ). It enables users to verify key aspects of their applications, systematically compare and validate the simulation outputs against observational or benchmark data, and run simulations conveniently on any platform from the desktop to current multi-petascale computers. In this sequel to our paper on VECMAtk presented last year [1], we focus on a range of functional and performance improvements that we have introduced, cover newly introduced components, and present application examples from seven different domains such as conflict modelling and the environmental sciences. We also present several implemented patterns for UQ/SA and V&V, and guide the reader through one example concerning COVID-19 modelling in detail. This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico’.
30

Ogle, Cameron, David Reddick, Coleman McKnight, Tyler Biggs, Rini Pauly, Stephen P. Ficklin, F. Alex Feltus, and Susmit Shannigrahi. "Named Data Networking for Genomics Data Management and Integrated Workflows." Frontiers in Big Data 4 (February 15, 2021). http://dx.doi.org/10.3389/fdata.2021.582468.

Abstract:
Advanced imaging and DNA sequencing technologies now enable the diverse biology community to routinely generate and analyze terabytes of high-resolution biological data, and the community is rapidly heading toward the petascale in single-investigator laboratory settings. As evidence, the central NCBI SRA DNA sequence repository alone contains over 45 petabytes of biological data. Given the geometric growth of this and other genomics repositories, an exabyte of mineable biological data is imminent. The challenges of effectively utilizing these datasets are enormous, as they are not only large in size but also spread across geographically distributed repositories such as the National Center for Biotechnology Information (NCBI), the DNA Data Bank of Japan (DDBJ), the European Bioinformatics Institute (EBI), and NASA's GeneLab. In this work, we first systematically point out the data-management challenges of the genomics community. We then introduce Named Data Networking (NDN), a novel but well-researched Internet architecture that can address these challenges at the network layer. NDN performs all operations, such as forwarding requests to data sources, content discovery, access, and retrieval, using content names (similar to traditional filenames or filepaths) and eliminates the need for a location layer (the IP address) in data management. Utilizing NDN for genomics workflows simplifies data discovery, speeds up data retrieval through in-network caching of popular datasets, and allows the community to create infrastructure that supports operations such as federating content repositories, retrieval from multiple sources, remote data subsetting, and others. Name-based operations also streamline the deployment and integration of workflows with various cloud platforms.
Our contributions in this work are as follows: 1) we enumerate the cyberinfrastructure challenges of the genomics community that NDN can alleviate; 2) we describe our efforts in applying NDN to a contemporary genomics workflow (GEMmaker) and quantify the improvements, with a preliminary evaluation showing a sixfold speed-up in data insertion into the workflow; and 3) as a pilot, we have used an NDN naming scheme (agreed upon by the community and discussed in Section 4) to publish data from broadly used data repositories, including the NCBI SRA. We have loaded the NDN testbed with these pre-processed genomes, which can be accessed over NDN and used by anyone interested in those datasets. Finally, we discuss our continued effort in integrating NDN with cloud computing platforms such as the Pacific Research Platform (PRP). The reader should note that the goal of this paper is to introduce NDN to the genomics community and discuss NDN’s properties that can benefit the genomics community. We do not present an extensive performance evaluation of NDN; we are working on extending and evaluating our pilot deployment and will present systematic results in future work.
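The name-based retrieval with in-network caching that the abstract describes can be illustrated with a minimal sketch. The hierarchical name, the repository mapping, and the cache below are toy assumptions for illustration only; they are not the NDN API or the community naming scheme discussed in the paper.

```python
# Toy sketch of NDN-style retrieval: consumers request content by
# hierarchical name, and intermediate nodes cache popular data so
# repeat requests never reach the origin repository.
# All names and data here are illustrative assumptions.

CACHE = {}  # simulates a router's in-network content store

REPOSITORIES = {  # hypothetical published dataset, keyed by content name
    "/genomics/ncbi/sra/SRR000001/sequence": b"ACGT...",
}

def fetch(name: str) -> bytes:
    """Resolve a content name, serving from cache when possible."""
    if name in CACHE:             # cache hit: no trip to the repository
        return CACHE[name]
    data = REPOSITORIES[name]     # request forwarded toward a publisher
    CACHE[name] = data            # cached for subsequent consumers
    return data

first = fetch("/genomics/ncbi/sra/SRR000001/sequence")   # from repository
second = fetch("/genomics/ncbi/sra/SRR000001/sequence")  # from cache
```

Because the request carries the data's name rather than a host address, any node holding a copy can answer it, which is the property the paper exploits for federation and faster workflow data insertion.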
APA, Harvard, Vancouver, ISO, and other styles
31

Baumann, Peter, Dimitar Misev, Vlad Merticariu, and Bang Pham Huu. "Array databases: concepts, standards, implementations." Journal of Big Data 8, no. 1 (February 2, 2021). http://dx.doi.org/10.1186/s40537-020-00399-2.

Full text
Abstract:
Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all, science and engineering domains, where they typically represent spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly in silo solutions, with architectures that tend to erode and not keep up with the increasing requirements on performance and service quality. Array Database systems attempt to close this gap by providing declarative query support for flexible ad-hoc analytics on large n-D arrays, similar to what SQL offers on set-oriented data, XQuery on hierarchical data, and SPARQL and CIPHER on graph data. Today, Petascale Array Database installations exist, employing massive parallelism and distributed processing. Hence, questions arise about the technology and standards available, usability, and overall maturity. Several papers have compared models and formalisms, and benchmarks have been undertaken as well, typically comparing two systems against each other. While each of these represents valuable research, to the best of our knowledge there is no comprehensive survey combining model, query language, architecture, practical usability, and performance aspects. The size of this comparison also differentiates our study: 19 systems are compared, four of them benchmarked, to an extent and depth clearly exceeding previous papers in the field; for example, subsetting tests were designed in a way that systems cannot be tuned specifically to these queries. It is hoped that this gives a representative overview to all who want to immerse themselves in the field, as well as clear guidance to those who need to choose the best-suited datacube tool for their application. This article presents results of the Research Data Alliance (RDA) Array Database Assessment Working Group (ADA:WG), a subgroup of the Big Data Interest Group.
It has elicited the state of the art in Array Databases, technically supported by IEEE GRSS and CODATA Germany, to answer the question: how can data scientists and engineers benefit from Array Database technology? As it turns out, Array Databases can offer significant advantages in terms of flexibility, functionality, extensibility, performance, and scalability; in total, the database approach of offering analysis-ready “datacubes” heralds a new level of service quality. The investigation shows that there is a lively ecosystem of technology with increasing uptake, and that proven array analytics standards are in place. Consequently, such approaches have to be considered a serious option for datacube services in science, engineering, and beyond. Tools, though, vary greatly in functionality and performance.
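The declarative subsetting that Array Database query languages offer on n-D “datacubes” can be sketched in miniature. The toy cube and `subset` helper below are illustrative assumptions only; they do not reproduce any real Array Database's query API (rasdaman, SciDB, etc.).

```python
# Toy "datacube": a 3-D array (time x lat x lon) as nested lists, with a
# subset() operation mimicking the axis-interval trimming that Array
# Database query languages express declaratively. Illustrative only.

def make_cube(nt, ny, nx):
    """Build a small cube whose cell value encodes its coordinates."""
    return [[[t * 100 + y * 10 + x for x in range(nx)]
             for y in range(ny)]
            for t in range(nt)]

def subset(cube, t_range, y_range, x_range):
    """Trim the cube to a [lo, hi) interval along each axis."""
    return [[[cube[t][y][x] for x in range(*x_range)]
             for y in range(*y_range)]
            for t in range(*t_range)]

cube = make_cube(4, 3, 3)
window = subset(cube, (1, 3), (0, 2), (1, 3))  # a 2 x 2 x 2 sub-cube
```

In a real Array Database the same operation is a one-line declarative query, and the engine, not the client, decides how to parallelize and distribute it; that separation is exactly what the benchmarked subsetting tests in the survey exercise.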
APA, Harvard, Vancouver, ISO, and other styles