To see the other types of publications on this topic, follow the link: Massively Parallel Processing (MPP).

Journal articles on the topic 'Massively Parallel Processing (MPP)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 journal articles for your research on the topic 'Massively Parallel Processing (MPP).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Willson, Ian A. "The Evolution of the Massively Parallel Processing Database in Support of Visual Analytics." Information Resources Management Journal 24, no. 4 (2011): 1–26. http://dx.doi.org/10.4018/irmj.2011100101.

Abstract:
This article explores the evolution of the Massively Parallel Processing (MPP) database, focusing on trends of particular relevance to analytics. The dramatic shift of database vendors and leading companies to utilize MPP databases and deploy an Enterprise Data Warehouse (EDW) is presented. The inherent benefits of fresher data, storage efficiency, and, most importantly, accessibility to analytics are explored. Published industry and vendor metrics are examined that demonstrate substantial and growing cost efficiencies from utilizing MPP databases. The author concludes by reviewing trends toward parallelizing decision-support workloads into the database, ranging from in-database transformations to new statistical and spatial analytic capabilities provided by parallelizing these algorithms to execute directly within the MPP database. These new capabilities present an opportunity for timely and powerful enterprise analytics, providing a substantial competitive advantage to those companies able to leverage this technology to turn data into actionable information, gain valuable new insights, and automate operational decision making.
2

Ji, Yunhong, Yunpeng Chai, Xuan Zhou, Lipeng Ren, and Yajie Qin. "Smart Intra-query Fault Tolerance for Massive Parallel Processing Databases." Data Science and Engineering 5, no. 1 (2019): 65–79. http://dx.doi.org/10.1007/s41019-019-00114-z.

Abstract:
Intra-query fault tolerance has increasingly been a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massive parallel processing (MPP) databases do not support intra-query fault tolerance. They may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault tolerance support of low-level frameworks, such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by performing checkpointing, i.e., materializing intermediate results of selected operators. Different from existing approaches, SIFT aims at improving the query success rate within a given time. To achieve its goal, it needs to: (1) minimize query rerunning time after encountering failures and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in real-world MPP database systems, we implemented it in Greenplum. The experimental results indicate that it can effectively improve the success rate of query processing, especially when working with unreliable hardware.
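The trade-off this abstract describes (rerun time saved by a checkpoint versus the overhead of materializing it) can be illustrated with a toy cost model. The greedy rule, runtimes, and costs below are illustrative assumptions for exposition, not the paper's actual SIFT algorithm.

```python
# Toy model of intra-query checkpoint selection in an MPP query pipeline.
# The cost numbers and the greedy rule are illustrative assumptions,
# not the actual SIFT cost model from the paper.

def select_checkpoints(op_runtimes, materialize_costs, failure_prob):
    """Checkpoint after an operator when the expected rerun time saved
    exceeds the cost of materializing its intermediate result."""
    checkpoints = []
    work_since_ckpt = 0.0
    for i, (run, cost) in enumerate(zip(op_runtimes, materialize_costs)):
        work_since_ckpt += run
        if failure_prob * work_since_ckpt > cost:
            checkpoints.append(i)      # materialize this operator's output
            work_since_ckpt = 0.0      # a failure now reruns from here
    return checkpoints

# Five operators: cheap scans are easy to redo, heavy joins are not.
runtimes = [10, 60, 5, 120, 15]    # seconds per operator (assumed)
costs = [50, 20, 50, 30, 50]       # materialization overhead (assumed)
picked = select_checkpoints(runtimes, costs, failure_prob=0.3)
print(picked)  # checkpoints land after the two expensive operators
```

With these assumed numbers the greedy rule checkpoints only after the two expensive operators, which matches the paper's stated goal of cutting rerun time without checkpointing everything.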
3

RACCA, R. G., Z. MENG, J. M. OZARD, and M. J. WILMUT. "EVALUATION OF MASSIVELY PARALLEL COMPUTING FOR EXHAUSTIVE AND CLUSTERED MATCHED-FIELD PROCESSING." Journal of Computational Acoustics 04, no. 02 (1996): 159–73. http://dx.doi.org/10.1142/s0218396x96000039.

Abstract:
Many computer algorithms contain an operation that accounts for a substantial portion of the total execution cost in a frequently executed loop. The use of a parallel computer to execute that operation may represent an alternative to a sheer increase in processor speed. The signal processing technique known as matched-field processing (MFP) involves performing identical and independent operations on a potentially huge set of vectors. To investigate a massively parallel approach to MFP and clustered nearest neighbors MFP, algorithms were implemented on a DECmpp 12000 massively parallel computer (from Digital Equipment and MasPar Corporation) with 8192 processors. The execution time for the MFP technique on the MasPar machine was compared with that of MFP on a serial VAX9000–210 equipped with a vector processor. The results showed that the MasPar achieved a speedup factor of at least 17 relative to the VAX9000. The speedup was 3.5 times higher than the ratio of the peak ratings of 600 MFLOPS for the MasPar versus 125 MFLOPS for the VAX9000 with vector processor. The execution speed on the parallel machine represented 64% of its peak rating. This is much better than what is commonly assumed for a parallel machine and was obtained with modest programming effort. An initial implementation of a massively parallel approach to clustered MFP on the MasPar showed a further order of magnitude increase in speed, for an overall speedup factor of 35.
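The speedup figures in this abstract are internally consistent and can be cross-checked with a few lines of arithmetic; all numbers below are taken directly from the abstract.

```python
# Cross-check the speedup arithmetic reported in the abstract: a measured
# speedup of 17 against a 600/125 MFLOPS peak-rating ratio.
maspar_peak_mflops = 600.0   # DECmpp 12000 peak rating
vax_peak_mflops = 125.0      # VAX 9000-210 with vector processor
measured_speedup = 17.0      # execution-time ratio, MasPar vs. VAX

peak_ratio = maspar_peak_mflops / vax_peak_mflops   # 4.8
excess = measured_speedup / peak_ratio              # speedup beyond the
print(round(peak_ratio, 1), round(excess, 1))       # peak ratio: ~3.5
```

This confirms the abstract's claim that the observed speedup was about 3.5 times higher than the ratio of the two machines' peak ratings.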
4

Manea, A. M., and T. Almani. "Scalable Graphics Processing Unit–Based Multiscale Linear Solvers for Reservoir Simulation." SPE Journal 27, no. 01 (2021): 643–62. http://dx.doi.org/10.2118/203939-pa.

Abstract:
In this work, the scalability of two key multiscale solvers for the pressure equation arising from incompressible flow in heterogeneous porous media, namely the multiscale finite volume (MSFV) solver and the restriction-smoothed basis multiscale (MsRSB) solver, is investigated on the graphics processing unit (GPU) massively parallel architecture. The robustness and scalability of both solvers are compared against their corresponding carefully optimized implementations on the shared-memory multicore architecture in a structured problem setting. Although several components in the MSFV and MsRSB algorithms are directly parallelizable, their scalability on the GPU architecture depends heavily on the underlying algorithmic details and data-structure design of every step, where one needs to ensure favorable control and data flow on the GPU while extracting enough parallel work for a massively parallel environment. In addition, the type of algorithm chosen for each step greatly influences the overall robustness of the solver. Thus, we extend the work on the parallel multiscale methods of Manea et al. (2016) to map the MSFV and MsRSB special kernels to the massively parallel GPU architecture. The scalability of our optimized parallel MSFV and MsRSB GPU implementations is demonstrated using highly heterogeneous structured 3D problems derived from the SPE10 benchmark (Christie and Blunt 2001). These problems range in size from millions to tens of millions of cells. For both solvers, the multicore implementations are benchmarked on a shared-memory multicore architecture consisting of two packages of Intel® Cascade Lake Xeon Gold 6246 central processing units (CPUs), whereas the GPU implementations are benchmarked on a massively parallel architecture consisting of NVIDIA Volta V100 GPUs. We compare the multicore implementations to the GPU implementations for both the setup and solution stages. Finally, we compare the parallel MsRSB scalability to the scalability of MSFV on the multicore (Manea et al. 2016) and GPU architectures. To the best of our knowledge, this is the first parallel implementation and demonstration of these versatile multiscale solvers on the GPU architecture. NOTE: This paper is also published as part of the 2021 SPE Reservoir Simulation Conference Special Issue.
5

Gall, R., F. Tabaddor, D. Robbins, P. Majors, W. Sheperd, and S. Johnson. "Some Notes on the Finite Element Analysis of Tires." Tire Science and Technology 23, no. 3 (1995): 175–88. http://dx.doi.org/10.2346/1.2137503.

Abstract:
Over the past ten years, finite element analysis (FEA) has been increasingly integrated into the tire design process. FEA has been used to study general tire behavior, to perform parameter studies, and to do comparative analyses. To shorten the tire development cycle, FEA is now being used as a replacement for certain tire tests, which requires the accuracy of the FEA results to be within those test limits. This paper investigates some of the known modeling techniques and their impact on accuracy. Among the issues are the use of shell elements, assumptions for boundary conditions, and global/local analysis approaches. Finally, the use of a new generation of supercomputers, massively parallel processing (MPP) systems, is discussed.
6

Chen, Hongzhi, Changji Li, Chenguang Zheng, et al. "G-tran." Proceedings of the VLDB Endowment 15, no. 11 (2022): 2545–58. http://dx.doi.org/10.14778/3551793.3551813.

Abstract:
Graph transaction processing poses unique challenges such as random data access due to the irregularity of graph structures, low throughput and high abort rate due to the relatively large read/write sets in graph transactions. To address these challenges, we present G-Tran, a remote direct memory access (RDMA)-enabled distributed in-memory graph database with serializable and snapshot isolation support. First, we propose a graph-native data store to achieve good data locality and fast data access for transactional updates and queries. Second, G-Tran adopts a fully decentralized architecture that leverages RDMA to process distributed transactions with the massively parallel processing (MPP) model, which can achieve high performance by utilizing all computing resources. In addition, we propose a new multi-version optimistic concurrency control (MV-OCC) protocol with two optimizations to address the issue of large read/write sets in graph transactions. Extensive experiments show that G-Tran achieves competitive performance compared with other popular graph databases on benchmark workloads.
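The validation step at the heart of optimistic concurrency control, which the MV-OCC protocol above builds on, can be sketched in a few lines: a transaction records the version of everything it reads and commits only if those versions are still current. This is a generic single-version OCC illustration, not G-Tran's actual multi-version protocol.

```python
# Generic optimistic concurrency control sketch: each transaction records
# the version of every item it reads, and commit succeeds only if those
# versions are still current. Illustrates the idea behind MV-OCC; it is
# not G-Tran's actual protocol.

class Store:
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))

    def commit(self, read_set, write_set):
        # Validate: every version observed must still be current.
        for key, seen_version in read_set.items():
            _, current = self.read(key)
            if current != seen_version:
                return False  # conflict detected: abort
        # Install writes, bumping each item's version.
        for key, value in write_set.items():
            _, version = self.read(key)
            self.data[key] = (value, version + 1)
        return True

store = Store()
store.data["a"] = (1, 1)

# T1 reads a@v1; T2 commits a new version first; T1 must then abort.
t1_reads = {"a": 1}
assert store.commit({"a": 1}, {"a": 99})   # T2 wins, "a" moves to v2
ok = store.commit(t1_reads, {"a": 42})     # T1 validates against v2
print(ok)  # False: T1 aborts rather than overwrite T2's update
```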
7

Zeng, Feng Sheng. "Research and Improvement of Database Storage Method." Applied Mechanics and Materials 608-609 (October 2014): 641–45. http://dx.doi.org/10.4028/www.scientific.net/amm.608-609.641.

Abstract:
This paper presents a massive data storage and parallel processing method based on the MPP architecture. It puts forward a fully persistent data storage scheme driven by client requests and, integrating the Map/Reduce idea, distributes the system across the data nodes, giving the data high scalability, high availability, and high concurrency. A simulation test on a cluster of distributed data nodes verifies the feasibility of this mass data storage approach.
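The distribution scheme the abstract describes can be sketched as hash-partitioning records across data nodes, then answering a query by combining per-node partial results in Map/Reduce style. The node count, hash function, and record layout are illustrative assumptions, not the paper's design.

```python
# MPP-style data distribution sketch: records are hashed to data nodes,
# each node computes a local partial result ("map"), and a coordinator
# combines the partials ("reduce"). All parameters are assumptions.

NUM_NODES = 4

def node_for(key: str) -> int:
    # Simple deterministic hash so the sketch is reproducible.
    return sum(key.encode()) % NUM_NODES

def scatter(records):
    """Distribute (key, value) records across the data nodes."""
    nodes = [[] for _ in range(NUM_NODES)]
    for key, value in records:
        nodes[node_for(key)].append((key, value))
    return nodes

def parallel_sum(nodes):
    # Each node sums its own shard; the coordinator sums the partials.
    partials = [sum(v for _, v in shard) for shard in nodes]
    return sum(partials)

records = [(f"row{i}", i) for i in range(100)]
nodes = scatter(records)
total = parallel_sum(nodes)
print(total)  # identical to a single-node sum over all records
```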
8

Researcher. "DATA WAREHOUSING WITH AMAZON REDSHIFT: REVOLUTIONIZING BIG DATA ANALYTICS." International Journal of Computer Engineering and Technology (IJCET) 15, no. 4 (2024): 395–405. https://doi.org/10.5281/zenodo.13270530.

Abstract:
The article examines Amazon Redshift, a cloud-based data warehouse that is changing the way big data analytics is done. It discusses Redshift's architecture, main features, and benefits in detail, emphasizing columnar storage, massively parallel processing, and a distributed system design. The article covers real-world applications in business intelligence, data science, operational analytics, customer analytics, and financial analytics. It also compares Redshift with other cloud data warehouses, such as Snowflake and Google BigQuery, pointing out their respective pros and cons. Finally, the article details how Amazon Redshift helps businesses harness the power of their data at scale, driving innovation and competitive advantage in today's data-driven business world.
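The columnar storage the abstract emphasizes can be illustrated with a toy layout comparison: an aggregate over one column only needs that column's array in columnar form, while a row store walks whole records. This sketches the general idea, not Redshift's internals; the table contents are invented for illustration.

```python
# Toy illustration of columnar vs. row-oriented layout. A real column
# store also compresses each column and skips blocks via zone maps;
# this only shows the layout difference.

rows = [
    {"id": 1, "region": "eu", "revenue": 100},
    {"id": 2, "region": "us", "revenue": 250},
    {"id": 3, "region": "eu", "revenue": 175},
]

# Row store: an aggregate touches every field of every record.
row_total = sum(r["revenue"] for r in rows)

# Column store: one contiguous array per column; the aggregate
# touches only the "revenue" array.
columns = {k: [r[k] for r in rows] for k in rows[0]}
col_total = sum(columns["revenue"])

print(row_total == col_total)  # same answer; columnar reads less data
```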
9

Mitchell, Rory, Eibe Frank, and Geoffrey Holmes. "GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles." PeerJ Computer Science 8 (April 5, 2022): e880. http://dx.doi.org/10.7717/peerj-cs.880.

Abstract:
SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second—equivalent CPU-based performance is estimated to require 6850 CPU cores.
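The bin packing step the abstract mentions, placing variable-sized sub-problems into fixed-capacity execution units, can be sketched with the standard first-fit-decreasing heuristic. This illustrates the flavor of the packing problem; it is not necessarily GPUTreeShap's exact algorithm, and the sizes and capacity are invented.

```python
# First-fit-decreasing bin packing: place variable-sized sub-problems
# (e.g., per-tree workloads) into fixed-capacity bins (e.g., GPU thread
# groups). A standard heuristic shown for illustration.

def first_fit_decreasing(sizes, capacity):
    bins = []  # each bin: [remaining_capacity, [item_sizes]]
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity")
        for b in bins:
            if b[0] >= size:          # first bin with enough room
                b[0] -= size
                b[1].append(size)
                break
        else:
            bins.append([capacity - size, [size]])  # open a new bin
    return [items for _, items in bins]

# Sub-problem sizes (threads needed) packed into 32-thread units.
packed = first_fit_decreasing([20, 10, 15, 5, 25, 12], capacity=32)
print(len(packed))  # number of 32-thread units actually needed
```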
10

Ömərova, Maya, and Taleh Əsgərov. "BÖYÜK HƏCMLI VERİLƏNLƏRİN EMALI ÜÇÜN TƏTBIQLƏR." PAHTEI-Procedings of Azerbaijan High Technical Educational Institutions 36, no. 01 (2024): 204–10. http://dx.doi.org/10.36962/pahtei36012024-204.

Abstract:
In the modern era, the volume and variety of data are growing rapidly. Big data and its processing is one of the most important problems in today's information technology field. Solving this problem helps to solve others, because in today's information age many companies, enterprises, and even governments work with large volumes of data. These data encompass information from many different domains, and their varied forms and complexity make algorithm development difficult. Big data is obtained from a variety of sources: every day, modern systems and digital technologies such as the Internet of Things (IoT) generate data stores measured in terabytes, and sometimes as much as 2.5 exabytes of data appear in a single day. In such conditions it is difficult to carry out data analytics with existing techniques, and the sheer volume of data also raises the problem of measuring and scaling it. To understand and effectively manage the critical dimensions of big data analytics, the characteristics known as the 5Vs have been identified. To process big data, one must first study the Hadoop ecosystem, the Kafka application within it, and the MapReduce technology that serves as Hadoop's programming model. The article then examines applications that work with distributed file systems, such as Apache Spark, MongoDB, Elasticsearch, Hive, HCatalog, HBase, MPP (Massively Parallel Processing), Pig, Mahout, NoSQL, and Cassandra. Hadoop, known as one of the most popular big data technologies, is investigated together with its main components and the master and worker nodes of the HDFS (distributed file system) service. Keywords: big data, 5V, analytics, Hadoop, MapReduce, Apache Spark
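The MapReduce programming model the abstract surveys is most easily seen in the canonical word-count example: a map phase emits (key, 1) pairs, a shuffle groups them by key, and a reduce phase sums each group. The sketch below simulates all three phases in memory; a real Hadoop job would run them across HDFS data nodes.

```python
# In-memory simulation of the MapReduce model: map, shuffle, reduce.
# Illustrative only; real Hadoop distributes each phase across nodes.
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the values for each key.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data needs big tools", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```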
11

Xu, Tao, Ge Fu, Huai Yuan Tan, Hong Zhang, and Xin Ran Liu. "Structured Big Data Management System Supported Cross-Domain Query." Applied Mechanics and Materials 631-632 (September 2014): 1033–38. http://dx.doi.org/10.4028/www.scientific.net/amm.631-632.1033.

Abstract:
We design a structured big data management system that can deal with large-scale structured datasets and supports cross-domain collaborative queries. The system employs HDFS as the storage layer and realizes a scheduling engine based on the splitting technology of massively parallel processing (MPP) databases. Using this engine, tasks can be split and distributed to different sub-nodes for parallel execution. Through the cross-domain query module, users can execute SQL commands on datasets in different datacenters or network domains. Meanwhile, the system supports distributed deployment, reducing construction costs by making full use of existing software and hardware resources and equipment. We test the system's functions and performance on an 80-node cluster and compare it with Hive. The results suggest that system performance is improved by 2-3 times over Hive and that the designed functions perform correctly.
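The split-and-distribute engine the abstract describes follows a scatter-gather pattern: a query is divided into per-node subtasks, executed in parallel, and the partial results are merged. The sketch below illustrates that pattern with a thread pool standing in for sub-nodes; the partitioning scheme, predicate, and node count are assumptions, not the paper's design.

```python
# Scatter-gather sketch of a split-and-distribute query engine: the
# dataset is partitioned across sub-nodes, each evaluates the same
# subquery on its shard in parallel, and the partials are merged.
from concurrent.futures import ThreadPoolExecutor

def run_on_subnode(shard):
    # Each sub-node evaluates the same predicate on its own shard,
    # analogous to SELECT COUNT(*) ... WHERE value % 2 = 0.
    return sum(1 for v in shard if v % 2 == 0)

def cross_node_query(data, num_nodes=4):
    shards = [data[i::num_nodes] for i in range(num_nodes)]   # split
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(run_on_subnode, shards))     # distribute
    return sum(partials)                                      # merge

result = cross_node_query(list(range(1000)))
print(result)  # 500: the even numbers in 0..999
```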
12

Krasnosky, Kristopher, and Christopher Roman. "A Massively Parallel Implementation of Gaussian Process Regression for Real Time Bathymetric Modeling and Simultaneous Localization and Mapping." Field Robotics 2, no. 1 (2022): 940–70. http://dx.doi.org/10.55417/fr.2022031.

Abstract:
A Gaussian process regression (GPR) can be used as a stochastic method for modeling underwater terrain using multibeam sonar data. A GPR model can improve the effective resolution of a terrain model over traditional gridding methods and quantify uncertainty with an estimate of model variance over its entire domain. However, GPR solutions are extremely computationally expensive and generally reserved for post-processing applications. To make GPR viable for real-time applications, we developed massively parallel GPR (MP-GPR) to run on a graphical processing unit (GPU). MP-GPR is first used to process real-time multibeam data when assuming accurate navigation from a high precision position, heading, and attitude source. In underwater environments, however, we are denied the luxury of high precision position sensors and typically rely on dead reckoning. Therefore, MP-GPR was used as a terrain model for a featureless, Rao-Blackwellized particle filter based, bathymetric particle-filter simultaneous localization and mapping (BPSLAM) algorithm. Our GPU-based extension of BPSLAM (GP-BPSLAM) estimates many possible vehicle trajectories and MP-GPR predicts a possible map for each. By comparing the recent multibeam observations against the model for each possible trajectory, unlikely trajectories can be identified and removed. GP-BPSLAM is able to process data in real time and generate a navigation solution that is more accurate than simple dead reckoning.
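The core GPR computation that MP-GPR parallelizes, a posterior mean and variance from training observations, can be shown in miniature with two soundings and an RBF kernel, small enough that the 2x2 matrix inverse is analytic. This is a toy of the general method, not the paper's implementation; the kernel length scale and sounding values are invented.

```python
# Gaussian process regression in miniature: posterior mean and variance
# at a query point from two training soundings, RBF kernel, analytic
# 2x2 inverse. A toy of the method MP-GPR parallelizes on the GPU.
import math

def rbf(x1, x2, length=1.0):
    # Squared-exponential (RBF) covariance between two 1D positions.
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def gpr_predict(xs, ys, x_star, noise=1e-6):
    # K: 2x2 training covariance (plus noise); k_star: cross-covariances.
    a = rbf(xs[0], xs[0]) + noise
    b = rbf(xs[0], xs[1])
    d = rbf(xs[1], xs[1]) + noise
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))       # K^-1
    k_star = (rbf(x_star, xs[0]), rbf(x_star, xs[1]))
    # mean = k*^T K^-1 y ;  var = k(x*,x*) - k*^T K^-1 k*
    alpha = (inv[0][0] * ys[0] + inv[0][1] * ys[1],
             inv[1][0] * ys[0] + inv[1][1] * ys[1])
    mean = k_star[0] * alpha[0] + k_star[1] * alpha[1]
    var = rbf(x_star, x_star) - (
        k_star[0] * (inv[0][0] * k_star[0] + inv[0][1] * k_star[1])
        + k_star[1] * (inv[1][0] * k_star[0] + inv[1][1] * k_star[1]))
    return mean, var

# Two depth soundings at x=0 and x=2; predict the depth midway.
mean, var = gpr_predict(xs=(0.0, 2.0), ys=(-10.0, -12.0), x_star=1.0)
print(round(mean, 2), var > 0)
```

The positive posterior variance is the model-uncertainty estimate the abstract highlights as an advantage over plain gridding; in MP-GPR this same linear algebra is batched across the GPU for real-time multibeam data.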
13

Gebbert, Sören, Thomas Leppelt, and Edzer Pebesma. "A Topology Based Spatio-Temporal Map Algebra for Big Data Analysis." Data 4, no. 2 (2019): 86. http://dx.doi.org/10.3390/data4020086.

Abstract:
Continental and global datasets based on earth observations or computational models challenge the existing map algebra approaches. The available datasets differ in their spatio-temporal extents and their spatio-temporal granularity, which makes it difficult to process them as time series data in map algebra expressions. To address this issue we introduce a new map algebra approach that is topology based. This topology based map algebra uses spatio-temporal topological operators (STTOP and STTCOP) to specify spatio-temporal operations between topological related map layers of different time-series data. We have implemented several topology based map algebra tools in the open source geoinformation system GRASS GIS and its open source cloud processing engine actinia. We demonstrate the application of our topology based map algebra by solving real world big data problems using a single algebraic expression. This included the massively parallel computation of the NDVI from a series of 100 Sentinel2A scenes organized as earth observation data cubes. The processing was performed and benchmarked on a many core computer setup and in a distributed container environment. The design of our topology based map algebra allows us to deploy it as a standardized service in the EU Horizon 2020 project openEO.
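The NDVI computation the abstract parallelizes over 100 Sentinel-2A scenes is per-pixel, NDVI = (NIR - Red) / (NIR + Red), and therefore embarrassingly parallel across tiles. The sketch below uses toy reflectance pairs and a plain map() where the distributed workers would be; a real run would operate on raster arrays in GRASS GIS/actinia.

```python
# Per-pixel NDVI over independent tiles: the computation on one tile
# never depends on another, which is what makes the massively parallel
# scheduling in the abstract possible. Toy reflectance values assumed.

def ndvi_pixel(nir, red):
    return (nir - red) / (nir + red) if (nir + red) else 0.0

def ndvi_tile(tile):
    # tile: list of (nir, red) reflectance pairs
    return [ndvi_pixel(nir, red) for nir, red in tile]

# Two small "scenes"; map() stands in for the distributed workers.
scenes = [
    [(0.8, 0.2), (0.5, 0.5)],   # dense vegetation, bare soil
    [(0.6, 0.3), (0.1, 0.4)],   # moderate vegetation, water
]
results = list(map(ndvi_tile, scenes))
print(round(results[0][0], 2))  # high NDVI for the vegetation pixel
```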
14

Dinh, Van Quang, and Yves Marechal. "GPU-based parallelization for bubble mesh generation." COMPEL - The international journal for computation and mathematics in electrical and electronic engineering 36, no. 4 (2017): 1184–97. http://dx.doi.org/10.1108/compel-11-2016-0476.

Abstract:
Purpose In FEM computations, mesh quality improves the accuracy of the approximate solution and reduces the computation time. The dynamic bubble system meshing technique can provide high-quality meshes, but the packing process is time-consuming. This paper aims to improve the running time of bubble meshing by using the advantages of parallel computing on the graphics processing unit (GPU). Design/methodology/approach This paper is based on an analysis of the processing time on the CPU. A massively parallel computing approach based on the CUDA architecture is proposed to improve the bubble displacement and database updating. Constraints linked to hardware considerations are taken into account. Finally, speedup factors are provided on test cases and real-scale examples. Findings The numerical experiments show that the parallel implementation reaches a speedup of 35 compared to the serial implementation. Research limitations/implications This contribution is so far limited to two-dimensional (2D) geometries, although the extension to three dimensions (3D) is straightforward regarding the meshing technique itself and the GPU implementation. The authors' work is based on a CUDA environment, which is widely used by developers; C/C++ and Java were the programming languages used, and other languages may of course lead to slightly different implementations. Practical implications This approach makes it possible to use the bubble meshing technique for both initial design and optimization, as excellent meshes can be built in a few seconds. Originality/value Compared to previous works, this contribution shows that the scalability of the bubble meshing technique needs to solve two key issues: reach a T(N) global cost of the implementation and reach a very fast size map interpolation strategy.
15

Metropolis, N. "Massively parallel processing." Journal of Scientific Computing 1, no. 2 (1986): 115–16. http://dx.doi.org/10.1007/bf01061388.

16

Hasegawa, Satoshi, Haruyasu Ito, Haruyoshi Toyoda, and Yoshio Hayasaki. "Massively parallel femtosecond laser processing." Optics Express 24, no. 16 (2016): 18513. http://dx.doi.org/10.1364/oe.24.018513.

17

Patel, R. R., S. W. Bond, M. D. Pocha, et al. "Multiwavelength parallel optical interconnects for massively parallel processing." IEEE Journal of Selected Topics in Quantum Electronics 9, no. 2 (2003): 657–66. http://dx.doi.org/10.1109/jstqe.2003.813313.

18

Armstrong, Marc P., and Richard Marciano. "Massively parallel processing of spatial statistics." International journal of geographical information systems 9, no. 2 (1995): 169–89. http://dx.doi.org/10.1080/02693799508902032.

19

Balz, Timo, Lu Zhang, and Mingsheng Liao. "Direct stereo radargrammetric processing using massively parallel processing." ISPRS Journal of Photogrammetry and Remote Sensing 79 (May 2013): 137–46. http://dx.doi.org/10.1016/j.isprsjprs.2013.02.014.

20

Zeki, Semir. "A massively asynchronous, parallel brain." Philosophical Transactions of the Royal Society B: Biological Sciences 370, no. 1668 (2015): 20140174. http://dx.doi.org/10.1098/rstb.2014.0174.

Abstract:
Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain.
21

Shin, Woosuk, Kwan-Hee Yoo, and Nakhoon Baek. "Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations." Applied Sciences 10, no. 5 (2020): 1656. http://dx.doi.org/10.3390/app10051656.

Abstract:
Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform parallel tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel tasks. There is also a need for high-level abstractions and platform-independence over those massively parallel computing platforms. Recently, Khronos group announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a new cross-platform abstraction layer, to provide an efficient way for single-source heterogeneous computing, with C++-template-level abstractions. However, since there has been no official implementation of SYCL, we currently have several different implementations from various vendors. In this paper, we analyse the characteristics of those SYCL implementations. We also show performance measures of those SYCL implementations, especially for well-known massively parallel tasks. We show that each implementation has its own strength in computing different types of mathematical operations, along with different sizes of data. Our analysis is available for fundamental measurements of the abstract-level cost-effective use of massively parallel computations, especially for big-data applications.
22

Goel, U. C., and R. C. Joshi. "A Massively Parallel Processing Computer for Satellite Image Processing." IETE Journal of Education 27, no. 3 (1986): 112–20. http://dx.doi.org/10.1080/09747338.1986.11436113.

23

Heemink, Arnold W. "Massively parallel processing in computational fluid dynamics." Simulation Practice and Theory 3, no. 4-5 (1995): iii—iv. http://dx.doi.org/10.1016/0928-4869(96)80091-7.

24

Balasubramanian, Vijay, and Prithviraj Banerjee. "A fault tolerant massively parallel processing architecture." Journal of Parallel and Distributed Computing 4, no. 4 (1987): 363–83. http://dx.doi.org/10.1016/0743-7315(87)90025-6.

25

O'keefe, Matthew, Terence Parr, B. Kevin Edgar, Steve Anderson, Paul Woodward, and Hank Dietz. "The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors." Scientific Programming 4, no. 1 (1995): 1–21. http://dx.doi.org/10.1155/1995/278064.

Abstract:
Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
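The "self-similar" property the abstract defines, the same algorithm applied to the global data domain is also applied to each subdomain, can be demonstrated with a 1D stencil: updating each subdomain with halo (ghost) cells borrowed from its neighbors reproduces the global result exactly. The three-point average, periodic boundary, and decomposition below are illustrative assumptions, not the Fortran-P translator itself.

```python
# "Self-similar" in the Fortran-P sense: the same stencil that updates
# the global domain also updates each subdomain, given one halo cell
# from each neighbor. 1D three-point average, periodic boundaries.

def smooth(u):
    # Global update: three-point average with periodic boundaries.
    n = len(u)
    return [(u[(i - 1) % n] + u[i] + u[(i + 1) % n]) / 3 for i in range(n)]

def smooth_decomposed(u, parts):
    # Same algorithm applied per subdomain, with halo cells exchanged.
    n = len(u)
    size = n // parts
    out = []
    for p in range(parts):
        lo, hi = p * size, (p + 1) * size
        # Subdomain plus one halo cell on each side (periodic).
        halo = [u[(lo - 1) % n]] + u[lo:hi] + [u[hi % n]]
        out += [(halo[i - 1] + halo[i] + halo[i + 1]) / 3
                for i in range(1, size + 1)]
    return out

u = [float(i % 7) for i in range(12)]
print(smooth(u) == smooth_decomposed(u, parts=3))  # True: bit-identical
```

Because each subdomain runs the identical code, the decomposition maps directly onto an MPP with one subdomain per processor, which is what makes this coding style automatically translatable.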
26

Ishikawa, Masatoshi. "High Speed Massively Parallel Processing Vision and Applications." Review of Laser Engineering 24, Supplement (1996): 302–5. http://dx.doi.org/10.2184/lsj.24.supplement_302.

27

Rogers, Jonathan, and Nathan Slegers. "Robust Parafoil Terminal Guidance Using Massively Parallel Processing." Journal of Guidance, Control, and Dynamics 36, no. 5 (2013): 1336–45. http://dx.doi.org/10.2514/1.59782.

28

Golov, Nikolay, and Lars Rönnbäck. "Big Data normalization for massively parallel processing databases." Computer Standards & Interfaces 54 (November 2017): 86–93. http://dx.doi.org/10.1016/j.csi.2017.01.009.

29

Kawai, Shigeru, and Masanori Mizoguchi. "Two-Dimensional Optical Buses for Massively Parallel Processing." Japanese Journal of Applied Physics 31, Part 1, No. 5B (1992): 1663–65. http://dx.doi.org/10.1143/jjap.31.1663.

30

Gómez-Valdés, José, and Dong-Ping Wang. "Massively parallel processing in coastal ocean circulation model." Journal of Scientific Computing 10, no. 3 (1995): 305–23. http://dx.doi.org/10.1007/bf02091778.

31

Kadry, Seifedine, and Khaled Smaili. "Massively Parallel Processing Distributed Database for Business Intelligence." Information Technology Journal 7, no. 1 (2007): 70–76. http://dx.doi.org/10.3923/itj.2008.70.76.

32

ABRAHAM, RALPH H., JOHN B. CORLISS, and JOHN E. DORBAND. "ORDER AND CHAOS IN THE TORAL LOGISTIC LATTICE." International Journal of Bifurcation and Chaos 01, no. 01 (1991): 227–34. http://dx.doi.org/10.1142/s0218127491000154.

Full text
Abstract:
Cellular dynamical systems, alias lattice dynamical systems, emerged as a new mathematical structure and modeling strategy in the 1980s. Based, like cellular automata, on finite difference methods for partial differential equations, they provide challenging patterns of spatiotemporal organization, in which chaos and order cooperate in novel ways. Here we present initial findings of our exploration of a two-dimensional logistic lattice with the Massively Parallel Processor (MPP) at NASA's Goddard Space Flight Center, a machine capable of 200 megaflops. A video tape illustrating these findings is available.
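A toral (periodic) logistic lattice of the kind explored here can be sketched with diffusive coupling; the map parameter `r` and coupling strength `eps` below are illustrative choices, not values from the paper.

```python
import numpy as np

def logistic_lattice_step(x, r=3.9, eps=0.1):
    """One synchronous update of a diffusively coupled logistic lattice
    on a torus (periodic boundaries in both directions)."""
    f = r * x * (1.0 - x)                      # local logistic map
    neighbors = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                 np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0
    return (1.0 - eps) * f + eps * neighbors   # diffusive coupling

rng = np.random.default_rng(0)
x = rng.random((64, 64))
for _ in range(100):
    x = logistic_lattice_step(x)
# States remain in [0, 1]: the map sends [0, 1] into [0, r/4] and the
# coupling is a convex combination of those values.
assert np.all((x >= 0.0) & (x <= 1.0))
```

Every site applies the same local rule simultaneously, which is exactly the structure that mapped so naturally onto the MPP's SIMD processor array.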
APA, Harvard, Vancouver, ISO, and other styles
33

Pase, Douglas M., Tom MacDonald, and Andrew Meltzer. "The CRAFT Fortran Programming Model." Scientific Programming 3, no. 3 (1994): 227–53. http://dx.doi.org/10.1155/1994/572396.

Full text
Abstract:
Many programming models for massively parallel machines exist, and each has its advantages and disadvantages. In this article we present a programming model that combines features from other programming models that (1) can be efficiently implemented on present and future Cray Research massively parallel processor (MPP) systems and (2) are useful in constructing highly parallel programs. The model supports several styles of programming: message-passing, data parallel, global address (shared data), and work-sharing. These styles may be combined within the same program. The model includes features that allow a user to define a program in terms of the behavior of the system as a whole, where the behavior of individual tasks is implicit from this systemic definition. (In general, features marked as shared are designed to support this perspective.) It also supports an opposite perspective, where a program may be defined in terms of the behaviors of individual tasks, and a program is implicitly the sum of the behaviors of all tasks. (Features marked as private are designed to support this perspective). Users can exploit any combination of either set of features without ambiguity and thus are free to define a program from whatever perspective is most appropriate to the problem at hand.
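The shared versus private distinction can be loosely illustrated outside Fortran (a Python threading analogy, not the CRAFT model itself): shared data is a single object visible to every task, while private data lives in per-task storage.

```python
import threading

shared_total = [0]                 # "shared": one object seen by all tasks
lock = threading.Lock()
local = threading.local()          # "private": per-task storage

def task(values):
    local.partial = sum(values)    # each task's private partial sum
    with lock:                     # combine into the shared result
        shared_total[0] += local.partial

chunks = [[1, 2, 3], [4, 5], [6]]
threads = [threading.Thread(target=task, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert shared_total[0] == 21
```

The systemic perspective describes the program through `shared_total` (the behavior of the whole); the per-task perspective describes it through each task's `local.partial`.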
APA, Harvard, Vancouver, ISO, and other styles
34

Tam, Wing-kin, and Zhi Yang. "Neural Parallel Engine: A toolbox for massively parallel neural signal processing." Journal of Neuroscience Methods 301 (May 2018): 18–33. http://dx.doi.org/10.1016/j.jneumeth.2018.03.004.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Katagiri, Takahiro, and Yasumasa Kanada. "An efficient implementation of parallel eigenvalue computation for massively parallel processing." Parallel Computing 27, no. 14 (2001): 1831–45. http://dx.doi.org/10.1016/s0167-8191(01)00122-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Kim, Jinsik, and Inwhee Joe. "Chip-Level Defect Analysis with Virtual Bad Wafers Based on Huge Big Data Handling for Semiconductor Production." Electronics 13, no. 11 (2024): 2205. http://dx.doi.org/10.3390/electronics13112205.

Full text
Abstract:
Semiconductors continue to shrink in die size because of benefits like cost savings, lower power consumption, and improved performance. However, this reduction leads to more defects due to increased inter-cell interference. Among the various defect types, customer-found defects are the most costly. Thus, finding the root cause of customer-found defects has become crucial to the quality of semiconductors. Traditional methods involve analyzing the pathways of many low-yield wafers. Yet, because of the extremely limited number of customer-found defects, obtaining significant results is difficult. After the products are provided to customers, they undergo rigorous testing and selection, leading to a very low defect rate. However, since the timing of defect occurrence varies depending on the environment in which the product is used, the quantity of defective samples is often quite small. Unfortunately, with such a low number of samples, typically 10 or fewer, it becomes impossible to investigate the root cause of wafer-level defects using conventional methods. This paper introduces a novel approach to finding the root cause of these rare defective chips for the first time in the semiconductor industry. Defective wafers are identified using rare customer-found chips and chip-level EDS (Electrical Die Sorting) data, and these newly identified defective wafers are termed vBADs (virtual bad wafers). The performance of root cause analysis is dramatically improved with vBADs. However, the chip-level analysis presented here demands substantial computing power. Therefore, MPP (Massive Parallel Processing) architecture is implemented and optimized to handle large volumes of chip-level data within a large architecture infrastructure that can manage big data. This allows for a chip-level defect analysis system that can recommend the relevant EDS test and identify the root cause in real time even with a single defective chip. 
The experimental results demonstrate that the proposed root cause search can reveal the hidden cause of a single defective chip by amplifying it with 90 vBADs, and system performance improves by a factor of 61.
APA, Harvard, Vancouver, ISO, and other styles
37

Kroc, Jiří. "Emergent Information Processing: Observations, Experiments, and Future Directions." Software 3, no. 1 (2024): 81–106. http://dx.doi.org/10.3390/software3010005.

Full text
Abstract:
Science is currently becoming aware of the challenges in the understanding of the very root mechanisms of massively parallel computations that are observed in literally all scientific disciplines, ranging from cosmology to physics, chemistry, biochemistry, and biology. This leads us to the main motivation and simultaneously to the central thesis of this review: “Can we design artificial, massively parallel, self-organized, emergent, error-resilient computational environments?” The thesis is studied solely on cellular automata. Initially, an overview of the basic building blocks enabling us to reach this end goal is provided. Important information dealing with this topic is reviewed along with highly expressive animations generated by the open-source, Python, cellular automata software GoL-N24. A large number of simulations, together with examples and counter-examples and a closing list of future directions, give hints and partial answers to the main thesis. Together, these pose the crucial question of whether there is something deeper beyond the Turing machine theoretical description of massively parallel computing. The perspective and future directions of this research, including applications in robotics and biology, are discussed in the light of known information.
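Cellular automata of the kind the review studies apply one local rule to every cell in parallel; a minimal sketch (plain NumPy, not the GoL-N24 package) is the classic Game of Life update:

```python
import numpy as np

def life_step(grid):
    """One synchronous Game of Life update on a toroidal grid.
    Every cell applies the same local rule simultaneously."""
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

# A glider: an emergent structure that translates by (1, 1) every
# four generations, although no single cell "knows" about gliders.
grid = np.zeros((8, 8), dtype=int)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[y, x] = 1
moved = grid
for _ in range(4):
    moved = life_step(moved)
assert np.array_equal(moved, np.roll(np.roll(grid, 1, 0), 1, 1))
```

The glider is a small example of the emergent, self-organized behavior the review asks about: structure arises from purely local rules.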
APA, Harvard, Vancouver, ISO, and other styles
38

Rahozin, D. V. "Performance analysis of massively parallel programs for graphics processing units." PROBLEMS IN PROGRAMMING, no. 3-4 (December 2022): 51–58. http://dx.doi.org/10.15407/pp2022.03-04.051.

Full text
Abstract:
Any modern Graphics Processing Unit (graphics card) is a good platform to run massively parallel programs. Still, we lack tools to observe and measure the performance characteristics of GPU-based software. We argue that, due to the complex memory hierarchy and thousands of execution threads, performance issues are largely about the efficient use of the graphics card's memory hierarchy. We propose to use the GPGPU-Sim simulator, previously used mostly for graphics card architecture validation, for performance validation of CUDA-based programs. We provide examples showing how to use the simulation for performance analysis of massively parallel programs.
APA, Harvard, Vancouver, ISO, and other styles
39

Chen, An, Stefano Ambrogio, Pritish Narayanan, et al. "(Invited) Emerging Nonvolatile Memories for Analog Neuromorphic Computing." ECS Meeting Abstracts MA2024-01, no. 21 (2024): 1293. http://dx.doi.org/10.1149/ma2024-01211293mtgabs.

Full text
Abstract:
Emerging non-volatile memory (NVM) devices, such as STT-MRAM, PCM, and RRAM, have been explored for embedded memory and storage applications to replace CMOS-based SRAM/DRAM and Flash devices. Recently, many of these memory devices have been utilized for new computing paradigms beyond Boolean logic and von Neumann architectures. For example, in-memory analog computing reduces data movement between computing and memory units and exploits the intrinsic parallelism in memory arrays. It finds a natural application in deep neural network (DNN) accelerators by implementing high-throughput, high-efficiency multiply-accumulate (MAC) operations. Here the conductance of memory devices in a crossbar array represents DNN weights and the activations are encoded in input electrical signals (e.g., pulse height or duration). The MAC operation is conducted via Ohm’s law (multiplication of voltage and conductance) and Kirchhoff’s law (accumulation via current summation) in constant time even for very large networks. DNNs have surpassed human performance in various AI applications, e.g., image classification, natural language processing, etc. While general-purpose CPU/GPU and special-purpose digital accelerators provide current and near-term DNN hardware, there are longer-term opportunities for analog DNN accelerators based on emerging memory devices to achieve significantly higher performance and energy efficiency. At the same time, analog accelerators impose new requirements on these devices beyond traditional memory applications, e.g., analog tunability, gradual and symmetric weight modulation, high precision, etc. Memory devices with an analog nature in their physical mechanisms (e.g., filament growth in RRAM) may be optimized to meet these requirements, while some abrupt and asymmetric characteristics (e.g., filament rupture) present challenges. 
Increasingly large neural network models have been demonstrated on these memory arrays designed as analog accelerators, but they are still orders of magnitude smaller than state-of-the-art DNN models. While analog accelerators enable massively parallel computation, they are also susceptible to unique challenges in analog devices and circuitry (e.g., device variability, circuit noise), which may degrade network performance (e.g., accuracy). To benefit from the massively parallel MAC operation in analog memory arrays, these arrays need to be large enough to efficiently map the layers in modern DNN models. Among emerging NVM devices, PCM has the advantages of maturity and the availability of large-scale arrays, but also faces some challenges in device characteristics, e.g., conductance drift, asymmetry, and noise. PCM-based analog DNN accelerators have been demonstrated at an advanced technology node with millions of devices and have achieved iso-accuracy on increasingly large network models. These accelerators integrate highly efficient analog PCM tiles for MAC operations with advanced CMOS circuitry for auxiliary digital functions. While material/device engineering continues to be explored to improve the analog properties of PCM devices, design and operation innovations can also help to improve the performance of PCM-based DNN weights, e.g., multiple-device-per-weight design and closed-loop tuning. In addition, circuit innovations are essential for analog accelerator performance. Fig. 1 shows a 14 nm PCM-based DNN inference accelerator, which incorporates design techniques such as 4-PCM weight units, a 2D mesh for tile-to-tile communication, and pulse-duration-based coding. On top of technology and design innovations, some DNN models can also be modified to be more resilient against hardware imperfection and noise. PCM-based analog accelerators have achieved iso-accuracy on large DNN models with millions of weights.
This talk will discuss the progress that we have achieved on PCM-based analog DNN inference accelerators, the challenges of PCM materials and devices, and promising solutions in technology and design.
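The Ohm’s-law/Kirchhoff’s-law MAC described above can be sketched numerically (an idealized model with an assumed Gaussian variability term, not the 14 nm hardware):

```python
import numpy as np

def crossbar_mac(G, v, sigma=0.0, rng=None):
    """Analog multiply-accumulate on a memory crossbar.
    G: conductance matrix (weights); v: input voltage vector.
    Ohm's law gives per-device currents G[i, j] * v[j]; Kirchhoff's
    current law sums them along each output wire. Optional Gaussian
    conductance noise crudely models device variability."""
    if sigma > 0.0:
        rng = rng or np.random.default_rng()
        G = G + rng.normal(0.0, sigma, G.shape)
    return G @ v  # wire currents = accumulated products, in one step

G = np.array([[1.0, 0.5], [0.25, 2.0]])   # conductances (weights)
v = np.array([0.2, 0.4])                  # input voltages (activations)
ideal = crossbar_mac(G, v)
assert np.allclose(ideal, [0.4, 0.85])
```

The whole matrix-vector product completes in one physical "step" regardless of array size, which is the constant-time parallelism the abstract refers to; rerunning with `sigma > 0` shows how device variability perturbs the result.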
APA, Harvard, Vancouver, ISO, and other styles
40

Heirich, A. "A Scalable Diffusion Algorithm for Dynamic Mapping and Load Balancing on Networks of Arbitrary Topology." International Journal of Foundations of Computer Science 08, no. 03 (1997): 329–46. http://dx.doi.org/10.1142/s0129054197000215.

Full text
Abstract:
The problems of mapping and load balancing applications on arbitrary networks are considered. A novel diffusion algorithm is presented to solve the mapping problem. It complements the well-known diffusion algorithms for load balancing which have enjoyed success on massively parallel computers (MPPs). Mapping is more difficult on interconnection networks than on MPPs because of the variations which occur in network topology. Popular mapping algorithms for MPPs which depend on recursive topologies are not applicable to irregular networks. The most celebrated of these MPP algorithms use information from the Laplacian matrix of a graph of communicating processes. The diffusion algorithm presented in this paper is also derived from this Laplacian matrix. The diffusion algorithm works on arbitrary network topologies and is dramatically faster than the celebrated MPP algorithms. It is delay- and fault-tolerant. Time to convergence depends on initial conditions and is insensitive to problem scale. This excellent scalability, among other features, makes the diffusion algorithm a viable candidate for dynamically mapping and load balancing not only existing MPP systems but also large distributed systems like the Internet, small cluster computers, and networks of workstations.
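Laplacian-based diffusion load balancing of the kind discussed can be sketched as a first-order iteration (a generic textbook scheme with an assumed step size, not the paper's specific mapping algorithm):

```python
import numpy as np

def diffusion_balance(load, adjacency, alpha=0.25, steps=200):
    """First-order diffusion load balancing on an arbitrary network.
    Each node repeatedly exchanges a fraction alpha of its load
    difference with every neighbor: x <- x - alpha * L @ x, where L is
    the graph Laplacian built from the adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    x = np.asarray(load, dtype=float)
    for _ in range(steps):
        x = x - alpha * (L @ x)
    return x

# A 4-node ring with an unbalanced initial load.
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
balanced = diffusion_balance([8.0, 0.0, 0.0, 0.0], ring)
assert np.allclose(balanced, 2.0, atol=1e-6)   # converges to the mean
```

Each update uses only neighbor information, so the scheme runs decentralized on any topology; convergence speed depends on the Laplacian's spectrum, not directly on the number of nodes.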
APA, Harvard, Vancouver, ISO, and other styles
41

NARAYANAN, P. J., and LARRY S. DAVIS. "REPLICATED IMAGE ALGORITHMS AND THEIR ANALYSES ON SIMD MACHINES." International Journal of Pattern Recognition and Artificial Intelligence 06, no. 02n03 (1992): 335–52. http://dx.doi.org/10.1142/s0218001492000217.

Full text
Abstract:
Data parallel processing on processor array architectures has gained popularity in data intensive applications, such as image processing and scientific computing, as massively parallel processor array machines became commercially feasible. The data parallel paradigm of assigning one processing element to each data element results in an inefficient utilization of a large processor array when a relatively small data structure is processed on it. The large degree of parallelism of a massively parallel processor array machine does not result in a faster solution to a problem involving relatively small data structures than the modest degree of parallelism of a machine that is just as large as the data structure. We present a data replication technique to speed up the processing of small data structures on large processor arrays. In this paper, we present replicated data algorithms for digital image convolutions and median filtering, and compare their performance with conventional data parallel algorithms on three popular array interconnection networks, namely, the 2-D mesh, the 3-D mesh, and the hypercube.
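The one-processor-per-pixel paradigm can be mimicked with whole-array operations (a NumPy sketch on a toroidal image, not the paper's replicated-data scheme): every pixel gathers its 3x3 neighborhood at once.

```python
import numpy as np

def median3x3(img):
    """Data-parallel 3x3 median filter on a toroidal image: every pixel
    collects its neighborhood simultaneously, mirroring the SIMD
    assignment of one processing element per pixel."""
    stack = np.stack([np.roll(np.roll(img, dy, 0), dx, 1)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
    return np.median(stack, axis=0)

img = np.zeros((8, 8))
img[4, 4] = 100.0                 # isolated impulse noise
out = median3x3(img)
assert np.all(out == 0.0)         # the median suppresses the impulse
```

When the image is much smaller than the processor array, this per-pixel mapping leaves most processors idle; replication runs several shifted copies of such a small image side by side to recover utilization.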
APA, Harvard, Vancouver, ISO, and other styles
42

Nakabo, Yoshihiro, Idaku Ishii, and Masatoshi Ishikawa. "1ms Target Tracking System Using Massively Parallel Processing Vision." Journal of the Robotics Society of Japan 15, no. 3 (1997): 417–21. http://dx.doi.org/10.7210/jrsj.15.417.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Guan, Huiwei, and To-Yat Cheung. "Efficient approaches for constructing a massively parallel processing system." Journal of Systems Architecture 46, no. 13 (2000): 1185–90. http://dx.doi.org/10.1016/s1383-7621(00)00019-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Vinkler, Marek, Jiří Bittner, Vlastimil Havran, and Michal Hapala. "Massively Parallel Hierarchical Scene Processing with Applications in Rendering." Computer Graphics Forum 32, no. 8 (2013): 13–25. http://dx.doi.org/10.1111/cgf.12140.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Östermark, Ralf. "Massively parallel processing of recursive multi-period portfolio models." European Journal of Operational Research 259, no. 1 (2017): 344–66. http://dx.doi.org/10.1016/j.ejor.2016.10.009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Gokhale, M., B. Holmes, and K. Iobst. "Processing in memory: the Terasys massively parallel PIM array." Computer 28, no. 4 (1995): 23–31. http://dx.doi.org/10.1109/2.375174.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Daly, Kevin B., Jay B. Benziger, Pablo G. Debenedetti, and Athanassios Z. Panagiotopoulos. "Massively parallel chemical potential calculation on graphics processing units." Computer Physics Communications 183, no. 10 (2012): 2054–62. http://dx.doi.org/10.1016/j.cpc.2012.05.006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

NAGAI, Moeto. "Massively Parallel Single-Cell Processing Technology for Cell Therapy." Proceedings of Mechanical Engineering Congress, Japan 2022 (2022): F221–01. http://dx.doi.org/10.1299/jsmemecj.2022.f221-01.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

THIELE, LOTHAR, and ULRICH ARZT. "ON THE SYNTHESIS OF MASSIVELY PARALLEL ARCHITECTURES." International Journal of High Speed Electronics and Systems 04, no. 02 (1993): 99–131. http://dx.doi.org/10.1142/s0129156493000078.

Full text
Abstract:
We describe synthesis methods for massively parallel architectures. In particular, a methodology is proposed which leads to a mechanical and provably correct design of systems consisting of memory banks, switches, and processor arrays. The design trajectory is based on the concept of piecewise-linear/regular algorithms and processor arrays. Complex design transformations such as partitioning, clustering, control generation, and flattening/creation of hierarchical levels are embedded in a homogeneous design flow. The final specification of the implementation not only contains the processing elements but also control flow, control processors, and interfaces to memory banks and switches.
APA, Harvard, Vancouver, ISO, and other styles
50

Saveetha, V., and S. Sophia. "Optimal Tabu K-Means Clustering Using Massively Parallel Architecture." Journal of Circuits, Systems and Computers 27, no. 13 (2018): 1850199. http://dx.doi.org/10.1142/s0218126618501992.

Full text
Abstract:
Parallel discovery of inherent clusters using massively threaded architectures is the solution for handling the computational challenges raised by very large datasets in cluster analysis. The Graphics Processing Unit and Compute Unified Device Architecture form a convincing platform on which to parallelize clustering algorithms. The parallel K-means algorithm aims at increasing the speedup, but often faces the hitch of falling into local minima. The heuristic search procedure to discover the global optima in the solution space is known as Tabu Search. The K-means clustering solution is fine-tuned by applying a parallel implementation of Tabu Search K-means clustering in order to increase efficacy. The aim is to combine the optimization characteristic of Tabu Search for the calculation of centroids with the clustering behavior of K-means, and to enhance the solution using the processing power of the GPU. The parallelization strategy used exhibits an increase in speedup. The parallel Tabu-KM algorithm is tested on standard datasets, and its performance is compared with sequential K-means, parallel K-means, and sequential Tabu K-means algorithms. The experimental results confirm yet another parallelization technique to unravel data clustering problems.
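The hybrid can be sketched on the CPU (a simplified serial illustration with assumed tabu-list and perturbation parameters, not the paper's GPU implementation): standard K-means updates run as usual, but centroid configurations seen recently are declared tabu and perturbed to escape local minima.

```python
import numpy as np

def tabu_kmeans(X, k, iters=30, tabu_len=5, seed=0):
    """K-means refined with a simple tabu search: centroid sets that
    were visited recently (rounded and hashed) are forbidden for
    tabu_len iterations and replaced by a random perturbation."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    tabu = []

    def assign(c):
        return np.argmin(((X[:, None, :] - c[None]) ** 2).sum(-1), axis=1)

    def cost(c, labels):
        return ((X - c[labels]) ** 2).sum()

    best_c, best_cost = centroids.copy(), np.inf
    for _ in range(iters):
        labels = assign(centroids)
        for j in range(k):                      # standard K-means update
            pts = X[labels == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
        key = hash(np.round(centroids, 3).tobytes())
        if key in tabu:                         # tabu move: perturb instead
            centroids = centroids + rng.normal(0.0, 0.1, centroids.shape)
            continue
        tabu.append(key)
        tabu = tabu[-tabu_len:]
        c = cost(centroids, assign(centroids))
        if c < best_cost:
            best_c, best_cost = centroids.copy(), c
    return best_c, best_cost

rng1 = np.random.default_rng(1)
X = np.vstack([rng1.normal(0.0, 0.1, (50, 2)),
               rng1.normal(5.0, 0.1, (50, 2))])
_, final_cost = tabu_kmeans(X, 2)
assert final_cost < 10.0    # two tight clusters => small within-cluster cost
```

On a GPU, the distance computation inside `assign` is the natural target for massive threading, with one thread per point-centroid pair.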
APA, Harvard, Vancouver, ISO, and other styles
