
Journal articles on the topic 'Parallel programming techniques'


Consult the top 50 journal articles for your research on the topic 'Parallel programming techniques.'


1. Peláez, Ignacio, Francisco Almeida, and Daniel González. "High Level Parallel Skeletons for Dynamic Programming." Parallel Processing Letters 18, no. 1 (March 2008): 133–47. http://dx.doi.org/10.1142/s0129626408003272.

Abstract:
Dynamic Programming is an important problem-solving technique used for solving a wide variety of optimization problems. Dynamic Programming programs are commonly designed as individual applications and software tools are usually tailored to specific classes of recurrences and methodologies. That contrasts with some other algorithmic techniques where a single generic program may solve all the instances. We have developed a general skeleton tool providing support for a wide range of dynamic programming methodologies on different parallel architectures. Genericity, flexibility and efficiency are basic issues of the design strategy. Parallelism is supplied to the user in a transparent manner through a common sequential interface. A set of test problems representative of different classes of Dynamic Programming formulations has been used to validate our skeleton on an IBM-SP.
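The skeleton idea described above, a single generic driver serving many dynamic programming recurrences through one common interface, can be sketched in a few lines of Python. This is only an illustrative sequential sketch (the paper's tool targets parallel machines such as the IBM-SP), and the names `make_dp_solver` and `knapsack_recurrence` are invented for the example:

```python
from functools import lru_cache

def make_dp_solver(recurrence):
    """Wrap a user-supplied recurrence into a memoized solver.

    The recurrence receives the current state plus a callback for
    recursive lookups, so one generic driver serves many DP formulations.
    """
    @lru_cache(maxsize=None)
    def solve(state):
        return recurrence(state, solve)
    return solve

# Example client: 0/1 knapsack expressed against the generic interface.
def knapsack_recurrence(items):
    # items: tuple of (weight, value) pairs; state: (index, capacity)
    def rec(state, solve):
        i, cap = state
        if i == len(items):
            return 0
        w, v = items[i]
        best = solve((i + 1, cap))                         # skip item i
        if w <= cap:
            best = max(best, v + solve((i + 1, cap - w)))  # take item i
        return best
    return rec

items = ((2, 3), (3, 4), (4, 5), (5, 8))
solver = make_dp_solver(knapsack_recurrence(items))
best_value = solver((0, 5))  # best value with capacity 5
```

Swapping in a different recurrence (edit distance, matrix-chain ordering, and so on) reuses the same driver unchanged, which is the genericity the skeleton approach aims for.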
2. Mou, Xin Gang, Guo Hua Wei, and Xiao Zhou. "Parallel Programming and Optimization Based on TMS320C6678." Applied Mechanics and Materials 615 (August 2014): 259–64. http://dx.doi.org/10.4028/www.scientific.net/amm.615.259.

Abstract:
The development of multi-core processors has provided a good solution for applications that require real-time processing and large numbers of calculations. However, simply exposing parallelism in software does not by itself make full use of the hardware's performance. This paper studies parallel programming and optimization techniques on the TMS320C6678 multicore digital signal processor. We first illustrate an implementation of a parallel image convolution algorithm using OpenMP. Several optimization techniques, such as compiler intrinsics, cache tuning, and DMA, are then applied to further enhance application performance; the test results show good execution times.
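The C6678-specific OpenMP and DMA details are beyond a short sketch, but the decomposition the abstract relies on, namely that each output row of a convolution is independent so the row loop can be parallelized, can be illustrated in plain Python. The thread pool here stands in for an OpenMP parallel-for, and all function names are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def convolve_row(image, kernel, r):
    """3x3 convolution of one output row, with zero padding at borders."""
    h, w = len(image), len(image[0])
    out = [0] * w
    for c in range(w):
        acc = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    acc += image[rr][cc] * kernel[dr + 1][dc + 1]
        out[c] = acc
    return out

def parallel_convolve(image, kernel, workers=4):
    """Rows are independent, so dispatch them across a pool of workers,
    mirroring an OpenMP parallel-for over the row loop."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: convolve_row(image, kernel, r),
                             range(len(image))))

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
result = parallel_convolve([[1, 2], [3, 4]], identity)
```

On a DSP the same row-wise split would be refined with the intrinsics, cache, and DMA optimizations the paper discusses.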
3. Graham, John R. "Integrating parallel programming techniques into traditional computer science curricula." ACM SIGCSE Bulletin 39, no. 4 (December 2007): 75–78. http://dx.doi.org/10.1145/1345375.1345419.

4. Ibarra, David, and Josep Arnal. "Parallel Programming Techniques Applied to Water Pump Scheduling Problems." Journal of Water Resources Planning and Management 140, no. 7 (July 2014): 06014002. http://dx.doi.org/10.1061/(asce)wr.1943-5452.0000439.

5. Alghamdi, Ahmed Mohammed, Fathy Elbouraey Eassa, Maher Ali Khamakhem, Abdullah Saad AL-Malaise AL-Ghamdi, Ahmed S. Alfakeeh, Abdullah S. Alshahrani, and Ala A. Alarood. "Parallel Hybrid Testing Techniques for the Dual-Programming Models-Based Programs." Symmetry 12, no. 9 (September 20, 2020): 1555. http://dx.doi.org/10.3390/sym12091555.

Abstract:
The importance of high-performance computing is increasing, and Exascale systems will be feasible in a few years. These systems can be achieved by enhancing the hardware's capability as well as the parallelism in the application by integrating more than one programming model. One dual-programming-model combination is Message Passing Interface (MPI) + OpenACC, which has several features, including increased system parallelism, support for different platforms with more performance, better productivity, and less programming effort. Several testing tools target parallel applications built using programming models, but more effort is needed, especially for high-level Graphics Processing Unit (GPU)-related programming models. Owing to the integration of different programming models, errors will be more frequent and unpredictable. Testing techniques are required to detect these errors, especially runtime errors resulting from the integration of MPI and OpenACC; studying their behavior is also important, especially for some OpenACC runtime errors that cannot be detected by any compiler. In this paper, we enhance the capabilities of ACC_TEST to test programs built using the dual-programming model MPI + OpenACC and detect their related errors. Our tool integrates both static and dynamic testing techniques, which allowed us to benefit from the advantages of both, reducing overheads, improving execution time, and covering a wide range of errors. Finally, ACC_TEST is a parallel testing tool that creates testing threads based on the number of application threads for detecting runtime errors.
6. García-Blas, Javier, and Christopher Brown. "High-level programming for heterogeneous and hierarchical parallel systems." International Journal of High Performance Computing Applications 32, no. 6 (November 2018): 804–6. http://dx.doi.org/10.1177/1094342018807840.

Abstract:
High-Level Heterogeneous and Hierarchical Parallel Systems (HLPGPU) aims to bring together researchers and practitioners to present new results and ongoing work on those aspects of high-level programming relevant, or specific, to general-purpose computing on graphics processing units (GPGPUs) and new architectures. The 2016 HLPGPU symposium was an event co-located with the HiPEAC conference in Prague, Czech Republic. HLPGPU is targeted at high-level parallel techniques, including programming models, libraries and languages, algorithmic skeletons, refactoring tools and techniques for parallel patterns, tools and systems to aid parallel programming, heterogeneous computing, timing analysis and statistical performance models.
7. Perri, Simona, Francesco Ricca, and Marco Sirianni. "Parallel instantiation of ASP programs: techniques and experiments." Theory and Practice of Logic Programming 13, no. 2 (January 25, 2012): 253–78. http://dx.doi.org/10.1017/s1471068411000652.

Abstract:
Answer-Set Programming (ASP) is a powerful logic-based programming language, which is enjoying increasing interest within the scientific community and (very recently) in industry. The evaluation of Answer-Set Programs is traditionally carried out in two steps. In the first step, an input program Π undergoes the so-called instantiation (or grounding) process, which produces a program Π′ semantically equivalent to Π but not containing any variable; in the second step, Π′ is evaluated using a backtracking search algorithm. It is well known that instantiation is important for the efficiency of the whole evaluation, might become a bottleneck in common situations, is crucial in several real-world applications, and is particularly relevant when huge input data have to be dealt with. At the time of this writing, the available instantiator modules are not able to exploit satisfactorily the latest hardware, featuring multi-core/multi-processor Symmetric MultiProcessing technologies. This paper presents some parallel instantiation techniques, including load-balancing and granularity-control heuristics, which allow for the effective exploitation of the processing power offered by modern Symmetric MultiProcessing machines. This is confirmed by an extensive experimental analysis reported herein.
8. Sathya, S., R. Hema, and M. Amala. "Parallel Techniques for Linear Programming Problems Using Multiprogramming and RSM." International Journal of Engineering Trends and Technology 13, no. 5 (July 25, 2014): 200–203. http://dx.doi.org/10.14445/22315381/ijett-v13p242.

9. Skinner, Gregg, and Rudolf Eigenmann. "Parallel Performance of a Combustion Chemistry Simulation." Scientific Programming 4, no. 3 (1995): 127–39. http://dx.doi.org/10.1155/1995/342723.

Abstract:
We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
10. El-Neweihi, Emad, Frank Proschan, and Jayaram Sethuraman. "Optimal allocation of components in parallel–series and series–parallel systems." Journal of Applied Probability 23, no. 3 (September 1986): 770–77. http://dx.doi.org/10.2307/3214014.

Abstract:
This paper shows how majorization and Schur-convex functions can be used to solve the problem of optimal allocation of components to parallel-series and series-parallel systems to maximize the reliability of the system. For parallel-series systems the optimal allocation is completely described and depends only on the ordering of component reliabilities. For series-parallel systems, we describe a partial ordering among allocations that can lead to the optimal allocation. Finally, we describe how these problems can be cast as integer linear programming problems and thus the results obtained in this paper show that when some linear integer programming problems are recast in a different way and the techniques of Schur functions are used, complete solutions can be obtained in some instances and better insight in others.
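The integer-programming view mentioned in the abstract can be made concrete with a tiny brute-force search over allocations of components to the series branches of a parallel-series system. This is an illustrative sketch with invented names; the point of the paper is precisely that majorization and Schur-convexity arguments avoid such enumeration:

```python
from itertools import permutations
from math import prod

def parallel_series_reliability(branches):
    """branches: tuple of tuples; each inner tuple holds the component
    reliabilities of one series branch, and branches are wired in parallel.
    The system works if at least one branch has all components working."""
    return 1 - prod(1 - prod(branch) for branch in branches)

def best_allocation(components, shape):
    """Brute-force the underlying integer program: try every assignment
    of components to series branches whose sizes are given by `shape`."""
    best, best_r = None, -1.0
    for perm in permutations(components):
        branches, i = [], 0
        for size in shape:
            branches.append(perm[i:i + size])
            i += size
        r = parallel_series_reliability(tuple(branches))
        if r > best_r:
            best, best_r = tuple(branches), r
    return best, best_r

best, best_r = best_allocation((0.9, 0.8, 0.7, 0.6), (2, 2))
```

For these four reliabilities the search groups the two most reliable components into one branch, and the winning grouping is determined by the ordering of the reliabilities alone, consistent with the paper's characterization for parallel-series systems.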
11. El-Neweihi, Emad, Frank Proschan, and Jayaram Sethuraman. "Optimal allocation of components in parallel–series and series–parallel systems." Journal of Applied Probability 23, no. 3 (September 1986): 770–77. http://dx.doi.org/10.1017/s002190020011191x.

12. Danelutto, Marco. "Efficient Support for Skeletons on Workstation Clusters." Parallel Processing Letters 11, no. 1 (March 2001): 41–56. http://dx.doi.org/10.1142/s0129626401000415.

Abstract:
Beowulf class clusters are gaining more and more interest as low cost parallel architectures. They deliver reasonable performance at a very reasonable cost, compared to classical MPP machines. Parallel applications are usually developed on clusters using MPI/PVM message passing or HPF programming environments. Here we discuss new implementation strategies to support structured parallel programming environments for clusters based on skeletons. The adoption of structured parallel programming models greatly reduces the time spent in developing new parallel applications on clusters. The adoption of our implementation techniques based on macro data flow allows very efficient parallel applications to be developed on clusters. We discuss experiments that demonstrate the full feasibility of the approach.
13. Sheeran, Mary. "Functional and dynamic programming in the design of parallel prefix networks." Journal of Functional Programming 21, no. 1 (December 6, 2010): 59–114. http://dx.doi.org/10.1017/s0956796810000304.

Abstract:
A parallel prefix network of width n takes n inputs, a1, a2, …, an, and computes each yi = a1 ∘ a2 ∘ ⋯ ∘ ai for 1 ≤ i ≤ n, for an associative operator ∘. This is one of the fundamental problems in computer science, because it gives insight into how parallel computation can be used to solve an apparently sequential problem. As parallel programming becomes the dominant programming paradigm, parallel prefix or scan is proving to be a very important building block of parallel algorithms and applications. There are many different parallel prefix networks, with different properties such as number of operators, depth and allowed fanout from the operators. In this paper, ideas from functional programming are combined with search to enable a deep exploration of parallel prefix network design. Networks that improve on the best known previous results are generated. It is argued that precise modelling in a functional programming language, together with simple visualization of the networks, gives a new, more experimental, approach to parallel prefix network design, improving on the manual techniques typically employed in the literature. The programming idiom that marries search with higher order functions may well have wider application than the network generation described here.
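The scan defined at the start of the abstract can be written down directly, and the divide-and-conquer structure of one classical network (here Sklansky's minimum-depth construction, not the searched networks of the paper) fits naturally in a functional style. A sequential Python sketch, with invented names:

```python
import operator

def sklansky_prefix(xs, op):
    """Divide-and-conquer parallel prefix (Sklansky's construction):
    scan each half recursively, then combine the last element of the
    left half with every element of the right half. The two half-scans
    are independent, giving logarithmic depth on a parallel machine."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    mid = n // 2
    left = sklansky_prefix(xs[:mid], op)
    right = sklansky_prefix(xs[mid:], op)
    carry = left[-1]
    return left + [op(carry, y) for y in right]

scan = sklansky_prefix([1, 2, 3, 4, 5], operator.add)
```

Other networks (Brent-Kung, Ladner-Fischer) trade this minimum depth for fewer operators or bounded fanout, which is exactly the design space the paper explores by search.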
14. Cai, Xing, Hans Petter Langtangen, and Halvard Moe. "On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations." Scientific Programming 13, no. 1 (2005): 31–56. http://dx.doi.org/10.1155/2005/619804.

Abstract:
This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance becomes achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.
15. Ladias, Anastasios, Theodoros Karvounidis, and Dimitrios Ladias. "Classification of the programming styles in scratch using the SOLO taxonomy." Advances in Mobile Learning Educational Research 1, no. 2 (2021): 114–23. http://dx.doi.org/10.25082/amler.2021.02.006.

Abstract:
The present study attempts to categorize the programming styles of sequential, parallel, and event-driven programming, using as a criterion the level of adoption of structured-programming design techniques. These techniques are modularity, hierarchical design, shared code, and parametrization. Applying these techniques to the Scratch programming environment results in a two-dimensional table of representative code. In this table, one dimension is the types of the aforementioned programming styles and the other is the characteristics of structured programming. Each dimension has been calibrated with the help of the levels of the SOLO taxonomy. This table can support the development of criteria for evaluating the quality of code produced by students in a broader grading system.
16. Jin, Xiao, Xing Jin Zhang, Zhi Yun Zheng, Quan Min Li, and Li Ping Lu. "Research on Parallel Computation of Semantic Similarity in Linked Data." Advanced Materials Research 1049-1050 (October 2014): 1320–26. http://dx.doi.org/10.4028/www.scientific.net/amr.1049-1050.1320.

Abstract:
This paper proposes a novel parallel computing method for semantic similarity in linked data to address problems such as low efficiency and data dispersion. It combines existing similarity calculation methods with the MapReduce parallel computation framework to design an appropriate parallel similarity computation. First, three typical similarity computing methods and the parallel programming models are introduced. Then, using the MapReduce programming techniques of cloud computing, the parallel computation of similarity in linked data is developed. The experimental results show that, compared with traditional platforms, the parallel computing method on a Hadoop cluster not only improves the capacity and efficiency of processing massive data, but also achieves better speed-up and scalability.
17. Guerrero, T. M., S. R. Cherry, M. Dahlbom, A. R. Ricci, and E. J. Hoffman. "Fast implementations of 3D PET reconstruction using vector and parallel programming techniques." IEEE Transactions on Nuclear Science 40, no. 4 (August 1993): 1082–86. http://dx.doi.org/10.1109/23.256716.

18. Qawasmeh, Ahmad, Salah Taamneh, Ashraf H. Aljammal, Nabhan Hamadneh, Mustafa Banikhalaf, and Mohammad Kharabsheh. "Parallelism exploration in sequential algorithms via animation tool." Multiagent and Grid Systems 17, no. 2 (August 23, 2021): 145–58. http://dx.doi.org/10.3233/mgs-210347.

Abstract:
Different high-performance techniques, such as profiling, tracing, and instrumentation, have been used to tune and enhance the performance of parallel applications. However, these techniques do not show how to explore the potential for parallelism in a given application. Animating and visualizing the execution process of a sequential algorithm provide a thorough understanding of its usage and functionality. In this work, an interactive web-based educational animation tool was developed to assist users in analyzing sequential algorithms to detect parallel regions, regardless of the parallel programming model used. The tool simplifies the learning of algorithms and helps students to analyze programs efficiently. Our statistical t-test study on a sample of students showed a significant improvement in their perception of the mechanism and parallelism of applications and an increase in their willingness to learn algorithms and parallel programming.
19. del Rio Astorga, David, Manuel F. Dolz, Luis Miguel Sánchez, J. Daniel García, Marco Danelutto, and Massimo Torquati. "Finding parallel patterns through static analysis in C++ applications." International Journal of High Performance Computing Applications 32, no. 6 (March 9, 2017): 779–88. http://dx.doi.org/10.1177/1094342017695639.

Abstract:
Since the ‘free lunch’ of processor performance is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given that sequential programming is still deeply rooted in current software development. To address this problem, new methodologies and techniques for parallel programming have been progressively developed. For instance, parallel frameworks, offering programming patterns, allow expressing concurrency in applications to better exploit parallel hardware. Nevertheless, a large portion of production software, from a broad range of scientific and industrial areas, is still developed sequentially. Considering that these software modules contain thousands, or even millions, of lines of code, an extremely large amount of effort is needed to identify parallel regions. To pave the way in this area, this paper presents Parallel Pattern Analyzer Tool, a software component that aids the discovery and annotation of parallel patterns in source codes. This tool simplifies the transformation of sequential source code to parallel. Specifically, we provide support for identifying Map, Farm, and Pipeline parallel patterns and evaluate the quality of the detection for a set of different C++ applications.
20. Pal, Amrit, and Manish Kumar. "Frequent Itemset Mining in Large Datasets: A Survey." International Journal of Information Retrieval Research 7, no. 4 (October 2017): 37–49. http://dx.doi.org/10.4018/ijirr.2017100103.

Abstract:
Frequent itemset mining is a well-known area in data mining. Most of the techniques available for frequent itemset mining require complete information about the data, from which association rules can be generated. The amount of data is increasing day by day, taking the form of Big Data, which requires changes to the algorithms so that they can work on such large-scale data. Parallel implementation of the mining techniques can provide a solution to this problem. In this paper, a survey of frequent itemset mining techniques that can be used in a parallel environment is presented. Programming models like MapReduce provide an efficient architecture for working with Big Data; the paper also provides information about the issues and feasibility of implementing such techniques in that environment.
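The survey's central premise, that support counting parallelizes naturally because each transaction can be mapped independently and the counts then reduced by key, can be simulated in a single process. The phase names and helper functions below are invented for illustration and elide the real distribution machinery of MapReduce:

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, k):
    """Mapper: emit (itemset, 1) for every k-item subset of one transaction.
    Each call is independent, so mappers can run on different workers."""
    return [(frozenset(c), 1) for c in combinations(sorted(transaction), k)]

def reduce_phase(pairs):
    """Reducer: sum the emitted counts by key (the shuffle groups keys)."""
    counts = defaultdict(int)
    for itemset, n in pairs:
        counts[itemset] += n
    return counts

def frequent_itemsets(transactions, k, min_support):
    emitted = []
    for t in transactions:           # conceptually parallel map tasks
        emitted.extend(map_phase(t, k))
    counts = reduce_phase(emitted)   # shuffle + reduce by key
    return {s for s, n in counts.items() if n >= min_support}
```

Real parallel miners also prune candidate itemsets between passes (as in Apriori), since emitting every subset, as this sketch does, is infeasible at Big Data scale.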
21. Hosseini-Rad, Mina, Majid Abdulrozzagh-Nezzad, and Seyyed-Mohammad Javadi-Moghaddam. "Study of Scheduling in Programming Languages of Multi-Core Processor." Data Science: Journal of Computing and Applied Informatics 2, no. 2 (July 1, 2019): 101–9. http://dx.doi.org/10.32734/jocai.v2.i2-282.

Abstract:
Over recent decades, the nature of multi-core processors has shifted the serial programming model toward a parallel one. There are several programming languages for parallel multi-core processors and processors with different architectures, and these languages confront programmers with challenges in achieving higher performance. In addition, the different scheduling methods in the programming languages for multi-core processors have a significant impact on the efficiency of those languages. Therefore, this article investigates the conventional scheduling techniques in the programming languages of multi-core processors, which allows researchers to choose the more suitable programming language by comparing efficiency across applications. Several languages, such as Cilk++, OpenMP, TBB, and PThread, were studied, and their scheduling efficiency was investigated by running Quick-Sort and Merge-Sort algorithms.
22. Hosseini-Rad, Mina, Majid Abdolrazzagh-Nezhad, and Seyyed-Mohammad Javadi-Moghaddam. "Study of Scheduling in Programming Languages of Multi-Core Processor." Data Science: Journal of Computing and Applied Informatics 2, no. 2 (August 3, 2018): 101–9. http://dx.doi.org/10.32734/jocai.v2.i2-327.

Abstract:
Over recent decades, the nature of multi-core processors has shifted the serial programming model toward a parallel one. There are several programming languages for parallel multi-core processors and processors with different architectures, and these languages confront programmers with challenges in achieving higher performance. In addition, the different scheduling methods in the programming languages for multi-core processors have a significant impact on the efficiency of those languages. Therefore, this article investigates the conventional scheduling techniques in the programming languages of multi-core processors, allowing researchers to choose the more suitable programming language by comparing efficiency across applications. Several languages, such as Cilk++, OpenMP, TBB, and PThread, were studied, and their scheduling efficiency was investigated by running Quick-Sort and Merge-Sort algorithms.
23. Martínez, Víctor, Fernando Berzal, and Juan-Carlos Cubero. "NOESIS: A Framework for Complex Network Data Analysis." Complexity 2019 (October 31, 2019): 1–14. http://dx.doi.org/10.1155/2019/1439415.

Abstract:
Network data mining has attracted a lot of attention since a large number of real-world problems have to deal with complex network data. In this paper, we present NOESIS, an open-source framework for network-based data mining. NOESIS features a large number of techniques and methods for the analysis of structural network properties, network visualization, community detection, link scoring, and link prediction. The proposed framework has been designed following solid design principles and exploits parallel computing using structured parallel programming. NOESIS also provides a stand-alone graphical user interface allowing the use of advanced software analysis techniques to users without prior programming experience. This framework is available under a BSD open-source software license.
24. Susungi, Adilla, and Claude Tadonki. "Intermediate Representations for Explicitly Parallel Programs." ACM Computing Surveys 54, no. 5 (June 2021): 1–24. http://dx.doi.org/10.1145/3452299.

Abstract:
While compilers generally support parallel programming languages and APIs, their internal program representations are mostly designed from the sequential programs standpoint (exceptions include source-to-source parallel compilers, for instance). This makes the integration of compilation techniques dedicated to parallel programs more challenging. In addition, parallelism has various levels and different targets, each of them with specific characteristics and constraints. With the advent of multi-core processors and general purpose accelerators, parallel computing is now a common and pervasive consideration. Thus, software support to parallel programming activities is essential to make this technical transition more realistic and beneficial. The case of compilers is fundamental as they deal with (parallel) programs at a structural level, thus the need for intermediate representations. This article surveys and discusses attempts to provide intermediate representations for the proper support of explicitly parallel programs. We highlight the gap between available contributions and their concrete implementation in compilers and then exhibit possible future research directions.
25. Park, Tae-Jung. "CUDA-based Object Oriented Programming Techniques for Efficient Parallel Visualization of 3D Content." Journal of Digital Contents Society 13, no. 2 (June 30, 2012): 169–76. http://dx.doi.org/10.9728/dcs.2012.13.2.169.

26. Hunt, John A. "Computer-aided parallel EELS techniques: acquisition, processing, & imaging." Proceedings, annual meeting, Electron Microscopy Society of America 47 (August 6, 1989): 398–99. http://dx.doi.org/10.1017/s0424820100153968.

Abstract:
The recent commercial introduction of the parallel-detection electron energy-loss spectrometer has undoubtedly made electron energy-loss spectroscopy (EELS) more viable as a technique for routine microanalysis. Additionally, the increased recording efficiency of parallel EELS (PEELS) warrants the use of more involved acquisition and processing techniques than was necessary, or even possible, with serial EELS. This increased complexity places greater demands on the computer systems controlling data acquisition. Multichannel analyzer systems with small resources and limited programming facilities are not capable of exploiting the full capabilities of the PEELS spectrometer. Preliminary efforts of the author with the Gatan PEELS spectrometer were concentrated on the development of a flexible acquisition system at the National Institutes of Health. Hardware control is performed through machine-language drivers called from high-level languages (HLLs) such as FORTRAN and C. The software drivers and hardware were designed to minimize processor involvement in the data collection process, resulting in the capability to collect data while processing continues within the parent HLL. This design simplifies the HLL program structure and minimizes data collection dead time.
27. Chickerur, Satyadhyan, Shobhit Dalal, and Supreeth Sharma. "Parallel Rendering Mechanism for Graphics Programming on Multicore Processors." International Journal of Grid and High Performance Computing 5, no. 1 (January 2013): 82–94. http://dx.doi.org/10.4018/jghpc.2013010106.

Abstract:
Present-day technological advancement has brought multiple-core processors to desktops, handhelds, servers, and workstations, because present-day applications and users demand huge computing power and interactivity. Both of these demands have resulted in a total design shift in the way processors are designed and developed. However, the change in hardware has not been accompanied by a change in the way software is written for these multicore processors. In this paper, we provide an integration of OpenGL programs on a platform that supports multicore processors. The paper gives a clear understanding of how graphics pipelines can be implemented on multi-core processors to achieve higher computational speed-up with the highest thread granularity. The impact of using too much parallelism is also discussed. An OpenMP API for the thread scheduling of parallel tasks is discussed in this paper. The Intel VTune Performance Analyzer tool is used to find hotspots and for software optimization. Comparing the serial and parallel execution of the graphics code shows encouraging results, and the increase in frame rate is found to result from the parallel programming techniques.
28. Pike, Rob, Sean Dorward, Robert Griesemer, and Sean Quinlan. "Interpreting the Data: Parallel Analysis with Sawzall." Scientific Programming 13, no. 4 (2005): 277–98. http://dx.doi.org/10.1155/2005/962135.

Abstract:
Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design – including the separation into two phases, the form of the programming language, and the properties of the aggregators – exploits the parallelism inherent in having data and computation distributed across many machines.
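The two-phase structure the abstract describes, a per-record filtering phase that emits values to named aggregators followed by a collation phase, can be mimicked in a few lines of Python. This single-machine sketch uses an invented record shape and table name and is not Sawzall's actual API:

```python
from collections import Counter

def filter_phase(records, emit):
    """Per-record logic, analogous to a Sawzall script: each record is
    examined in isolation and values are emitted to named aggregators."""
    for rec in records:
        if rec.get("status", 0) >= 400:        # keep only error responses
            emit("errors_by_status", rec["status"])

def run_pipeline(partitions):
    """Each partition could live on a different machine; emitted values
    are collated into aggregation tables afterwards."""
    tables = Counter()
    def emit(table, value):
        tables[(table, value)] += 1
    for part in partitions:                    # conceptually parallel
        filter_phase(part, emit)
    return tables

logs = [[{"status": 200}, {"status": 404}],
        [{"status": 500}, {"status": 404}]]
tables = run_pipeline(logs)
```

Because the filter sees one record at a time and the aggregators are commutative, the partition loop can be fanned out across machines, which is the parallelism the paper's design exploits.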
APA, Harvard, Vancouver, ISO, and other styles
29

Yin, Xiao Hui, Peng Dong Gao, Chu Qiu, and Yong Quan Lu. "Parallel Processing and Performance Optimization of Meteorological Satellite Mass-Data Program." Applied Mechanics and Materials 263-266 (December 2012): 192–97. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.192.

Full text
Abstract:
We implemented parallel processing for the aerosol retrieval program that handles MERSI data from a polar-orbiting meteorological satellite. The parallel implementation was based on a Linux cluster architecture and applied the master-slave parallel programming mode in an MPI parallel environment. Performance optimizations were made in load balancing, communication overhead, storage, and system I/O according to the specific environment. In addition, the parallel speed-up ratio and efficiency were analyzed to evaluate the experimental results. The results demonstrate that the parallel techniques and performance optimization methods proposed here can significantly improve the efficiency of satellite mass-data processing.
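The master-slave pattern the authors use over MPI can be approximated with a shared task queue. This Python sketch uses threads in place of MPI ranks, so it only illustrates the task-distribution logic, not the communication-overhead or I/O tuning the paper performs.

```python
import queue
import threading

def master_slave(tasks, worker_fn, n_slaves=3):
    """Master fills a queue; slaves pull tasks, compute, and report back."""
    todo = queue.Queue()
    done = queue.Queue()
    for t in tasks:
        todo.put(t)

    def slave():
        while True:
            try:
                t = todo.get_nowait()
            except queue.Empty:
                return  # no work left: the slave retires
            done.put((t, worker_fn(t)))

    threads = [threading.Thread(target=slave) for _ in range(n_slaves)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    results = {}
    while not done.empty():
        t, r = done.get()
        results[t] = r
    return results
```

Pulling from a shared queue gives dynamic load balancing for free: a slave that finishes a cheap task immediately takes the next one.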
APA, Harvard, Vancouver, ISO, and other styles
30

Huang, Xiaomeng, Xing Huang, Dong Wang, Qi Wu, Yi Li, Shixun Zhang, Yuwen Chen, et al. "OpenArray v1.0: a simple operator library for the decoupling of ocean modeling and parallel computing." Geoscientific Model Development 12, no. 11 (November 11, 2019): 4729–49. http://dx.doi.org/10.5194/gmd-12-4729-2019.

Full text
Abstract:
Abstract. Rapidly evolving computational techniques are creating a large gap between scientific aspiration and code implementation in climate modeling. In this work, we design a simple computing library to bridge the gap and decouple the work of ocean modeling from parallel computing. This library provides 12 basic operators that feature user-friendly interfaces, effective programming, and implicit parallelism. Several state-of-the-art computing techniques, including computing graph and just-in-time compiling, are employed to parallelize the seemingly serial code and speed up the ocean models. These operator interfaces are designed using native Fortran programming language to smooth the learning curve. We further implement a highly readable and efficient ocean model that contains only 1860 lines of code but achieves a 91 % parallel efficiency in strong scaling and 99 % parallel efficiency in weak scaling with 4096 Intel CPU cores. This ocean model also exhibits excellent scalability on the heterogeneous Sunway TaihuLight supercomputer. This work presents a promising alternative tool for the development of ocean models.
APA, Harvard, Vancouver, ISO, and other styles
31

Teijeiro, Diego, Xoán C. Pardo, Patricia González, Julio R. Banga, and Ramón Doallo. "Towards cloud-based parallel metaheuristics." International Journal of High Performance Computing Applications 32, no. 5 (November 28, 2016): 693–705. http://dx.doi.org/10.1177/1094342016679011.

Full text
Abstract:
Many key problems in science and engineering can be formulated and solved using global optimization techniques. In the particular case of computational biology, the development of dynamic (kinetic) models is one of the current key issues. In this context, the problem of parameter estimation (model calibration) remains as a very challenging task. The complexity of the underlying models requires the use of efficient solvers to achieve adequate results in reasonable computation times. Metaheuristics have been the focus of great consideration as an efficient way of solving hard global optimization problems. Even so, in most realistic applications, metaheuristics require a very large computation time to obtain an acceptable result. Therefore, several parallel schemes have been proposed, most of them focused on traditional parallel programming interfaces and infrastructures. However, with the emergence of cloud computing, new programming models have been proposed to deal with large-scale data processing on clouds. In this paper we explore the applicability of these new models for global optimization problems using as a case study a set of challenging parameter estimation problems in systems biology. We have developed, using Spark, an island-based parallel version of Differential Evolution. Differential Evolution is a simple population-based metaheuristic that, at the same time, is very popular for being very efficient in real function global optimization. Several experiments were conducted both on a cluster and on the Microsoft Azure public cloud to evaluate the speedup and efficiency of the proposal, concluding that the Spark implementation achieves not only competitive speedup against the serial implementation, but also good scalability when the number of nodes grows. The results can be useful for those interested in using parallel metaheuristics for global optimization problems benefiting from the potential of new cloud programming models.
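An island-based Differential Evolution like the one the authors build on Spark can be sketched without a cluster: each island evolves independently and periodically migrates its best individual to a neighbour in a ring. The DE parameters, ring topology, and sphere objective below are illustrative choices, not the paper's configuration.

```python
import random

def sphere(x):
    """Toy objective standing in for a parameter-estimation cost function."""
    return sum(v * v for v in x)

def de_step(pop, f=0.8, cr=0.9, objective=sphere):
    """One generation of classic DE/rand/1/bin on a single island."""
    dim = len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
        mutant = [a[d] + f * (b[d] - c[d]) for d in range(dim)]
        trial = [mutant[d] if random.random() < cr else target[d]
                 for d in range(dim)]
        new_pop.append(trial if objective(trial) <= objective(target) else target)
    return new_pop

def island_de(n_islands=4, pop_size=10, dim=3, generations=40, migrate_every=10):
    random.seed(1)  # fixed seed keeps the sketch deterministic
    islands = [[[random.uniform(-5, 5) for _ in range(dim)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for g in range(generations):
        islands = [de_step(pop) for pop in islands]  # islands evolve independently
        if (g + 1) % migrate_every == 0:
            # Ring migration: each island's best replaces its neighbour's worst.
            bests = [min(pop, key=sphere) for pop in islands]
            for i, pop in enumerate(islands):
                worst = max(range(len(pop)), key=lambda j: sphere(pop[j]))
                pop[worst] = bests[(i - 1) % n_islands]
    return min((min(pop, key=sphere) for pop in islands), key=sphere)
```

In the Spark version described by the paper, the per-island evolution step is what gets mapped over a distributed collection of island populations.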
APA, Harvard, Vancouver, ISO, and other styles
32

Oberhuber, Tomáš, Jakub Klinkovský, and Radek Fučík. "TNL: NUMERICAL LIBRARY FOR MODERN PARALLEL ARCHITECTURES." Acta Polytechnica 61, SI (February 10, 2021): 122–34. http://dx.doi.org/10.14311/ap.2021.61.0122.

Full text
Abstract:
We present Template Numerical Library (TNL, www.tnl-project.org) with native support of modern parallel architectures like multi–core CPUs and GPUs. The library offers an abstract layer for accessing these architectures via unified interface tailored for easy and fast development of high-performance algorithms and numerical solvers. The library is written in C++ and it benefits from template meta–programming techniques. In this paper, we present the most important data structures and algorithms in TNL together with scalability on multi–core CPUs and speed–up on GPUs supporting CUDA.
APA, Harvard, Vancouver, ISO, and other styles
33

Li, Zhi Yong, Zhen Liang Ye, and Chen Tao Liu. "Parallel Programming Methods Based on the Multi-Core DSP TMS320C6670." Applied Mechanics and Materials 198-199 (September 2012): 1487–92. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.1487.

Full text
Abstract:
As frame rates rise and image sizes grow, processing image sequences becomes harder. Good real-time performance can be ensured by a multi-core DSP in an embedded image processing system. The TMS320C6670, a multi-core DSP designed by TI, is selected as the study object. Based on an analysis of its hardware characteristics, the Data Flow model is adopted as the multi-core processing model. Two methods of assigning data processing subtasks are analyzed by comparing their advantages and disadvantages in terms of system idle time and memory requirements. The subtask assignment flow is designed for a serial sequence-image-processing example, and an inter-core data transfer flow is proposed. The usage methods and appropriate occasions for two data-buffer establishment techniques are studied and defined. An inter-core notification flow is also proposed, and the usage methods and appropriate occasions for three notification methods based on the interrupt controller and the Semaphore2 module are studied and defined.
APA, Harvard, Vancouver, ISO, and other styles
34

Östermark, Ralf. "A parallel algorithm for optimizing the capital structure contingent on maximum value at risk." Kybernetes 44, no. 3 (March 2, 2015): 384–405. http://dx.doi.org/10.1108/k-08-2014-0171.

Full text
Abstract:
Purpose – The purpose of this paper is to measure the financial risk and optimal capital structure of a corporation. Design/methodology/approach – Irregular disjunctive programming problems arising in firm models and risk management can be solved by the techniques presented in the paper. Findings – Parallel processing and mathematical modeling provide a fruitful basis for solving ultra-scale non-convex general disjunctive programming (GDP) problems, where the computational challenge in direct mixed-integer non-linear programming (MINLP) formulations or single processor algorithms would be insurmountable. Research limitations/implications – The test is limited to a single firm in an experimental setting. Repeating the test on large sample of firms in future research will indicate the general validity of Monte-Carlo-based VAR estimation. Practical implications – The authors show that the risk surface of the firm can be approximated by integrated use of accounting logic, corporate finance, mathematical programming, stochastic simulation and parallel processing. Originality/value – Parallel processing has potential to simplify large-scale MINLP and GDP problems with non-convex, multi-modal and discontinuous parameter generating functions and to solve them faster and more reliably than conventional approaches on single processors.
APA, Harvard, Vancouver, ISO, and other styles
35

Popa, Bogdan, Dan Selișteanu, and Alexandra Elisabeta Lorincz. "Possibilities of Use for Fractal Techniques as Parameters of Graphic Analysis." Fractal and Fractional 6, no. 11 (November 19, 2022): 686. http://dx.doi.org/10.3390/fractalfract6110686.

Full text
Abstract:
Image processing remains an area that has impact on the software industry and is a field that is permanently developing in both IT and industrial contexts. Nowadays, the demand for fast computing times is becoming increasingly difficult to fulfill in the case of massive computing systems. This article proposes a particular case of efficiency for a specifically developed model for fractal generation. From the point of view of graphic analysis, the application can generate a series of fractal images. This process is analyzed and compared in this study from a programming perspective, in terms of both the results at the processor level and the graphical generation possibilities. The paper presents the structure of the software and its implementation for generating fractal images using the Mandelbrot set. Starting from the complex mathematical set, the component iterations of the Mandelbrot algorithm lead to optimization variants for the calculation. The article presents an optimization variant based on applying parallel calculations for fractal generation. The method used in the study assumes a high degree of accuracy in the selected mathematical model for fractal generation and does not characterize a method specially built for a certain kind of image. A series of scenarios are analyzed, and details related to differences in calculation times, starting from the more efficient proposed variant, are presented. The developed software implementation is parallelization-based and is optimized for generating a wide variety of fractal images while also providing a test package for the generated environment. The influence of parallel programming is highlighted by contrast with sequential programming to, in turn, highlight recent methods of speeding up computing times. The purpose of the article is to combine the complexity of the mathematical calculation behind fractal sets with programming techniques in order to provide an analysis of the graphic results from the point of view of computing-resource use and working time.
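The parallelization applied to Mandelbrot generation exploits the fact that rows (and pixels) of the image are mutually independent. A minimal sketch, using Python threads rather than the authors' implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def mandelbrot_escape(c, max_iter=50):
    """Iteration count before |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

def mandelbrot_row(y, width, height):
    """Escape counts for one scanline of the [-2,1] x [-1.5,1.5] window."""
    return [mandelbrot_escape(complex(-2.0 + 3.0 * x / width,
                                      -1.5 + 3.0 * y / height))
            for x in range(width)]

def mandelbrot_image(width, height, workers=4):
    # Rows carry no shared state, so they can be computed in any order
    # and in parallel; the sequential variant just loops over y.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: mandelbrot_row(y, width, height),
                             range(height)))
```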
APA, Harvard, Vancouver, ISO, and other styles
36

Burkhart, Helmar, Robert Frank, and Guido Hächler. "Structured Parallel Programming: How Informatics Can Help Overcome the Software Dilemma." Scientific Programming 5, no. 1 (1996): 33–45. http://dx.doi.org/10.1155/1996/570310.

Full text
Abstract:
The state-of-the-art programming of parallel computers is far from being successful. The main challenge today is, therefore, the development of techniques and tools that improve programmers' productivity. Programmability, portability, and reusability are key issues to be solved. In this article we shall report about our ongoing efforts in this direction. After a short discussion of the software dilemma found today, we shall present the Basel approach. We shall summarize our algorithm description methodology and discuss the basic concepts of the proposed skeleton language. An algorithmic example and comments on implementation aspects will explain our work in more detail. We shall summarize the current state of the implementation and conclude with a discussion of related work.
APA, Harvard, Vancouver, ISO, and other styles
37

Özturan, Can, Balaram Sinharoy, and Boleslaw K. Szymanski. "Compiler Technology for Parallel Scientific Computation." Scientific Programming 3, no. 3 (1994): 201–25. http://dx.doi.org/10.1155/1994/243495.

Full text
Abstract:
There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL). Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.
APA, Harvard, Vancouver, ISO, and other styles
38

García-García, César, José Luis Fernández-Robles, Victor Larios-Rosillo, and Hervé Luga. "ALFIL." International Journal of Game-Based Learning 2, no. 3 (July 2012): 71–86. http://dx.doi.org/10.4018/ijgbl.2012070105.

Full text
Abstract:
This article presents the current development of a serious game for the simulation of massive evacuations. The purpose of this project is to promote self-protection through awareness of the procedures and the different possible scenarios during the evacuation of a massive event. Sophisticated behaviors require massive computational power, and it has been necessary to implement several distributed programming techniques to simulate crowds of thousands of people. Even with the current state of computer hardware, the costs of building and operating such hardware are still prohibitive, so it is preferable to apply distributed programming techniques running on specialized parallel computing hardware.
APA, Harvard, Vancouver, ISO, and other styles
39

GRELCK, CLEMENS, and SVEN-BODO SCHOLZ. "SAC — FROM HIGH-LEVEL PROGRAMMING WITH ARRAYS TO EFFICIENT PARALLEL EXECUTION." Parallel Processing Letters 13, no. 03 (September 2003): 401–12. http://dx.doi.org/10.1142/s0129626403001379.

Full text
Abstract:
SAC is a purely functional array processing language designed with numerical applications in mind. It supports generic, high-level program specifications in the style of APL. However, rather than providing a fixed set of built-in array operations, SAC provides means to specify such operations in the language itself in a way that still allows their application to arrays of any rank and size. This paper illustrates the major steps in compiling generic, rank- and shape-invariant SAC specifications into efficiently executable multithreaded code for parallel execution on shared memory multiprocessors. The effectiveness of the compilation techniques is demonstrated by means of a small case study on the PDE1 benchmark, which implements 3-dimensional red/black successive over-relaxation. Comparisons with HPF and ZPL show that despite the genericity of code, SAC achieves highly competitive runtime performance characteristics.
APA, Harvard, Vancouver, ISO, and other styles
40

Stojanovic, Natalija, and Dragan Stojanovic. "Parallelizing Multiple Flow Accumulation Algorithm using CUDA and OpenACC." ISPRS International Journal of Geo-Information 8, no. 9 (September 3, 2019): 386. http://dx.doi.org/10.3390/ijgi8090386.

Full text
Abstract:
Watershed analysis, as a fundamental component of digital terrain analysis, is based on the Digital Elevation Model (DEM), which is a grid (raster) model of the Earth surface and topography. Watershed analysis consists of computationally and data intensive computing algorithms that need to be implemented by leveraging parallel and high-performance computing methods and techniques. In this paper, the Multiple Flow Direction (MFD) algorithm for watershed analysis is implemented and evaluated on multi-core Central Processing Units (CPU) and many-core Graphics Processing Units (GPU), which provides significant improvements in performance and energy usage. The implementation is based on NVIDIA CUDA (Compute Unified Device Architecture) implementation for GPU, as well as on OpenACC (Open ACCelerators), a parallel programming model, and a standard for parallel computing. Both phases of the MFD algorithm (i) iterative DEM preprocessing and (ii) iterative MFD algorithm, are parallelized and run over multi-core CPU and GPU. The evaluation of the proposed solutions is performed with respect to the execution time, energy consumption, and programming effort for algorithm parallelization for different sizes of input data. An experimental evaluation has shown not only the advantage of using OpenACC programming over CUDA programming in implementing the watershed analysis on a GPU in terms of performance, energy consumption, and programming effort, but also significant benefits in implementing it on the multi-core CPU.
APA, Harvard, Vancouver, ISO, and other styles
41

HU, ZHENJIANG, and MASATO TAKEICHI. "CALCULATING AN OPTIMAL HOMOMORPHIC ALGORITHM FOR BRACKET MATCHING." Parallel Processing Letters 09, no. 03 (September 1999): 335–45. http://dx.doi.org/10.1142/s0129626499000311.

Full text
Abstract:
It is widely recognized that a key problem of parallel computation is in the development of both efficient and correct parallel software. Although many advanced language features and compilation techniques have been proposed to alleviate the complexity of parallel programming, much effort is still required to develop parallelism in a formal and systematic way. In this paper, we intend to clarify this point by demonstrating a formal derivation of a correct but efficient homomorphic parallel algorithm for a simple language recognition problem known as bracket matching. To the best of our knowledge, our formal derivation leads to a novel divide-and-conquer parallel algorithm for bracket matching.
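A concrete instance of the homomorphic algorithm the paper derives: each chunk of the input is mapped to a pair (unmatched closing brackets, unmatched opening brackets), and an associative combiner merges adjacent chunk summaries, so the chunk-level work can run in parallel and be reduced in any grouping. The sketch below is a reconstruction of the standard formulation, not the paper's exact derivation.

```python
from functools import reduce

def measure(chunk):
    """Map a chunk to (unmatched ')', unmatched '(') -- the homomorphism image."""
    closes = opens = 0
    for ch in chunk:
        if ch == '(':
            opens += 1
        elif ch == ')':
            if opens:
                opens -= 1
            else:
                closes += 1
    return closes, opens

def combine(left, right):
    """Associative combiner: cancel left's opens against right's closes."""
    lc, lo = left
    rc, ro = right
    matched = min(lo, rc)
    return lc + (rc - matched), (lo - matched) + ro

def balanced(s, chunk_size=4):
    # Chunks may be measured independently (hence in parallel) and
    # reduced with the associative combiner in any grouping.
    chunks = [s[i:i + chunk_size] for i in range(0, len(s), chunk_size)] or ['']
    closes, opens = reduce(combine, map(measure, chunks))
    return closes == 0 and opens == 0
```

Because `combine` is associative, the reduction can be organised as a balanced tree, which is what makes the divide-and-conquer parallelization correct.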
APA, Harvard, Vancouver, ISO, and other styles
42

MASSINGILL, BERNA L. "EXPERIMENTS WITH PROGRAM PARALLELIZATION USING ARCHETYPES AND STEPWISE REFINEMENT." Parallel Processing Letters 09, no. 04 (December 1999): 487–98. http://dx.doi.org/10.1142/s0129626499000451.

Full text
Abstract:
Parallel programming continues to be difficult and error-prone, whether starting from specifications or from an existing sequential program. This paper presents (1) a methodology for parallelizing sequential applications and (2) experiments in applying the methodology. The methodology is based on the use of stepwise refinement together with what we call parallel programming archetypes (briefly, abstractions that capture common features of classes of programs), in which most of the work of parallelization is done using familiar sequential tools and techniques, and those parts of the process that cannot be addressed with sequential tools and techniques are addressed with formally-justified transformations. The experiments consist of applying the methodology to sequential application programs, and they provide evidence that the methodology produces correct and reasonably efficient programs at reasonable human-effort cost. Of particular interest is the fact that the aspect of the methodology that is most completely formally justified is the aspect that in practice was the most trouble-free.
APA, Harvard, Vancouver, ISO, and other styles
43

Ginanjar, Arief, and Kusmaya Kusmaya. "Reliability Comparison of High Performance Computing between Single Thread Loop and Multiple Thread Loop using Java-Based Programming at Fingerprint Data Processing." JURNAL SISFOTEK GLOBAL 12, no. 1 (March 28, 2022): 11. http://dx.doi.org/10.38101/sisfotek.v12i1.449.

Full text
Abstract:
High-Performance Computing (HPC) is a family of programming mechanisms focused on increasing performance in any programming environment, especially in languages such as C/C++, Java, and Python. In several sub-fields of information technology, such as Big Data, Data Warehousing, Business Intelligence, and Artificial Intelligence, HPC is widely applied on sophisticated machines with enterprise-grade data processing capabilities. This research compares HPC capability with Java programming techniques for multi-threaded data processing when implemented on the ordinary computers used in everyday life: the Java programming technique that uses one thread is called a Single Thread Loop, and the one that uses multiple threads is called a Multiple Thread Loop. Because this is a comparison between sequential and parallel processing, the research compares the Single Thread Loop and the Multiple Thread Loop in Java. The minimum system requirements are MS Windows 7 or 10 or a Unix-based OS, an Intel i5 or i7 processor, and at least 16 GB of RAM.
APA, Harvard, Vancouver, ISO, and other styles
44

Garvie, Marcus, and John Burkardt. "A Parallelizable Integer Linear Programming Approach for Tiling Finite Regions of the Plane with Polyominoes." Algorithms 15, no. 5 (May 12, 2022): 164. http://dx.doi.org/10.3390/a15050164.

Full text
Abstract:
The general problem of tiling finite regions of the plane with polyominoes is NP-complete, and so the associated computational geometry problem rapidly becomes intractable for large instances. Thus, the need to reduce algorithm complexity for tiling is important and continues as a fruitful area of research. Traditional approaches to tiling with polyominoes use backtracking, which is a refinement of the ‘brute-force’ solution procedure for exhaustively finding all solutions to a combinatorial search problem. In this work, we combine checkerboard colouring techniques with a recently introduced integer linear programming (ILP) technique for tiling with polyominoes. The colouring arguments often split large tiling problems into smaller subproblems, each represented as a separate ILP problem. Problems that are amenable to this approach are embarrassingly parallel, and our work provides proof of concept of a parallelizable algorithm. The main goal is to analyze when this approach yields a potential parallel speedup. The novel colouring technique shows excellent promise in yielding a parallel speedup for finding large tiling solutions with ILP, particularly when we seek a single (optimal) solution. We also classify the tiling problems that result from applying our colouring technique according to different criteria and compute representative examples using a combination of MATLAB and CPLEX, a commercial optimization package that can solve ILP problems. The collections of MATLAB programs PARIOMINOES (v3.0.0) and POLYOMINOES (v2.1.4) used to construct the ILP problems are freely available for download.
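The checkerboard-colouring argument can be seen in miniature with dominoes (the 1x2 polyomino): each domino covers one black and one white cell, so unequal colour counts rule out a tiling before any ILP or backtracking search is attempted. This sketch shows only that necessary-condition pruning, not the paper's ILP formulation.

```python
def colour_counts(region):
    """Checkerboard-colour a region given as a set of (row, col) cells."""
    black = sum(1 for (r, c) in region if (r + c) % 2 == 0)
    return black, len(region) - black

def domino_tiling_possible(region):
    # Necessary condition only: every domino covers one black and one
    # white cell, so unequal colour counts immediately rule a tiling out.
    black, white = colour_counts(region)
    return black == white

full_board = {(r, c) for r in range(8) for c in range(8)}
# Classic mutilated chessboard: remove two opposite (same-colour) corners.
mutilated = full_board - {(0, 0), (7, 7)}
```

In the paper's setting, colouring arguments of this kind also split one large tiling instance into smaller independent ILP subproblems, which is what makes the approach embarrassingly parallel.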
APA, Harvard, Vancouver, ISO, and other styles
45

Chen, Liyan. "Research on Programming Model and Compilation Optimization Technology of Multi-Core GPU." Journal of Physics: Conference Series 2173, no. 1 (January 1, 2022): 012080. http://dx.doi.org/10.1088/1742-6596/2173/1/012080.

Full text
Abstract:
Abstract GPGPU (General Purpose Computing on Graphics Processing Units) has been widely applied to high performance computing. However, GPU architecture and programming models differ from those of traditional CPUs. Accordingly, it is rather challenging to develop efficient GPU applications. This paper focuses on the key techniques of programming models and compiler optimization for many-core GPUs, and addresses a number of key theoretical and technical issues. It proposes a many-threaded programming model, ab-Stream, which hides architecture differences and is easy to parallelize, program, extend, and tune. In addition, this paper proposes memory optimizations and data transfer transformations according to data classification: first, data layout pruning based on classified memory, and then TaT (Transfer after Transformed) for transferring strided data between CPU and GPU. Experimental results demonstrate that the proposed techniques significantly improve performance for GPGPU applications.
APA, Harvard, Vancouver, ISO, and other styles
46

Tarnawski, Jakub. "New graph algorithms via polyhedral techniques." it - Information Technology 63, no. 3 (April 15, 2021): 177–82. http://dx.doi.org/10.1515/itit-2021-0014.

Full text
Abstract:
Abstract This article gives a short overview of my dissertation, where new algorithms are given for two fundamental graph problems. We develop novel ways of using linear programming formulations, even exponential-sized ones, to extract structure from problem instances and to guide algorithms in making progress. The first part of the dissertation addresses a benchmark problem in combinatorial optimization: the asymmetric traveling salesman problem (ATSP). It consists in finding the shortest tour that visits all vertices of a given edge-weighted directed graph. A ρ-approximation algorithm for ATSP is one that runs in polynomial time and always produces a tour at most ρ times longer than the shortest tour. Finding such an algorithm with constant ρ had been a long-standing open problem. Here we give such an algorithm. The second part of the dissertation addresses the perfect matching problem. We have known since the 1980s that it has efficient parallel algorithms if the use of randomness is allowed. However, we do not know if randomness is necessary – that is, whether the matching problem is in the class NC. We show that it is in the class quasi-NC. That is, we give a deterministic parallel algorithm that runs in poly-logarithmic time on quasi-polynomially many processors.
APA, Harvard, Vancouver, ISO, and other styles
47

FRADET, PASCAL, and JULIEN MALLET. "Compilation of a specialized functional language for massively parallel computers." Journal of Functional Programming 10, no. 6 (November 2000): 561–605. http://dx.doi.org/10.1017/s0956796800003816.

Full text
Abstract:
We propose a parallel specialized language that ensures portable and cost-predictable implementations on parallel computers. The language is basically a first-order, recursion-less, strict functional language equipped with a collection of higher-order functions or skeletons. These skeletons apply on (nested) vectors and can be grouped into four classes: computation, reorganization, communication and mask skeletons. The compilation process is described as a series of transformations and analyses leading to SPMD-like functional programs which can be directly translated into real parallel code. The language restrictions enforce a programming discipline whose benefit is to allow a static, symbolic and accurate cost analysis. The parallel cost takes into account both load balancing and communications, and can be statically evaluated even when the actual size of vectors or the number of processors are unknown. It is used to automatically select the best data distribution among a set of standard distributions. Interestingly, this work can be seen as a cross-fertilization between techniques developed within the FORTRAN parallelization, skeleton and functional programming communities.
APA, Harvard, Vancouver, ISO, and other styles
48

Paek, Yunheung, and David A. Padua. "Compiling for Scalable Multiprocessors with Polaris." Parallel Processing Letters 07, no. 04 (December 1997): 425–36. http://dx.doi.org/10.1142/s0129626497000413.

Full text
Abstract:
Due to the complexity of programming scalable multiprocessors with physically distributed memories, it is onerous to manually generate parallel code for these machines. As a consequence, there has been much research on the development of compiler techniques to simplify programming, to increase reliability, and to reduce development costs. For code generation, a compiler applies a number of transformations in areas such as data privatization, data copying and replication, synchronization, and data and work distribution. In this paper, we discuss our recent work on the development and implementation of a few compiler techniques for some of these transformations. We use Polaris, a parallelizing Fortran restructurer developed at Illinois, as the infrastructure to implement our algorithms. The paper includes experimental results obtained by applying our techniques to several benchmark codes.
APA, Harvard, Vancouver, ISO, and other styles
49

Nigro, Libero. "Performance of Parallel K-Means Algorithms in Java." Algorithms 15, no. 4 (March 29, 2022): 117. http://dx.doi.org/10.3390/a15040117.

Full text
Abstract:
K-means is a well-known clustering algorithm often used for its simplicity and potential efficiency. Its properties and limitations have been investigated by many works reported in the literature. K-means, though, suffers from computational problems when dealing with large datasets with many dimensions and a great number of clusters. Therefore, many authors have proposed and experimented with different techniques for the parallel execution of K-means. This paper describes a novel approach to parallel K-means which, today, is based on commodity multicore machines with shared memory. Two reference implementations in Java are developed and their performances are compared. The first one is structured according to a map/reduce schema that leverages the built-in multi-threaded concurrency automatically provided by Java to parallel streams. The second one, allocated on the available cores, exploits the parallel programming model of the Theatre actor system, which is control-based, totally lock-free, and purposely relies on threads as coarse-grain "programming-in-the-large" units. The experimental results confirm that good execution performance can be achieved through the implicit and intuitive use of Java concurrency in parallel streams. However, better execution performance can be guaranteed by the modular Theatre implementation, which proves more adequate for exploiting the available computational resources.
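The map/reduce structure of one K-means iteration, which the first Java implementation expresses with parallel streams, can be sketched as a parallel assignment (map) phase followed by a centroid-update (reduce) phase. This Python version with a thread pool is an illustrative analogue, not the paper's code.

```python
from concurrent.futures import ThreadPoolExecutor

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2
                                 for p, c in zip(point, centroids[k])))

def kmeans_step(points, centroids, workers=4):
    # Map phase: assignments are independent per point, hence parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        labels = list(pool.map(lambda p: nearest(p, centroids), points))
    # Reduce phase: recompute each centroid as the mean of its members.
    dim = len(centroids[0])
    new_centroids = []
    for k in range(len(centroids)):
        members = [p for p, l in zip(points, labels) if l == k]
        if members:
            new_centroids.append(tuple(sum(p[d] for p in members) / len(members)
                                       for d in range(dim)))
        else:
            new_centroids.append(centroids[k])  # keep an empty cluster's centroid
    return new_centroids, labels
```

Iterating `kmeans_step` until the labels stop changing gives the full algorithm; only the map phase above is embarrassingly parallel, which is why both Java implementations focus their concurrency there.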
APA, Harvard, Vancouver, ISO, and other styles
50

Huang, Lan, Teng Gao, Dalin Li, Zihao Wang, and Kangping Wang. "A Highly Configurable High-Level Synthesis Functional Pattern Library." Electronics 10, no. 5 (February 25, 2021): 532. http://dx.doi.org/10.3390/electronics10050532.

Full text
Abstract:
FPGA has recently played an increasingly important role in heterogeneous computing, but Register Transfer Level design flows are not only inefficient in design, but also require designers to be familiar with the circuit architecture. High-level synthesis (HLS) allows developers to design FPGA circuits more efficiently with a more familiar programming language, a higher level of abstraction, and automatic adaptation of timing constraints. When using HLS tools, such as Xilinx Vivado HLS, specific design patterns and techniques are required in order to create high-performance circuits. Moreover, designing efficient concurrency and data flow structures requires a deep understanding of the hardware, imposing more learning costs on programmers. In this paper, we propose a set of functional patterns libraries based on the MapReduce model, implemented by C++ templates, which can quickly implement high-performance parallel pipelined computing models on FPGA with specified simple parameters. The usage of this pattern library allows flexible adaptation of parallel and flow structures in algorithms, which greatly improves the coding efficiency. The contributions of this paper are as follows. (1) Four standard functional operators suitable for hardware parallel computing are defined. (2) Functional concurrent programming patterns are described based on C++ templates and Xilinx HLS. (3) The efficiency of this programming paradigm is verified with two algorithms with different complexity.
APA, Harvard, Vancouver, ISO, and other styles