Journal articles: 'Implicit parallelism'

1

Trilla, José Manuel Calderón, and Colin Runciman. "Improving implicit parallelism." ACM SIGPLAN Notices 50, no. 12 (2016): 153–64. http://dx.doi.org/10.1145/2887747.2804308.

2

Harris, Tim, and Satnam Singh. "Feedback directed implicit parallelism." ACM SIGPLAN Notices 42, no. 9 (2007): 251–64. http://dx.doi.org/10.1145/1291220.1291192.

3

Vose, Michael D., and Alden H. Wright. "Form Invariance and Implicit Parallelism." Evolutionary Computation 9, no. 3 (2001): 355–70. http://dx.doi.org/10.1162/106365601750406037.

Abstract:

Holland's schema theorem (an inequality) may be viewed as an attempt to understand genetic search in terms of a coarse graining of the state space. Stephens and Waelbroeck developed that perspective, sharpening the schema theorem to an equality. Of particular interest is a “form invariance” of their equations; the form is unchanged by the degree of coarse graining. This paper establishes a similar form invariance for the more general model of Vose et al. and uses the attendant machinery as a springboard for an interpretation and discussion of implicit parallelism.

4

Bertoni, Alberto, and Marco Dorigo. "Implicit parallelism in genetic algorithms." Artificial Intelligence 61, no. 2 (1993): 307–14. http://dx.doi.org/10.1016/0004-3702(93)90071-i.

5

Vose, Michael D., and Alden H. Wright. "Erratum: Form Invariance and Implicit Parallelism." Evolutionary Computation 9, no. 4 (2001): 525. http://dx.doi.org/10.1162/10636560152642896.

6

Ovatman, Tolga, Thomas Weigert, and Feza Buzluca. "Exploring implicit parallelism in class diagrams." Journal of Systems and Software 84, no. 5 (2011): 821–34. http://dx.doi.org/10.1016/j.jss.2011.01.005.

7

Alexandrov, Alexander, Asterios Katsifodimos, Georgi Krastev, and Volker Markl. "Implicit Parallelism through Deep Language Embedding." ACM SIGMOD Record 45, no. 1 (2016): 51–58. http://dx.doi.org/10.1145/2949741.2949754.

8

Bik, Aart J. C., and Dennis B. Gannon. "Automatically exploiting implicit parallelism in Java." Concurrency: Practice and Experience 9, no. 6 (1997): 579–619. http://dx.doi.org/10.1002/(sici)1096-9128(199706)9:6<579::aid-cpe309>3.0.co;2-g.

9

Bic, L., and M. Almouhamed. "The EM-4 under Implicit Parallelism." Journal of Parallel and Distributed Computing 19, no. 3 (1993): 255–61. http://dx.doi.org/10.1006/jpdc.1993.1109.

10

Senghor, Abdourahmane. "Barracuda, an Open Source Framework for Parallelizing Divide and Conquer Algorithm." International Journal on Cybernetics & Informatics 12, no. 2 (2023): 63–75. http://dx.doi.org/10.5121/ijci.2023.120206.

Abstract:

This paper presents a newly-created Barracuda open-source framework which aims to parallelize Java divide and conquer applications. This framework exploits implicit for-loop parallelism in dividing and merging operations. So, this makes it a mixture of parallel for-loop and task parallelism. It targets shared-memory multiprocessors and hybrid distributed shared-memory architectures. We highlight the effectiveness of the framework and focus on the performance gain and programming effort by using this framework. Barracuda aims at large public actors as well as various application domains. In terms of performance achievement, it is very close to Fork/Join framework while allowing end-users to only focus on refactoring code and experts to have the opportunity to improve it.

11

Whitley, Darrell. "Deception, dominance and implicit parallelism in genetic search." Annals of Mathematics and Artificial Intelligence 5, no. 1 (1992): 49–78. http://dx.doi.org/10.1007/bf01530780.

12

Yang, Xiaozhong, and Lifei Wu. "An Efficient Parallel Approximate Algorithm for Solving Time Fractional Reaction-Diffusion Equations." Mathematical Problems in Engineering 2020 (August 26, 2020): 1–17. http://dx.doi.org/10.1155/2020/4524387.

Abstract:

In this paper, we construct pure alternative segment explicit-implicit (PASE-I) and implicit-explicit (PASI-E) difference algorithms for time fractional reaction-diffusion equations (FRDEs). They are a kind of difference schemes with intrinsic parallelism and based on classical explicit scheme and classical implicit scheme combined with alternating segment technology. The existence and uniqueness analysis of solutions of the parallel difference schemes are given. Both the theoretical proof and the numerical experiment show that PASE-I and PASI-E schemes are unconditionally stable and convergent with second-order spatial accuracy and 2−α order time accuracy. Compared with implicit scheme and E-I (I-E) scheme, the computational efficiency of PASE-I and PASI-E schemes is greatly improved. PASE-I and PASI-E schemes have obvious parallel computing properties, which shows that the difference schemes with intrinsic parallelism in this paper are feasible to solve the time FRDEs.

13

Santos, João, and Ricardo Rocha. "A team-based scheduling model for interfacing or-parallel prolog engines." Computer Science and Information Systems 11, no. 4 (2014): 1435–54. http://dx.doi.org/10.2298/csis131025050s.

Abstract:

Logic Programming languages, such as Prolog, offer a great potential for the exploitation of implicit parallelism. One of the most noticeable sources of implicit parallelism in Prolog programs is or-parallelism. Or-parallelism arises from the simultaneous evaluation of a subgoal call against the clauses that match that call. Nowadays, multicores and clusters of multicores are becoming the norm and, although, many parallel Prolog systems have been developed in the past, to the best of our knowledge, none of them was specially designed to explore the combination of shared and distributed memory architectures. Conceptually, an or-parallel Prolog system consists of two components: an or-parallel engine (i.e., a set of independent Prolog engines which we named a team of workers) and a scheduler. In this work, we propose a team-based scheduling model to efficiently exploit parallelism between different or-parallel engines running on top of clusters of multicores. Our proposal defines a layered approach where a second-level scheduler specifies a clean interface for scheduling work between the base or-parallel engines, thus enabling different scheduling combinations to be used for distributing work among workers inside a team and among teams.

14

Pan, Yueyue, Lifei Wu, and Xiaozhong Yang. "A New Class of Difference Methods with Intrinsic Parallelism for Burgers–Fisher Equation." Mathematical Problems in Engineering 2020 (August 14, 2020): 1–17. http://dx.doi.org/10.1155/2020/9162563.

Abstract:

This paper proposes a new class of difference methods with intrinsic parallelism for solving the Burgers–Fisher equation. A new class of parallel difference schemes of pure alternating segment explicit-implicit (PASE-I) and pure alternating segment implicit-explicit (PASI-E) are constructed by taking simple classical explicit and implicit schemes, combined with the alternating segment technique. The existence, uniqueness, linear absolute stability, and convergence for the solutions of PASE-I and PASI-E schemes are well illustrated. Both theoretical analysis and numerical experiments show that PASE-I and PASI-E schemes are linearly absolute stable, with 2-order time accuracy and 2-order spatial accuracy. Compared with the implicit scheme and the Crank–Nicolson (C-N) scheme, the computational efficiency of the PASE-I (PASI-E) scheme is greatly improved. The PASE-I and PASI-E schemes have obvious parallel computing properties, which show that the difference methods with intrinsic parallelism in this paper are feasible to solve the Burgers–Fisher equation.

15

Coullon, Hélène, and Sébastien Limet. "The SIPSim implicit parallelism model and the SkelGIS library." Concurrency and Computation: Practice and Experience 28, no. 7 (2015): 2120–44. http://dx.doi.org/10.1002/cpe.3494.

16

Yang, Xiao Zhong, and Fan Zhang. "A New Parallel Difference Numerical Method for the Payment of Dividend Black-Scholes Equation." Advanced Materials Research 756-759 (September 2013): 2744–49. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.2744.

Abstract:

A new alternating segment explicit-implicit and alternating segment implicit-explicit methods for solving the payment of dividend Black-Scholes equation are presented. These new methods have several advantages such as: good parallelism, unconditional stability, convergence and better accuracy. Numerical experiments show that the methods improve the calculation speed greatly.

17

Santos Costa, Vítor, Inês Dutra, and Ricardo Rocha. "Threads and or-parallelism unified." Theory and Practice of Logic Programming 10, no. 4-6 (2010): 417–32. http://dx.doi.org/10.1017/s1471068410000190.

Abstract:

One of the main advantages of Logic Programming (LP) is that it provides an excellent framework for the parallel execution of programs. In this work we investigate novel techniques to efficiently exploit parallelism from real-world applications in low cost multi-core architectures. To achieve these goals, we revive and redesign the YapOr system to exploit or-parallelism based on a multi-threaded implementation. Our new approach takes full advantage of the state-of-the-art fast and optimized YAP Prolog engine and shares the underlying execution environment, scheduler and most of the data structures used to support YapOr's model. Initial experiments with our new approach consistently achieve almost linear speedups for most of the applications, proving itself as a good alternative for exploiting implicit parallelism in the currently available low cost multi-core architectures.

18

Coullon, Hélène, Jose-Maria Fullana, Pierre-Yves Lagrée, Sébastien Limet, and Xiaofei Wang. "Blood Flow Arterial Network Simulation with the Implicit Parallelism Library SkelGIS." Procedia Computer Science 29 (2014): 102–12. http://dx.doi.org/10.1016/j.procs.2014.05.010.

19

Barboteu, M., P. Alart, and F. Lebon. "A Modified Element-by-Element Preconditioner for Elastostatics." Journal of Applied Mechanics 65, no. 2 (1998): 531–33. http://dx.doi.org/10.1115/1.2789088.

Abstract:

In this paper we present in detail a modified element-by-element strategy. This kind of method is well known to be effective for large-scale problems because of its implicit parallelism. Numerical experiments described in this paper confirm the efficiency of this solver.

20

Baum, Eric B., Dan Boneh, and Charles Garrett. "Where Genetic Algorithms Excel." Evolutionary Computation 9, no. 1 (2001): 93–124. http://dx.doi.org/10.1162/10636560151075130.

Abstract:

We analyze the performance of a genetic algorithm (GA) we call Culling, and a variety of other algorithms, on a problem we refer to as the Additive Search Problem (ASP). We show that the problem of learning the Ising perceptron is reducible to a noisy version of ASP. Noisy ASP is the first problem we are aware of where a genetic-type algorithm bests all known competitors. We generalize ASP to k-ASP to study whether GAs will achieve “implicit parallelism” in a problem with many more schemata. GAs fail to achieve this implicit parallelism, but we describe an algorithm we call Explicitly Parallel Search that succeeds. We also compute the optimal culling point for selective breeding, which turns out to be independent of the fitness function or the population distribution. We also analyze a mean field theoretic algorithm performing similarly to Culling on many problems. These results provide insight into when and how GAs can beat competing methods.

21

Piparo, Danilo, Philippe Canal, Guilherme Amadio, et al. "A Parallelised ROOT for Future HEP Data Processing." EPJ Web of Conferences 214 (2019): 05033. http://dx.doi.org/10.1051/epjconf/201921405033.

Abstract:

In the coming years, HEP data processing will need to exploit parallelism on present and future hardware resources to sustain the bandwidth requirements. As one of the cornerstones of the HEP software ecosystem, ROOT embraced an ambitious parallelisation plan which delivered compelling results. In this contribution the strategy is characterised as well as its evolution in the medium term. The units of the ROOT framework are discussed where task and data parallelism have been introduced, with runtime and scaling measurements. We will give an overview of concurrent operations in ROOT, for instance in the areas of I/O (reading and writing of data), fitting / minimization, and data analysis. This paper introduces the programming model and use cases for explicit and implicit parallelism, where the former is explicit in user code and the latter is implicitly managed by ROOT internally.

22

Xue, Guanyu, and Hui Feng. "An Alternating Segment Explicit-Implicit Scheme with Intrinsic Parallelism for Burgers’ Equation." Journal of Computational and Theoretical Transport 49, no. 1 (2020): 15–30. http://dx.doi.org/10.1080/23324309.2019.1709081.

23

Lin, Yuan, and David Padua. "On the Automatic Parallelization of Sparse and Irregular Fortran Programs." Scientific Programming 7, no. 3-4 (1999): 231–46. http://dx.doi.org/10.1155/1999/108763.

Abstract:

Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.

24

Escobar, Juan José, Julio Ortega, Jesús González, Miguel Damas, and Antonio Francisco Díaz. "Parallel High-dimensional Multi-objective Feature Selection for EEG Classification with Dynamic Workload Balancing on CPU–GPU Architectures." Cluster Computing 20, no. 3 (2017): 1881–97. https://doi.org/10.1007/s10586-017-0980-7.

Abstract:

Many bioinformatics applications that analyse large volumes of high-dimensional data comprise complex problems requiring metaheuristics approaches with different types of implicit parallelism. For example, although functional parallelism would be used to accelerate evolutionary algorithms, the fitness evaluation of the population could imply the computation of cost functions with data parallelism. This way, heterogeneous parallel architectures, including Central Processing Unit (CPU) microprocessors with multiple superscalar cores and accelerators such as Graphics Processing Units (GPUs) could be very useful. This paper aims to take advantage of such CPU-GPU heterogeneous architectures to accelerate Electroencephalogram (EEG) classification and feature selection problems by evolutionary multi-objective optimization, in the context of Brain Computing Interface (BCI) tasks. In this paper, we have used the OpenCL framework to develop parallel master-worker codes implementing an evolutionary multi-objective feature selection procedure in which the individuals of the population are dynamically distributed among the available CPU and GPU cores.

25

Xue, Guanyu, Yunjie Gong, and Hui Feng. "The Splitting Crank–Nicolson Scheme with Intrinsic Parallelism for Solving Parabolic Equations." Journal of Function Spaces 2020 (March 30, 2020): 1–12. http://dx.doi.org/10.1155/2020/8571625.

Abstract:

In this paper, a splitting Crank–Nicolson (SC-N) scheme with intrinsic parallelism is proposed for parabolic equations. The new algorithm splits the Crank–Nicolson scheme into two domain decomposition methods, each one is applied to compute the values at (n + 1)th time level by use of known numerical solutions at n-th time level, respectively. Then, the average of the above two values is chosen to be the numerical solutions at (n + 1)th time level. The new algorithm obtains accuracy of the Crank–Nicolson scheme while maintaining parallelism and unconditional stability. This algorithm can be extended to solve two-dimensional parabolic equations by alternating direction implicit (ADI) technique. Numerical experiments illustrate the accuracy and efficiency of the new algorithm.

26

Guo, Ge-yang, and Bo Liu. "A New Alternating Segment Crank-Nicolson Scheme for the Fourth-Order Parabolic Equation." ISRN Applied Mathematics 2013 (August 5, 2013): 1–9. http://dx.doi.org/10.1155/2013/370789.

Abstract:

A group of asymmetric difference schemes to approach the fourth-order parabolic equation is given. According to these schemes and the Crank-Nicolson scheme, an alternating segment Crank-Nicolson scheme with intrinsic parallelism is constructed. The truncation errors and the stability are discussed. Numerical simulations show that this new scheme has unconditional stability and high accuracy and convergency, and it is in preference to the implicit scheme method.

27

Cole, Murray I. "Parallel Programming with List Homomorphisms." Parallel Processing Letters 05, no. 02 (1995): 191–203. http://dx.doi.org/10.1142/s0129626495000175.

Abstract:

We review the use of the Bird-Meertens Formalism as a vehicle for the construction of programs with massive implicit parallelism. We show that a simple result from the theory, concerning the expression of list homomorphisms, can help us in our search for parallel algorithms by suggesting an informal methodology which is applicable when the original result is not, and demonstrate its application to a variety of problems. One of these, a language recognition algorithm, produces a program which exploits nested parallelism. Our main purpose is to show that an understanding of the homomorphism lemma can be helpful in producing parallel programs for problems which are "not quite" list homomorphisms themselves. A more general goal is to illustrate the benefits which can arise from taking a little theory with a pinch of pragmatic salt.

28

Amid, Ehsan, and Manfred K. Warmuth. "An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (2020): 3179–86. http://dx.doi.org/10.1609/aaai.v34i04.5715.

Abstract:

We shed new insights on the two commonly used updates for the online k-PCA problem, namely, Krasulina's and Oja's updates. We show that Krasulina's update corresponds to a projected gradient descent step on the Stiefel manifold of orthonormal k-frames, while Oja's update amounts to a gradient descent step using the unprojected gradient. Following these observations, we derive a more implicit form of Krasulina's k-PCA update, i.e. a version that uses the information of the future gradient as much as possible. Most interestingly, our implicit Krasulina update avoids the costly QR-decomposition step by bypassing the orthonormality constraint. A related update, called the Sanger's rule, can be seen as an explicit approximation of our implicit update. We show that the new update in fact corresponds to an online EM step applied to a probabilistic k-PCA model. The probabilistic view of the update allows us to combine multiple models in a distributed setting. We show experimentally that the implicit Krasulina update yields superior convergence while being significantly faster. We also give strong evidence that the new update can benefit from parallelism and is more stable w.r.t. tuning of the learning rate.

29

Huang, Xiaomeng, Xing Huang, Dong Wang, et al. "OpenArray v1.0: a simple operator library for the decoupling of ocean modeling and parallel computing." Geoscientific Model Development 12, no. 11 (2019): 4729–49. http://dx.doi.org/10.5194/gmd-12-4729-2019.

Abstract:

Rapidly evolving computational techniques are making a large gap between scientific aspiration and code implementation in climate modeling. In this work, we design a simple computing library to bridge the gap and decouple the work of ocean modeling from parallel computing. This library provides 12 basic operators that feature user-friendly interfaces, effective programming, and implicit parallelism. Several state-of-the-art computing techniques, including computing graph and just-in-time compiling, are employed to parallelize the seemingly serial code and speed up the ocean models. These operator interfaces are designed using native Fortran programming language to smooth the learning curve. We further implement a highly readable and efficient ocean model that contains only 1860 lines of code but achieves a 91 % parallel efficiency in strong scaling and 99 % parallel efficiency in weak scaling with 4096 Intel CPU cores. This ocean model also exhibits excellent scalability on the heterogeneous Sunway TaihuLight supercomputer. This work presents a promising alternative tool for the development of ocean models.

30

Hu, Xiaodong, Zhonghua Lu, Jian Zhang, et al. "A parallel algorithm for chimera grid with implicit hole cutting method." International Journal of High Performance Computing Applications 34, no. 2 (2019): 169–77. http://dx.doi.org/10.1177/1094342019845042.

Abstract:

The chimera grid methods have been widely used in the simulation of flow over complex configurations and unsteady moving boundary process. Lee and Baeder presented the implicit hole cutting (IHC) method, which improves the practicability and robustness of chimera grid method. But the excessive time consumption of this method restricts the scalability of parallelism. In this article, based on the parallel implementation of IHC method with structured multi-block grid, the factors which restrict the performance and efficiency are analyzed. Cartesian auxiliary grid is introduced to reduce the communication and computing cost. Finally, test cases are presented to demonstrate the effectiveness of this algorithm, and the calculation and data communication are reduced on the premise of maintaining accuracy.

31

Borhanifar, A., and Reza Abazari. "An Unconditionally Stable Parallel Difference Scheme for Telegraph Equation." Mathematical Problems in Engineering 2009 (2009): 1–17. http://dx.doi.org/10.1155/2009/969610.

Abstract:

We use an unconditionally stable parallel difference scheme to solve telegraph equation. This method is based on domain decomposition concept and using asymmetric Saul'yev schemes for internal nodes of each sub-domain and alternating group implicit method for sub-domain's interfacial nodes. This new method has several advantages such as: good parallelism, unconditional stability and better accuracy than original Saul'yev schemes. The details of implementation and proving stability are briefly discussed. Numerical experiments on stability and accuracy are also presented.

32

Deng, Liang, Hanli Bai, Fang Wang, and Qingxin Xu. "CPU/GPU Computing for an Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform." International Journal of Modern Physics: Conference Series 42 (January 2016): 1660163. http://dx.doi.org/10.1142/s2010194516601630.

Abstract:

CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.

33

Dandi, Yatin, Luis Barba, and Martin Jaggi. "Implicit Gradient Alignment in Distributed and Federated Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 6454–62. http://dx.doi.org/10.1609/aaai.v36i6.20597.

Abstract:

A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients or mini-batches due to heterogeneity and stochasticity of the distributed data. In this work, we show that data heterogeneity can in fact be exploited to improve generalization performance through implicit regularization. One way to alleviate the effects of heterogeneity is to encourage the alignment of gradients across different clients throughout training. Our analysis reveals that this goal can be accomplished by utilizing the right optimization method that replicates the implicit regularization effect of SGD, leading to gradient alignment as well as improvements in test accuracies. Since the existence of this regularization in SGD completely relies on the sequential use of different mini-batches during training, it is inherently absent when training with large mini-batches. To obtain the generalization benefits of this regularization while increasing parallelism, we propose a novel GradAlign algorithm that induces the same implicit regularization while allowing the use of arbitrarily large batches in each update. We experimentally validate the benefits of our algorithm in different distributed and federated learning settings.

34

Groh, Micah, Norman Buchanan, Derek Doyle, James B. Kowalkowski, Marc Paterno, and Saba Sehrish. "PandAna: A Python Analysis Framework for Scalable High Performance Computing in High Energy Physics." EPJ Web of Conferences 251 (2021): 03033. http://dx.doi.org/10.1051/epjconf/202125103033.

Abstract:

Modern experiments in high energy physics analyze millions of events recorded in particle detectors to select the events of interest and make measurements of physics parameters. These data can often be stored as tabular data in files with detector information and reconstructed quantities. Most current techniques for event selection in these files lack the scalability needed for high performance computing environments. We describe our work to develop a high energy physics analysis framework suitable for high performance computing. This new framework utilizes modern tools for reading files and implicit data parallelism. Framework users analyze tabular data using standard, easy-to-use data analysis techniques in Python while the framework handles the file manipulations and parallelism without the user needing advanced experience in parallel programming. In future versions, we hope to provide a framework that can be utilized on a personal computer or a high performance computing cluster with little change to the user code.

35

Garstad, Benjamin. "Nebuchadnezzar and Alexander in the Excerpta Latina Barbari." Iraq 78 (March 2, 2016): 25–48. http://dx.doi.org/10.1017/irq.2015.8.

Abstract:

The late antique Christian chronicle preserved as the Excerpta Latina Barbari contains a brief, but extraordinary notice on the Babylonian king Nebuchadnezzar; many of its unusual details can be understood in the contexts of traditional stories about Nebuchadnezzar and the interests of the work itself. The best clue to the meaning of the passage on Nebuchadnezzar is the Excerpta's closely parallel passage on Alexander the Great. In the Excerpta Nebuchadnezzar and Alexander reflect one another and in a sense compete with one another. Many of the odd details of the notice on Nebuchadnezzar can be explained as directing the reader toward this parallelism. The parallelism itself seems to serve two purposes. First, to provide symmetry to the Excerpta's idiosyncratic account of world history in which Alexander liberates the world conquered by Nebuchadnezzar. And second, to show Nebuchadnezzar subtly outdoing Alexander, so that Alexander's encounter with the God of the Jews, as it is found in the Excerpta, can be provided with an implicit interpretation and characterization.

36

Chen, Lei, Kalyanmoy Deb, and Hai-Lin Liu. "Explicit Control of Implicit Parallelism in Decomposition-Based Evolutionary Many-Objective Optimization Algorithms [Research Frontier]." IEEE Computational Intelligence Magazine 14, no. 4 (2019): 52–64. http://dx.doi.org/10.1109/mci.2019.2937612.

37

Cui, Yun Fei, Xin Ming Li, Ke Wei Dong, and Ji Lu Zhu. "Cloud Computing Resource Scheduling Method Research Based on Improved Genetic Algorithm." Advanced Materials Research 271-273 (July 2011): 552–57. http://dx.doi.org/10.4028/www.scientific.net/amr.271-273.552.

Abstract:

Resource scheduling is the key part of cloud computing resource management. One excellent method may enhance the efficiency of the whole cloud computing system, and effectively share resource in wide area. Genetic Algorithm has adaptability, global optimization ability and implicit parallelism, which is not in other methods. For the sake of scheduling effective resource to accomplish relevant task, improved genetic algorithm is adopted in cloud computing resource scheduling research. Finally, a simulation based on cloudsim is carried out, which proves the correctness and validity of the scheduling method mentioned in this paper.

38

Buwalda, Floris J. L., Erik De Goede, Maxim Knepflé, and Cornelis Vuik. "Comparison of an Explicit and Implicit Time Integration Method on GPUs for Shallow Water Flows on Structured Grids." Water 15, no. 6 (2023): 1165. http://dx.doi.org/10.3390/w15061165.

Abstract:

The accuracy, stability and computational efficiency of numerical methods on central processing units (CPUs) for the depth-averaged shallow water equations were well covered in the literature. A large number of these methods were already developed and compared. However, on graphics processing units (GPUs), such comparisons are relatively scarce. In this paper, we present the results of comparing two time-integration methods for the shallow water equations on structured grids. An explicit and a semi-implicit time integration method were considered. For the semi-implicit method, the performance of several iterative solvers was compared. The implementation of the semi-implicit method on a GPU in this study was a novel approach for the shallow water equations. This also holds for the repeated red black (RRB) solver that was found to be very efficient on a GPU. Additionally, the results of both methods were compared with several CPU-based software systems for the shallow water flows on structured grids. On a GPU, the simulations were 25 to 75 times faster than on a CPU. Theory predicts an explicit method to be best suited for a GPU due to the higher level of inherent parallelism. It was found that both the explicit and the semi-implicit methods ran efficiently on a GPU. For very shallow applications, the explicit method was preferred because the stability condition on the time step was not very restrictive. However, for deep water applications, we expect the semi-implicit method to be preferred.

39

Metivet, Thibaut, Vincent Chabannes, Mourad Ismail, and Christophe Prud’homme. "High-Order Finite-Element Framework for the Efficient Simulation of Multifluid Flows." Mathematics 6, no. 10 (2018): 203. http://dx.doi.org/10.3390/math6100203.

Abstract:

In this paper, we present a comprehensive framework for the simulation of Multifluid flows based on the implicit level-set representation of interfaces and on an efficient solving strategy of the Navier-Stokes equations. The mathematical framework relies on a modular coupling approach between the level-set advection and the fluid equations. The space discretization is performed with possibly high-order stable finite elements while the time discretization features implicit Backward Differentiation Formulae of arbitrary order. This framework has been implemented within the Feel++ library, and features seamless distributed parallelism with fast assembly procedures for the algebraic systems and efficient preconditioning strategies for their resolution. We also present simulation results for a three-dimensional Multifluid benchmark, and highlight the importance of using high-order finite elements for the level-set discretization for problems involving the geometry of the interfaces, such as the curvature or its derivatives.

40

Li, Jianlin. "Efficient parallelism in Breadth-First Search: A comprehensive analysis and implementation." Applied and Computational Engineering 36, no. 1 (2024): 185–91. http://dx.doi.org/10.54254/2755-2721/36/20230443.

Abstract:

The Breadth-First Search (BFS) entails a systematic traversal of a given graph, G = (V, E), layer by layer, starting from a specific vertex. Recognized as a cornerstone methodology for graph exploration, the importance of BFS has skyrocketed, especially with the increasing demands of graph-based data processing. However, as the vertex count expands, traditional serial implementations reveal their limitations, faltering in terms of time and space efficiency. This paper aims to contrast the efficiencies of standard BFS with its parallelized iteration. Introducing a shared-memory model of level-synchronous parallel BFS, the approach integrates optimizations to navigate the challenges posed by implicit barriers and critical sections. Employing the Graph500 benchmark, this parallel methodology is meticulously evaluated, highlighting the speedup concerning various thread counts. Initial findings unveil a compelling pattern: speedup generally correlates positively with the number of active threads. However, if the thread count breaches the system's inherent capacity, the speedup hits a plateau, showing only marginal fluctuations without significant increases. These statistical revelations not only vouch for the advantages of BFS parallelization but also emphasize a critical insight: judiciously increasing thread count, up to a system-specified limit, can yield peak efficiency.
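
The level-synchronous scheme named in this abstract can be sketched roughly as follows. This is a minimal serial illustration under stated assumptions, not the paper's implementation: the function name `level_synchronous_bfs` and the adjacency-dictionary graph representation are hypothetical choices made here, and the comments only mark where a shared-memory version would typically need a critical section and a barrier.

```python
def level_synchronous_bfs(adj, source):
    """Return a dict mapping each reachable vertex to its BFS level.

    adj is assumed to be a dict: vertex -> iterable of neighbours.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # In a parallel version, this loop over the current frontier
        # would be divided among threads.
        for u in frontier:
            for v in adj.get(u, ()):
                # Parallel version: the check-and-mark below must be
                # atomic or sit inside a critical section, otherwise two
                # threads could both claim v.
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        # Parallel version: all threads synchronize here (the barrier)
        # before any of them starts on the next level.
        frontier = next_frontier
    return level


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(level_synchronous_bfs(graph, 0))
```

Processing one whole level before moving to the next is what makes the scheme "level-synchronous": the work inside a level is independent apart from the visited check, so that is where threads can share the load.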

41

Gong, Chunye, Weimin Bao, Guojian Tang, Yuewen Jiang, and Jie Liu. "A Domain Decomposition Method for Time Fractional Reaction-Diffusion Equation." Scientific World Journal 2014 (2014): 1–5. http://dx.doi.org/10.1155/2014/681707.

Abstract:

The computational complexity of one-dimensional time fractional reaction-diffusion equation is O(N²M) compared with O(NM) for classical integer reaction-diffusion equation. Parallel computing is used to overcome this challenge. Domain decomposition method (DDM) embodies large potential for parallelization of the numerical solution for fractional equations and serves as a basis for distributed, parallel computations. A domain decomposition algorithm for time fractional reaction-diffusion equation with implicit finite difference method is proposed. The domain decomposition algorithm keeps the same parallelism but needs much fewer iterations, compared with Jacobi iteration in each time step. Numerical experiments are used to verify the efficiency of the obtained algorithm.

42

Xiao, Xue, Qing Hong Wu, and Ying Zhang. "Recognition of Paper Currency Research Based on AGA-BP Neural Network." Advanced Materials Research 989-994 (July 2014): 3968–72. http://dx.doi.org/10.4028/www.scientific.net/amr.989-994.3968.

Abstract:

The genetic algorithm is a randomized search method modeled on the laws of biological evolution, with inherent implicit parallelism and good global optimization ability. This paper presents a method that uses an adaptive genetic algorithm to optimize a BP neural network, namely its structure, weights and thresholds, for banknote recognition. Experimental results show that the BP neural network optimized by the genetic algorithm recognizes banknotes accurately and quickly; compared with a traditional BP neural network, recognition accuracy is greatly improved, as are the network's adaptive capacity and generalization ability.

43

ROSENBLUETH, DAVID A. "Chain programs for writing deterministic metainterpreters." Theory and Practice of Logic Programming 2, no. 2 (2002): 203–32. http://dx.doi.org/10.1017/s147106840100134x.

Abstract:

Many metainterpreters found in the logic programming literature are nondeterministic in the sense that the selection of program clauses is not determined. Examples are the familiar ‘demo’ and ‘vanilla’ metainterpreters. For some applications this nondeterminism is convenient. In some cases, however, a deterministic metainterpreter, having an explicit selection of clauses, is needed. Such cases include (1) conversion of OR parallelism into AND parallelism for ‘committed-choice’ processors, (2) logic-based, imperative-language implementation of search strategies, and (3) simulation of bounded-resource reasoning. Deterministic metainterpreters are difficult to write because the programmer must be concerned about the set of unifiers of the children of a node in the derivation tree. We argue that it is both possible and advantageous to write these metainterpreters by reasoning in terms of object programs converted into a syntactically restricted form that we call ‘chain’ form, where we can forget about unification, except for unit clauses. We give two transformations converting logic programs into chain form, one for ‘moded’ programs (implicit in two existing exhaustive-traversal methods for committed-choice execution), and one for arbitrary definite programs. As illustrations of our approach we show examples of the three applications mentioned above.

44

Tang, Qiang, Xiaomeng Huang, Lei Lin, et al. "MERF v3.0, a highly computationally efficient non-hydrostatic ocean model with implicit parallelism: Algorithms and validation experiments." Ocean Modelling 167 (November 2021): 101877. http://dx.doi.org/10.1016/j.ocemod.2021.101877.

45

DUBOIS, D., J. L. MARICHAL, H. PRADE, M. ROUBENS, and R. SABBADIN. "THE USE OF THE DISCRETE SUGENO INTEGRAL IN DECISION-MAKING: A SURVEY." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 09, no. 05 (2001): 539–61. http://dx.doi.org/10.1142/s0218488501001058.

Abstract:

An overview of the use of the discrete Sugeno integral as either an aggregation tool or a preference functional is presented in the qualitative framework of two decision paradigms: multi-criteria decision-making and decision-making under uncertainty. The parallelism between the representation theorems in both settings is stressed, even if a basic requirement like the idempotency of the aggregation scheme should be explicitly stated in multi-criteria decision-making, while its counterpart is implicit in decision under uncertainty by equating the utility of a constant act with the utility of its consequence. Important particular cases of Sugeno integrals such as prioritized minimum and maximum operators, their ordered versions, and Boolean max-min functions are studied.

46

AREIAS, MIGUEL, and RICARDO ROCHA. "Table space designs for implicit and explicit concurrent tabled evaluation." Theory and Practice of Logic Programming 18, no. 5-6 (2018): 950–92. http://dx.doi.org/10.1017/s147106841800039x.

Abstract:

One of the main advantages of Prolog is its potential for the implicit exploitation of parallelism and, as a high-level language, Prolog is also often used as a means to explicitly control concurrent tasks. Tabling is a powerful implementation technique that overcomes some limitations of traditional Prolog systems in dealing with recursion and redundant sub-computations. Given these advantages, the question that arises is if tabling has also the potential for the exploitation of concurrency/parallelism. On one hand, tabling still exploits a search space as traditional Prolog but, on the other hand, the concurrent model of tabling is necessarily far more complex, since it also introduces concurrency on the access to the tables. In this paper, we summarize Yap's main contributions to concurrent tabled evaluation and we describe the design and implementation challenges of several alternative table space designs for implicit and explicit concurrent tabled evaluation that represent different trade-offs between concurrency and memory usage. We also motivate for the advantages of using fixed-size and lock-free data structures, elaborate on the key role that the engine's memory allocator plays on such environments, and discuss how Yap's mode-directed tabling support can be extended to concurrent evaluation. Finally, we present our future perspectives toward an efficient and novel concurrent framework which integrates both implicit and explicit concurrent tabled evaluation in a single Prolog engine.

47

Bradmetz, Joël, and Claire Bonnefoy-Claudet. "Do young children acquire the meaning of to know and to believe simultaneously or not?" International Journal of Behavioral Development 27, no. 2 (2003): 109–15. http://dx.doi.org/10.1080/01650250244000065.

Abstract:

The conceptual meaning and linguistic use of to know are usually considered to occur earlier than those of to believe. However, the data supporting this claim do not take into account some sources of variation: the difference in the assessment between comprehension and production and the link established between action and representation in standard tasks like that of Wimmer and Perner (1983). The authors counter this claim and attempt to demonstrate a developmental parallelism between the two epistemic operators to know and to believe. This parallelism would be due to the absence of a link between belief and action in a first phase, both developing in a modular system but linked to implicit or explicit access to information, contrary to the usual conception in the literature. Three experiments are reported. The first and the second showed an equal difficulty level between to know and to believe in comprehension in both a declarative and a procedural false belief task and, to the contrary, a lag between the comprehension of to believe and the prediction of a declaration or an action based on a false belief. The third demonstrated that earlier success in attributing a false belief to the other was not a false positive.

48

Cai, Yang Jun, and Zhao Le. "Study on Custom Service Combination Based on BPEL." Advanced Materials Research 605-607 (December 2012): 2451–56. http://dx.doi.org/10.4028/www.scientific.net/amr.605-607.2451.

Abstract:

A custom service combination based on Business Process Execution Language (BPEL) was put forward. It mainly studied one-off recipient and keeping the business logic order unchanged. A solution was proposed that lets a serial workflow achieve local parallelism through three aspects: message dependency, conversion of a directed acyclic graph to a workflow, and implicit message dependency. The implementation algorithm was also discussed. A custom service combination application of ‘stocking house transactions’ and ‘stocking house mortgage’ showed the feasibility and validity of this algorithm. The system generates the BPEL by analyzing the Web Service Description Language interfaces of the various business stakeholders, thereby determining the dependence of each business order while maintaining the existing business logic. The BPEL enhances local parallel processing and reduces the overall processing time.

49

BASTOUL, CÉDRIC, and PAUL FEAUTRIER. "ADJUSTING A PROGRAM TRANSFORMATION FOR LEGALITY." Parallel Processing Letters 15, no. 01n02 (2005): 3–17. http://dx.doi.org/10.1142/s0129626405002027.

Abstract:

Program transformations are one of the most valuable compiler techniques to improve parallelism or data locality. However, restructuring compilers have a hard time coping with data dependences. A typical solution is to focus on program parts where the dependences are simple enough to enable any transformation. For more complex problems, only the question of checking whether a transformation is legal or not has been addressed. In this paper we propose to go further. Starting from a transformation with no guarantee on legality, we show how we can correct it for dependence satisfaction. Two directions are explored: first when transformation properties can be explicitly expressed and second when they are implicit as in the data locality transformation case. Generating code having the best properties is a direct application of this result.

50

Chen, Anka He, Ziheng Liu, Yin Yang, and Cem Yuksel. "Vertex Block Descent." ACM Transactions on Graphics 43, no. 4 (2024): 1–16. http://dx.doi.org/10.1145/3658179.

Abstract:

We introduce vertex block descent, a block coordinate descent solution for the variational form of implicit Euler through vertex-level Gauss-Seidel iterations. It operates with local vertex position updates that achieve reductions in global variational energy with maximized parallelism. This forms a physics solver that can achieve numerical convergence with unconditional stability and exceptional computation performance. It can also fit in a given computation budget by simply limiting the iteration count while maintaining its stability and superior convergence rate. We present and evaluate our method in the context of elastic body dynamics, providing details of all essential components and showing that it outperforms alternative techniques. In addition, we discuss and show examples of how our method can be used for other simulation systems, including particle-based simulations and rigid bodies.
