
Journal articles on the topic "CPU-GPU Partitioning"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles


See the 46 best journal articles for research on the topic "CPU-GPU Partitioning".

Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic citation of the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication as a .pdf and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide range of scientific fields and compile an accurate bibliography.

1

Benatia, Akrem, Weixing Ji, Yizhuo Wang, and Feng Shi. "Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms." International Journal of High Performance Computing Applications 34, no. 1 (2019): 66–80. http://dx.doi.org/10.1177/1094342019886628.

Full text of the source
Abstract:
Sparse matrix–vector multiplication (SpMV) kernel dominates the computing cost in numerous applications. Most of the existing studies dedicated to improving this kernel have been targeting just one type of processing units, mainly multicore CPUs or graphics processing units (GPUs), and have not explored the potential of the recent, rapidly emerging, CPU-GPU heterogeneous platforms. To take full advantage of these heterogeneous systems, the input sparse matrix has to be partitioned on different available processing units. The partitioning problem is more challenging with the existence of many s
ABNT, Harvard, Vancouver, APA, etc. styles
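The partitioning idea summarized in the abstract above, splitting a sparse matrix row-wise between the CPU and the GPU, can be illustrated with a minimal sketch. This is not the authors' partitioning algorithm; it simply applies a fixed 70/30 row split (standing in for a measured GPU:CPU throughput ratio) using SciPy, with both halves computed on the host for brevity.

```python
# Minimal sketch of row-wise SpMV partitioning between two devices.
# The gpu_share ratio is an assumed stand-in for a measured throughput ratio.
import numpy as np
from scipy.sparse import random as sparse_random

def partitioned_spmv(A_csr, x, gpu_share=0.7):
    n_rows = A_csr.shape[0]
    split = int(n_rows * gpu_share)     # rows [0, split) are the "GPU" share
    y = np.empty(n_rows)
    y[:split] = A_csr[:split] @ x       # would be offloaded to a GPU kernel
    y[split:] = A_csr[split:] @ x       # remains on the CPU
    return y

A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(A.shape[1])
assert np.allclose(partitioned_spmv(A, x), A @ x)
```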
2

Narayana, Divyaprabha Kabbal, and Sudarshan Tekal Subramanyam Babu. "Optimal task partitioning to minimize failure in heterogeneous computational platform." International Journal of Electrical and Computer Engineering (IJECE) 15, no. 1 (2025): 1079. http://dx.doi.org/10.11591/ijece.v15i1.pp1079-1088.

Full text of the source
Abstract:
The increased energy consumption by heterogeneous cloud platforms surges the carbon emissions and reduces system reliability, thus, making workload scheduling an extremely challenging process. The dynamic voltage-frequency scaling (DVFS) technique provides an efficient mechanism in improving the energy efficiency of cloud platform; however, employing DVFS reduces reliability and increases the failure rate of resource scheduling. Most of the current workload scheduling methods have failed to optimize the energy and reliability together under a central processing unit - graphical processing uni
ABNT, Harvard, Vancouver, APA, etc. styles
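To make the energy/reliability tension mentioned in the abstract concrete, the sketch below evaluates a commonly used analytical model of DVFS: dynamic energy scaling roughly with f², and a transient-fault rate that grows as frequency drops. The model form and all constants are illustrative assumptions, not the formulation used in the paper.

```python
# Hedged illustration: energy vs. failure-rate trade-off under DVFS,
# using a generic model with normalized frequency f in [f_min, 1].
import numpy as np

def energy_per_op(f):
    return f ** 2                        # dynamic energy ~ V^2 * f with V ~ f

def failure_rate(f, lam0=1e-6, d=3.0, f_min=0.4):
    # exponential fault-rate model often assumed in DVFS reliability studies
    return lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))

for f in np.linspace(0.4, 1.0, 4):
    print(f"f={f:.2f}  energy/op={energy_per_op(f):.2f}  fault rate={failure_rate(f):.2e}")
```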
3

Yang, Huijing, and Tingwen Yu. "Two novel cache management mechanisms on CPU-GPU heterogeneous processors." Research Briefs on Information and Communication Technology Evolution 7 (June 15, 2021): 1–8. http://dx.doi.org/10.56801/rebicte.v7i.113.

Full text of the source
Abstract:
Heterogeneous multicore processors that take full advantage of CPUs and GPUs within the same chip raise an emerging challenge for sharing a series of on-chip resources, particularly Last-Level Cache (LLC) resources. Since the GPU core has good parallelism and memory latency tolerance, the majority of the LLC space is utilized by GPU applications. Under the current cache management policies, the LLC sharing of CPU applications can be remarkably decreased due to the existence of GPU workloads, thus seriously affecting the overall performance. To alleviate the unfair contention within CPUs and GPUs for
ABNT, Harvard, Vancouver, APA, etc. styles
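The contention problem described above is often attacked by limiting how many ways of each cache set the GPU may occupy. The toy simulator below caps GPU-owned lines at a fixed quota per set; it is a generic way-partitioning sketch under assumed parameters, not the two mechanisms proposed in the paper.

```python
# Toy way-partitioned LRU set: GPU fills may occupy at most gpu_ways lines,
# so streaming GPU traffic cannot evict every CPU line from the set.
from collections import OrderedDict

class PartitionedSet:
    def __init__(self, ways=8, gpu_ways=2):
        self.ways, self.gpu_ways = ways, gpu_ways
        self.lines = OrderedDict()               # tag -> owner, oldest first

    def access(self, tag, owner):
        if tag in self.lines:
            self.lines.move_to_end(tag)          # hit: refresh LRU position
            return True
        owned = [t for t, o in self.lines.items() if o == owner]
        quota = self.gpu_ways if owner == "gpu" else self.ways - self.gpu_ways
        if len(owned) >= quota:
            self.lines.pop(owned[0])             # evict this owner's LRU line
        elif len(self.lines) >= self.ways:
            self.lines.popitem(last=False)       # set full: evict global LRU
        self.lines[tag] = owner
        return False

s = PartitionedSet()
for t in range(100):                             # GPU streaming through the set
    s.access(("gpu", t), "gpu")
print(sum(o == "gpu" for o in s.lines.values()))  # GPU never holds more than 2 lines
```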
4

Narayana, Divyaprabha Kabbal, and Sudarshan Tekal Subramanyam Babu. "Optimal task partitioning to minimize failure in heterogeneous computational platform." International Journal of Electrical and Computer Engineering (IJECE) 15 (February 1, 2025): 1079–88. https://doi.org/10.11591/ijece.v15i1.pp1079-1088.

Full text of the source
Abstract:
The increased energy consumption by heterogeneous cloud platforms surges the carbon emissions and reduces system reliability, thus, making workload scheduling an extremely challenging process. The dynamic voltage-frequency scaling (DVFS) technique provides an efficient mechanism in improving the energy efficiency of cloud platform; however, employing DVFS reduces reliability and increases the failure rate of resource scheduling. Most of the current workload scheduling methods have failed to optimize the energy and reliability together under a central processing un
ABNT, Harvard, Vancouver, APA, etc. styles
5

Fang, Juan, Mengxuan Wang, and Zelin Wei. "A memory scheduling strategy for eliminating memory access interference in heterogeneous system." Journal of Supercomputing 76, no. 4 (2020): 3129–54. http://dx.doi.org/10.1007/s11227-019-03135-7.

Full text of the source
Abstract:
Multiple CPUs and GPUs are integrated on the same chip to share memory, and access requests between cores are interfering with each other. Memory requests from the GPU seriously interfere with the CPU memory access performance. Requests between multiple CPUs are intertwined when accessing memory, and its performance is greatly affected. The difference in access latency between GPU cores increases the average latency of memory accesses. In order to solve the problems encountered in the shared memory of heterogeneous multi-core systems, we propose a step-by-step memory scheduling strateg
ABNT, Harvard, Vancouver, APA, etc. styles
6

Merrill, Duane, and Andrew Grimshaw. "High Performance and Scalable Radix Sorting: A Case Study of Implementing Dynamic Parallelism for GPU Computing." Parallel Processing Letters 21, no. 02 (2011): 245–72. http://dx.doi.org/10.1142/s0129626411000187.

Full text of the source
Abstract:
The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data-dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compar
ABNT, Harvard, Vancouver, APA, etc. styles
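As background for the entry above, the sketch below shows the sequential least-significant-digit radix sort that such GPU sorters parallelize: each pass partitions the keys into buckets by one digit and concatenates them stably. It is a plain Python baseline, not the paper's GPU formulation.

```python
# Sequential LSD radix sort over digit_bits-wide digits. The per-pass
# histogram/scatter is what GPU radix sorters distribute across thread blocks.
import random

def radix_sort(keys, key_bits=32, digit_bits=8):
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)   # partition by current digit
        keys = [k for b in buckets for k in b]       # stable concatenation
    return keys

data = [random.randrange(1 << 32) for _ in range(1000)]
assert radix_sort(data) == sorted(data)
```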
7

Vilches, Antonio, Rafael Asenjo, Angeles Navarro, Francisco Corbera, Rubén Gran, and María Garzarán. "Adaptive Partitioning for Irregular Applications on Heterogeneous CPU-GPU Chips." Procedia Computer Science 51 (2015): 140–49. http://dx.doi.org/10.1016/j.procs.2015.05.213.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
8

Sung, Hanul, Hyeonsang Eom, and HeonYoung Yeom. "The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU." KIISE Transactions on Computing Practices 20, no. 9 (2014): 507–12. http://dx.doi.org/10.5626/ktcp.2014.20.9.507.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
9

Wang, Shunjiang, Baoming Pu, Ming Li, Weichun Ge, Qianwei Liu, and Yujie Pei. "State Estimation Based on Ensemble DA–DSVM in Power System." International Journal of Software Engineering and Knowledge Engineering 29, no. 05 (2019): 653–69. http://dx.doi.org/10.1142/s0218194019400023.

Full text of the source
Abstract:
This paper investigates the state estimation problem of power systems. A novel, fast and accurate state estimation algorithm is presented to solve this problem based on the one-dimensional denoising autoencoder and deep support vector machine (1D DA–DSVM). Besides, for further reducing the computation burden, a partitioning method is presented to divide the power system into several sub-networks and the proposed algorithm can be applied to each sub-network. A hybrid computing architecture of Central Processing Unit (CPU) and Graphics Processing Unit (GPU) is employed in the overall state estim
ABNT, Harvard, Vancouver, APA, etc. styles
10

Park, Sungwoo, Seyeon Oh, and Min-Soo Kim. "cuMatch: A GPU-based Memory-Efficient Worst-case Optimal Join Processing Method for Subgraph Queries with Complex Patterns." Proceedings of the ACM on Management of Data 3, no. 3 (2025): 1–28. https://doi.org/10.1145/3725398.

Full text of the source
Abstract:
Subgraph queries are widely used but face significant challenges due to complex patterns such as negative and optional edges. While worst-case optimal joins have proven effective for subgraph queries with regular patterns, no method has been proposed that can process queries involving complex patterns in a single multi-way join. Existing CPU-based and GPU-based methods experience intermediate data explosion when processing complex patterns following regular patterns. In addition, GPU-based methods struggle with issues of wasted GPU memory and redundant computation. In this paper, we propose cu
ABNT, Harvard, Vancouver, APA, etc. styles
11

Barreiros, Willian, Alba C. M. A. Melo, Jun Kong, et al. "Efficient microscopy image analysis on CPU-GPU systems with cost-aware irregular data partitioning." Journal of Parallel and Distributed Computing 164 (June 2022): 40–54. http://dx.doi.org/10.1016/j.jpdc.2022.02.004.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
12

Singh, Amit Kumar, Alok Prakash, Karunakar Reddy Basireddy, Geoff V. Merrett, and Bashir M. Al-Hashimi. "Energy-Efficient Run-Time Mapping and Thread Partitioning of Concurrent OpenCL Applications on CPU-GPU MPSoCs." ACM Transactions on Embedded Computing Systems 16, no. 5s (2017): 1–22. http://dx.doi.org/10.1145/3126548.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
13

Hou, Neng, Fazhi He, Yi Zhou, Yilin Chen, and Xiaohu Yan. "A Parallel Genetic Algorithm With Dispersion Correction for HW/SW Partitioning on Multi-Core CPU and Many-Core GPU." IEEE Access 6 (2018): 883–98. http://dx.doi.org/10.1109/access.2017.2776295.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
14

Mahmud, Shohaib, Haiying Shen, and Anand Iyer. "PACER: Accelerating Distributed GNN Training Using Communication-Efficient Partition Refinement and Caching." Proceedings of the ACM on Networking 2, CoNEXT4 (2024): 1–18. http://dx.doi.org/10.1145/3697805.

Full text of the source
Abstract:
Despite recent breakthroughs in distributed Graph Neural Network (GNN) training, large-scale graphs still generate significant network communication overhead, decreasing time and resource efficiency. Although recently proposed partitioning or caching methods try to reduce communication inefficiencies and overheads, they are not sufficiently effective due to their sampling pattern-agnostic nature. This paper proposes a Pipelined Partition Aware Caching and Communication Efficient Refinement System (Pacer), a communication-efficient distributed GNN training system. First, Pacer intelligently est
ABNT, Harvard, Vancouver, APA, etc. styles
15

Chen, Hao, Anqi Wei, and Ye Zhang. "Three-level parallel-set partitioning in hierarchical trees coding based on the collaborative CPU and GPU for remote sensing images compression." Journal of Applied Remote Sensing 11, no. 04 (2017): 1. http://dx.doi.org/10.1117/1.jrs.11.045015.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
16

Wu, Qunyong, Yuhang Wang, Haoyu Sun, Han Lin, and Zhiyuan Zhao. "A System Coupled GIS and CFD for Atmospheric Pollution Dispersion Simulation in Urban Blocks." Atmosphere 14, no. 5 (2023): 832. http://dx.doi.org/10.3390/atmos14050832.

Full text of the source
Abstract:
Atmospheric pollution is a critical issue in public health systems. The simulation of atmospheric pollution dispersion in urban blocks, using CFD, faces several challenges, including the complexity and inefficiency of existing CFD software, time-consuming construction of CFD urban block geometry, and limited visualization and analysis capabilities of simulation outputs. To address these challenges, we have developed a prototype system that couples 3DGIS and CFD for simulating, visualizing, and analyzing atmospheric pollution dispersion. Specifically, a parallel algorithm for coordinate transfo
ABNT, Harvard, Vancouver, APA, etc. styles
17

Giannoula, Christina, Ivan Fernandez, Juan Gómez Luna, Nectarios Koziris, Georgios Goumas, and Onur Mutlu. "SparseP." Proceedings of the ACM on Measurement and Analysis of Computing Systems 6, no. 1 (2022): 1–49. http://dx.doi.org/10.1145/3508041.

Full text of the source
Abstract:
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low memory access latency, thereby being a good fit to accelerate the Sparse Matrix Vector Multiplication (SpMV) kernel. SpMV has been
ABNT, Harvard, Vancouver, APA, etc. styles
18

Kumar, P. S. Jagadeesh, Tracy Lin Huan, and Yang Yung. "Computational Paradigm and Quantitative Optimization to Parallel Processing Performance of Still Image Compression." Circulation in Computer Science 2, no. 4 (2017): 11–17. http://dx.doi.org/10.22632/ccs-2017-252-02.

Full text of the source
Abstract:
Fashionable and staggering evolution in inferring the parallel processing routine coupled with the necessity to amass and distribute huge magnitude of digital records especially still images has fetched an amount of confronts for researchers and other stakeholders. These disputes exorbitantly outlay and maneuvers the digital information among others, subsists the spotlight of the research civilization in topical days and encompasses the lead to the exploration of image compression methods that can accomplish exceptional outcomes. One of those practices is the parallel processing of a diversity
ABNT, Harvard, Vancouver, APA, etc. styles
19

Duan, Jiaang, Shiyou Qian, Hanwen Hu, Dingyu Yang, Jian Cao, and Guangtao Xue. "PipeCo: Pipelining Cold Start of Deep Learning Inference Services on Serverless Platforms." ACM SIGMETRICS Performance Evaluation Review 53, no. 1 (2025): 151–53. https://doi.org/10.1145/3744970.3727307.

Full text of the source
Abstract:
The fusion of serverless computing and deep learning (DL) has led to serverless inference, offering a promising approach for developing and deploying scalable and cost-efficient deep learning inference services (DLISs). However, the challenge of cold start presents a significant obstacle for DLISs, where DL model size greatly impacts latency. Existing studies mitigate cold starts by extending keep-alive times, which unfortunately leads to decreased resource utilization efficiency. To address this issue, we introduce PipeCo, a system designed to alleviate DLIS cold start. The core concept of Pi
ABNT, Harvard, Vancouver, APA, etc. styles
20

Duan, Jiaang, Shiyou Qian, Hanwen Hu, Dingyu Yang, Jian Cao, and Guangtao Xue. "PipeCo: Pipelining Cold Start of Deep Learning Inference Services on Serverless Platforms." Proceedings of the ACM on Measurement and Analysis of Computing Systems 9, no. 2 (2025): 1–23. https://doi.org/10.1145/3727125.

Full text of the source
Abstract:
The fusion of serverless computing and deep learning (DL) has led to serverless inference, offering a promising approach for developing and deploying scalable and cost-efficient deep learning inference services (DLISs). However, the challenge of cold start presents a significant obstacle for DLISs, where DL model size greatly impacts latency. Existing studies mitigate cold starts by extending keep-alive times, which unfortunately leads to decreased resource utilization efficiency. To address this issue, we introduce PipeCo, a system designed to alleviate DLIS cold start. The core concept of Pi
ABNT, Harvard, Vancouver, APA, etc. styles
21

Tanaka, Satoshi, Kyoko Hasegawa, Susumu Nakata, et al. "Grid-Independent Metropolis Sampling for Volume Visualization." International Journal of Modeling, Simulation, and Scientific Computing 01, no. 02 (2010): 199–218. http://dx.doi.org/10.1142/s1793962310000158.

Full text of the source
Abstract:
We propose a method of sampling regular and irregular-grid volume data for visualization. The method is based on the Metropolis algorithm that is a type of Monte Carlo technique. Our method enables "importance sampling" of local regions of interest in the visualization by generating sample points intensively in regions where a user-specified transfer function takes the peak values. The generated sample-point distribution is independent of the grid structure of the given volume data. Therefore, our method is applicable to irregular grids as well as regular grids. We demonstrate the effectivenes
ABNT, Harvard, Vancouver, APA, etc. styles
22

Gu, Yufeng, Arun Subramaniyan, Tim Dunn, et al. "GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis." Communications of the ACM 68, no. 05 (2025): 81–90. https://doi.org/10.1145/3712168.

Full text of the source
Abstract:
Genomics is playing an important role in transforming healthcare. Genetic data, however, is being produced at a rate that far outpaces Moore's Law. Many efforts have been made to accelerate genomics kernels on modern commodity hardware, such as CPUs and GPUs, as well as custom accelerators (ASICs) for specific genomics kernels. While ASICs provide higher performance and energy efficiency than general-purpose hardware, they incur a high hardware-design cost. Moreover, to extract the best performance, ASICs tend to have significantly different architectures for different kernels. The divergence
ABNT, Harvard, Vancouver, APA, etc. styles
23

Bloch, Aurelien, Simone Casale-Brunet, and Marco Mattavelli. "Performance Estimation of High-Level Dataflow Program on Heterogeneous Platforms by Dynamic Network Execution." Journal of Low Power Electronics and Applications 12, no. 3 (2022): 36. http://dx.doi.org/10.3390/jlpea12030036.

Full text of the source
Abstract:
The performance of programs executed on heterogeneous parallel platforms largely depends on the design choices regarding how to partition the processing on the various different processing units. In other words, it depends on the assumptions and parameters that define the partitioning, mapping, scheduling, and allocation of data exchanges among the various processing elements of the platform executing the program. The advantage of programs written in languages using the dataflow model of computation (MoC) is that executing the program with different configurations and parameter settings does n
ABNT, Harvard, Vancouver, APA, etc. styles
24

Gallet, Benoit, and Michael Gowanlock. "Heterogeneous CPU-GPU Epsilon Grid Joins: Static and Dynamic Work Partitioning Strategies." Data Science and Engineering, October 21, 2020. http://dx.doi.org/10.1007/s41019-020-00145-x.

Full text of the source
Abstract:
Given two datasets (or tables) A and B and a search distance $\epsilon$, the distance similarity join, denoted as $A \ltimes_\epsilon B$, finds the pairs of points ($p_a$, $p_b$), where $p_a \in A$ and $p_b \in B$, and such that the distance between $p_a$ and $p_b$ is $\le \epsilon$. If $A = B$, then the similarity join is equivalent to a similarity self-join, denoted as $A \bowtie_\epsilon A$. We propose in this paper Heterogeneous Epsilon Grid Joins (HEGJoin), a heterogeneous CPU-GPU distance
ABNT, Harvard, Vancouver, APA, etc. styles
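The abstract above defines the distance similarity join precisely, so a small grid-based sketch is easy to give: points of B are binned into cells of side ε, and each point of A is compared only against the 3×3 neighborhood of its cell. This is a generic epsilon grid join in 2-D, not HEGJoin's CPU-GPU work partitioning or indexing scheme.

```python
# Generic 2-D epsilon grid join: report all pairs (a, b) with distance <= eps.
from collections import defaultdict
from math import dist, floor

def epsilon_join(A, B, eps):
    grid = defaultdict(list)
    for p in B:                                       # bin B into eps-sized cells
        grid[(floor(p[0] / eps), floor(p[1] / eps))].append(p)
    pairs = []
    for a in A:
        cx, cy = floor(a[0] / eps), floor(a[1] / eps)
        for dx in (-1, 0, 1):                         # probe the 3x3 neighborhood
            for dy in (-1, 0, 1):
                for b in grid.get((cx + dx, cy + dy), []):
                    if dist(a, b) <= eps:
                        pairs.append((a, b))
    return pairs

A = [(0.10, 0.10), (0.90, 0.90)]
B = [(0.15, 0.12), (0.50, 0.50)]
print(epsilon_join(A, B, eps=0.10))                   # [((0.1, 0.1), (0.15, 0.12))]
```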
25

Campos, Cristian, Rafael Asenjo, Javier Hormigo, and Angeles Navarro. "Leveraging SYCL for Heterogeneous cDTW Computation on CPU, GPU, and FPGA." Concurrency and Computation: Practice and Experience 37, no. 15-17 (2025). https://doi.org/10.1002/cpe.70142.

Full text of the source
Abstract:
One of the most time-consuming kernels of a recent epileptic seizure detection application is the computation of the constrained Dynamic Time Warping (cDTW) Distance Matrix. In this paper, we explore the design space of heterogeneous CPU, GPU, and FPGA implementations of this kernel using SYCL as a programming model. First, we optimize the CPU implementation leveraging the SIMD capability of SYCL and compare it with the latest C++26 SIMD library. Next, we tune the SYCL code to run on an on-chip GPU, iGPU, as well as on a discrete NVIDIA GPU, dGPU. We also develop a SYCL implementation
ABNT, Harvard, Vancouver, APA, etc. styles
26

Lee, Wan Luan, Dian-Lun Lin, Shui Jiang, et al. "G-kway: Multilevel GPU-Accelerated k-way Graph Partitioner using Task Graph Parallelism." ACM Transactions on Design Automation of Electronic Systems, May 3, 2025. https://doi.org/10.1145/3734522.

Full text of the source
Abstract:
Graph partitioning is important for the design of many CAD algorithms. However, as the graph size continues to grow, graph partitioning becomes increasingly time-consuming. Recent research has introduced parallel graph partitioners using either multi-core CPUs or GPUs. However, the speedup of existing CPU graph partitioners is typically limited to a few cores, while the performance of GPU-based solutions is algorithmically limited by available GPU memory. To overcome these challenges, we propose G-kway, an efficient multilevel GPU-accelerated k -way graph partitioner. G-kway introduces an effe
ABNT, Harvard, Vancouver, APA, etc. styles
27

Wu, Zhenlin, Haosong Zhao, Hongyuan Liu, Wujie Wen, and Jiajia Li. "gHyPart: GPU-friendly End-to-End Hypergraph Partitioner." ACM Transactions on Architecture and Code Optimization, January 10, 2025. https://doi.org/10.1145/3711925.

Full text of the source
Abstract:
Hypergraph partitioning finds practical applications in various fields, such as high-performance computing and circuit partitioning in VLSI physical design, where high-performance solutions often demand substantial parallelism beyond what existing CPU-based solutions can offer. While GPUs are promising in this regard, their potential in hypergraph partitioning remains unexplored. In this work, we first develop an end-to-end deterministic hypergraph partitioner on GPUs, ported from state-of-the-art multi-threaded CPU work, and identify three major performance challenges by characterizing its pe
ABNT, Harvard, Vancouver, APA, etc. styles
28

"Improving Processing Speed of Real-Time Stereo Matching using Heterogenous CPU/GPU Model." International Journal of Innovative Technology and Exploring Engineering 9, no. 5 (2020): 1983–87. http://dx.doi.org/10.35940/ijitee.e2982.039520.

Full text of the source
Abstract:
This paper presents an improvement of the processing speed of the stereo matching problem. The time required for stereo matching represents a problem for many real time applications such as robot navigation , self-driving vehicles and object tracking. In this work, a real-time stereo matching system is proposed that utilizes the parallelism of Graphics Processing Unit (GPU). An area based stereo matching system is used to generate the disparity map. Four different sequential and parallel computational models are used to analyze the time consumed by the stereo matching. The models are: 1) Seque
ABNT, Harvard, Vancouver, APA, etc. styles
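For context on the entry above, the sketch below is the classical area-based baseline that such systems accelerate: for every pixel it picks the disparity whose window minimizes the sum of absolute differences (SAD). Window size, disparity range, and the synthetic test image are arbitrary assumptions; the paper's GPU kernels are not reproduced here.

```python
# Naive area-based stereo matching with a SAD cost over candidate disparities.
import numpy as np

def sad_disparity(left, right, max_disp=16, win=5):
    h, w = left.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(ref - right[y - half:y + half + 1,
                                        x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))       # best-matching disparity
    return disp

rng = np.random.default_rng(0)
right = rng.integers(0, 255, (32, 64)).astype(np.float32)
left = np.roll(right, 4, axis=1)                     # synthetic shift of 4 pixels
print(np.median(sad_disparity(left, right)[8:-8, 24:-8]))   # interior median ~ 4.0
```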
29

Lin, Ning, and Venkata Dinavahi. "Parallel High-Fidelity Electromagnetic Transient Simulation of Large-Scale Multi-Terminal DC Grids." November 19, 2018. https://doi.org/10.5281/zenodo.7685832.

Full text of the source
Abstract:
Electromagnetic transient (EMT) simulation of power electronics conducted on the CPU slows down as the system scales up. Thus, the massive parallelism of the graphics processing unit (GPU) is utilized to expedite the simulation of the multi-terminal DC (MTDC) grid, where detailed models of the semiconductor switches are adopted to provide comprehensive device-level information. As the large number of nodes leads to an inefficient solution of the DC grid, three levels of circuit partitioning are applied, i.e., the transmission line-based natural separation of converter stations, splitting of th
ABNT, Harvard, Vancouver, APA, etc. styles
30

Bloch, Aurelien, Simone Casale-Brunet, and Marco Mattavelli. "Design Space Exploration for Partitioning Dataflow Program on CPU-GPU Heterogeneous System." Journal of Signal Processing Systems, July 31, 2023. http://dx.doi.org/10.1007/s11265-023-01884-6.

Full text of the source
Abstract:
Dataflow programming is a methodology that enables the development of high-level, parametric programs that are independent of the underlying platform. This approach is particularly useful for heterogeneous platforms, as it eliminates the need to rewrite application software for each configuration. Instead, it only requires new low-level implementation code, which is typically automatically generated through code generation tools. The performance of programs running on heterogeneous parallel platforms is highly dependent on the partitioning and mapping of computation to different proces
ABNT, Harvard, Vancouver, APA, etc. styles
31

Kemmler, Samuel, Christoph Rettinger, Ulrich Rüde, Pablo Cuéllar, and Harald Köstler. "Efficiency and scalability of fully-resolved fluid-particle simulations on heterogeneous CPU-GPU architectures." International Journal of High Performance Computing Applications, January 10, 2025. https://doi.org/10.1177/10943420241313385.

Full text of the source
Abstract:
Current supercomputers often have a heterogeneous architecture using both conventional Central Processing Units (CPUs) and Graphics Processing Units (GPUs). At the same time, numerical simulation tasks frequently involve multiphysics scenarios whose components run on different hardware due to multiple reasons, e.g., architectural requirements, pragmatism, etc. This leads naturally to a software design where different simulation modules are mapped to different subsystems of the heterogeneous architecture. We present a detailed performance analysis for such a hybrid four-way coupled simulation o
ABNT, Harvard, Vancouver, APA, etc. styles
32

Ali, Teymoor, Deepayan Bhowmik, and Robert Nicol. "Energy aware computer vision algorithm deployment on heterogeneous architectures." Discover Electronics 2, no. 1 (2025). https://doi.org/10.1007/s44291-025-00078-7.

Full text of the source
Abstract:
Abstract Computer vision algorithms, specifically convolutional neural networks (CNNs) and feature extraction algorithms, have become increasingly pervasive in many vision tasks. As algorithm complexity grows, it raises computational and memory requirements, which poses a challenge to embedded vision systems with limited resources. Heterogeneous architectures have recently gained momentum as a new path forward for energy efficiency and faster computation, as they allow for the effective utilisation of various processing units, such as Central Processing Unit (CPU), Graphics Processing Unit (GP
ABNT, Harvard, Vancouver, APA, etc. styles
33

Shokrani Baigi, Ahmad, Abdorreza Savadi, and Mahmoud Naghibzadeh. "Optimizing sparse matrix partitioning in a heterogeneous CPU-GPU system for high-performance." Computing 107, no. 4 (2025). https://doi.org/10.1007/s00607-025-01456-5.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
34

Lin, Ning, and Venkata Dinavahi. "Exact Nonlinear Micromodeling for Fine-Grained Parallel EMT Simulation of MTDC Grid Interaction With Wind Farm." August 1, 2019. https://doi.org/10.5281/zenodo.7683216.

Full text of the source
Abstract:
Detailed high-order models of the insulated gate bipolar transistor (IGBT) and the diode are rarely included in power converters for large-scale system-level electromagnetic transient (EMT) simulation on the CPU, due to the nonlinear characteristics albeit they are more accurate. The massively parallel architecture of the graphics processing unit (GPU) enables a lower computational burden by avoiding the computation of  complex devices repetitively in a sequential manner and thus is utilized in this paper to simulate the wind farm-integrated multiterminal dc (MTdc) grid based on the modul
ABNT, Harvard, Vancouver, APA, etc. styles
35

Thomas, Beatrice, Roman Le Goff Latimier, Hamid Ben Ahmed, Gurvan Jodin, Abdelhafid El Ouardi, and Samir Bouaziz. "Optimized CPU-GPU Partitioning for an ADMM Algorithm Applied to a Peer to Peer Energy Market." SSRN Electronic Journal, 2022. http://dx.doi.org/10.2139/ssrn.4186889.

Full text of the source
ABNT, Harvard, Vancouver, APA, etc. styles
36

Mu, Yifei, Ce Yu, Chao Sun, et al. "3DT-CM: A Low-complexity Cross-matching Algorithm for Large Astronomical Catalogues using 3d-tree Approach." Research in Astronomy and Astrophysics, August 8, 2023. http://dx.doi.org/10.1088/1674-4527/acee50.

Full text of the source
Abstract:
Abstract Location-based cross-matching is a preprocessing step in astronomy that aims to identify records belonging to the same celestial body based on the angular distance formula. The traditional approach involves comparing each record in one catalogue with every record in the other catalogue, resulting in a one-to-one comparison with high computational complexity. To reduce the computational time, index partitioning methods are used to divide the sky into regions and perform local cross-matching. In addition, cross-matching algorithms have been adopted on high-performance architectures to i
ABNT, Harvard, Vancouver, APA, etc. styles
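The angular-distance cross-matching described in the abstract above can be sketched with simple declination-zone bucketing standing in for the paper's 3d-tree index. The match radius, zone height, and the two toy catalogues below are assumptions for illustration only, and RA wrap-around at 0°/360° is ignored.

```python
# Zone-bucketed cross-match of two (RA, Dec) catalogues, coordinates in degrees.
from collections import defaultdict
from math import radians, degrees, sin, cos, acos, floor

def ang_sep(ra1, dec1, ra2, dec2):
    a1, d1, a2, d2 = map(radians, (ra1, dec1, ra2, dec2))
    c = sin(d1) * sin(d2) + cos(d1) * cos(d2) * cos(a1 - a2)
    return degrees(acos(min(1.0, max(-1.0, c))))       # angular separation

def cross_match(cat_a, cat_b, radius=1 / 3600):        # 1-arcsecond radius
    zone_h = max(radius, 0.1)                          # declination zone height
    zones = defaultdict(list)
    for j, (_, dec) in enumerate(cat_b):
        zones[floor(dec / zone_h)].append(j)
    matches = []
    for i, (ra, dec) in enumerate(cat_a):
        z = floor(dec / zone_h)
        candidates = {j for dz in (-1, 0, 1) for j in zones[z + dz]}
        matches += [(i, j) for j in candidates
                    if ang_sep(ra, dec, *cat_b[j]) <= radius]
    return matches

print(cross_match([(10.0, 20.0)], [(10.0, 20.0002), (180.0, -45.0)]))  # [(0, 0)]
```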
37

Magalhães, W. F., M. C. De Farias, H. M. Gomes, L. B. Marinho, G. S. Aguiar, and P. Silveira. "Evaluating Edge-Cloud Computing Trade-Offs for Mobile Object Detection and Classification with Deep Learning." Journal of Information and Data Management 11, no. 1 (2020). http://dx.doi.org/10.5753/jidm.2020.2026.

Full text of the source
Abstract:
Internet-of-Things (IoT) applications based on Artificial Intelligence, such as mobile object detection and recognition from images and videos, may greatly benefit from inferences made by state-of-the-art Deep Neural Network(DNN) models. However, adopting such models in IoT applications poses an important challenge since DNNs usually require lots of computational resources (i.e. memory, disk, CPU/GPU, and power), which may prevent them to run on resource-limited edge devices. On the other hand, moving the heavy computation to the Cloud may significantly increase running costs and latency of Io
ABNT, Harvard, Vancouver, APA, etc. styles
38

Sahebi, Amin, Marco Barbone, Marco Procaccini, Wayne Luk, Georgi Gaydadjiev, and Roberto Giorgi. "Distributed large-scale graph processing on FPGAs." Journal of Big Data 10, no. 1 (2023). http://dx.doi.org/10.1186/s40537-023-00756-x.

Full text of the source
Abstract:
Processing large-scale graphs is challenging due to the nature of the computation that causes irregular memory access patterns. Managing such irregular accesses may cause significant performance degradation on both CPUs and GPUs. Thus, recent research trends propose graph processing acceleration with Field-Programmable Gate Arrays (FPGA). FPGAs are programmable hardware devices that can be fully customised to perform specific tasks in a highly parallel and efficient manner. However, FPGAs have a limited amount of on-chip memory that cannot fit the entire graph. Due to the limited devic
ABNT, Harvard, Vancouver, APA, etc. styles
39

Schmidt, Bertil, Felix Kallenborn, Alejandro Chacon, and Christian Hundt. "CUDASW++4.0: ultra-fast GPU-based Smith–Waterman protein sequence database search." BMC Bioinformatics 25, no. 1 (2024). http://dx.doi.org/10.1186/s12859-024-05965-6.

Full text of the source
Abstract:
Background: The maximal sensitivity for local pairwise alignment makes the Smith-Waterman algorithm a popular choice for protein sequence database search. However, its quadratic time complexity makes it compute-intensive. Unfortunately, current state-of-the-art software tools are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. This motivates the need for more efficient implementations. Results: CUDASW++4.0 is a fast software tool for scanning protein sequence databases with the Smith-Waterman algorithm on CUDA-enabled GP
ABNT, Harvard, Vancouver, APA, etc. styles
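As a reference point for the entry above, the sketch below is the plain quadratic Smith-Waterman recurrence with a linear gap penalty, i.e., the computation such GPU tools evaluate in parallel. The match/mismatch/gap scores are arbitrary illustrative values, not the tool's substitution matrices or defaults.

```python
# Plain Smith-Waterman local alignment score with a linear gap penalty.
def smith_waterman(q, s, match=2, mismatch=-1, gap=-2):
    H = [[0] * (len(s) + 1) for _ in range(len(q) + 1)]
    best = 0
    for i in range(1, len(q) + 1):
        for j in range(1, len(s) + 1):
            diag = H[i - 1][j - 1] + (match if q[i - 1] == s[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))   # best local alignment score
```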
40

Yanamala, Rama Muni Reddy, and Muralidhar Pullakandam. "Empowering edge devices: FPGA‐based 16‐bit fixed‐point accelerator with SVD for CNN on 32‐bit memory‐limited systems." International Journal of Circuit Theory and Applications, February 13, 2024. http://dx.doi.org/10.1002/cta.3957.

Full text of the source
Abstract:
Convolutional neural networks (CNNs) are now often used in deep learning and computer vision applications. Its convolutional layer accounts for most calculations and should be computed fast in a local edge device. Field-programmable gate arrays (FPGAs) have been adequately explored as promising hardware accelerators for CNNs due to their high performance, energy efficiency, and reconfigurability. This paper developed an efficient FPGA-based 16-bit fixed-point hardware accelerator unit for deep learning applications on the 32-bit low-memory edge device (PYNQ-Z2 board). Additionally, sin
ABNT, Harvard, Vancouver, APA, etc. styles
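The two ingredients named in the entry above, truncated SVD compression of a weight matrix and 16-bit fixed-point storage, can be sketched together in a few lines. The matrix shape, rank, and Q-format below are illustrative assumptions and do not reflect the accelerator's actual configuration.

```python
# Truncated-SVD compression of a weight matrix with int16 fixed-point factors.
import numpy as np

def to_q16(x, frac_bits=10):      # signed 16-bit fixed point, frac_bits fraction bits
    return np.clip(np.round(x * (1 << frac_bits)), -32768, 32767).astype(np.int16)

def from_q16(x, frac_bits=10):
    return x.astype(np.float32) / (1 << frac_bits)

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128)).astype(np.float32)

U, S, Vt = np.linalg.svd(W, full_matrices=False)
rank = 32                                   # keep only the top-32 singular values
A = to_q16(U[:, :rank] * S[:rank])          # 256 x 32 int16 factor
B = to_q16(Vt[:rank])                       # 32 x 128 int16 factor
W_hat = from_q16(A) @ from_q16(B)

ratio = (W.size * 4) / ((A.size + B.size) * 2)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"storage reduced {ratio:.1f}x, relative reconstruction error {err:.2f}")
```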
41

Liu, Chaoqiang, Xiaofei Liao, Long Zheng, et al. "L-FNNG: Accelerating Large-Scale KNN Graph Construction on CPU-FPGA Heterogeneous Platform." ACM Transactions on Reconfigurable Technology and Systems, March 14, 2024. http://dx.doi.org/10.1145/3652609.

Full text of the source
Abstract:
Due to the high complexity of constructing exact k -nearest neighbor graphs, approximate construction has become a popular research topic. The NN-Descent algorithm is one of the representative in-memory algorithms. To effectively handle large datasets, existing state-of-the-art solutions combine the divide-and-conquer approach and the NN-Descent algorithm, where large datasets are divided into multiple partitions, and a subgraph is constructed for each partition before all the subgraphs are merged, reducing the memory pressure significantly. However, such solutions fail to address inefficienci
ABNT, Harvard, Vancouver, APA, etc. styles
42

Aghapour, Ehsan, Dolly Sapra, Andy Pimentel, and Anuj Pathania. "ARM-CO-UP: ARM Cooperative Utilization of Processors." ACM Transactions on Design Automation of Electronic Systems, April 8, 2024. http://dx.doi.org/10.1145/3656472.

Full text of the source
Abstract:
HMPSoCs combine different processors on a single chip. They enable powerful embedded devices, which increasingly perform ML inference tasks at the edge. State-of-the-art HMPSoCs can perform on-chip embedded inference using different processors, such as CPUs, GPUs, and NPUs. HMPSoCs can potentially overcome the limitation of low single-processor CNN inference performance and efficiency by cooperative use of multiple processors. However, standard inference frameworks for edge devices typically utilize only a single processor. We present the ARM-CO-UP framework built on the ARM-CL library. The AR
ABNT, Harvard, Vancouver, APA, etc. styles
43

Vera-Parra, Nelson Enrique, Danilo Alfonso López-Sarmiento, and Cristian Alejandro Rojas-Quintero. "HETEROGENEOUS COMPUTING TO ACCELERATE THE SEARCH OF SUPER K-MERS BASED ON MINIMIZERS." International Journal of Computing, December 30, 2020, 525–32. http://dx.doi.org/10.47839/ijc.19.4.1985.

Full text of the source
Abstract:
The k-mers processing techniques based on partitioning of the data set on the disk using minimizer-type seeds have led to a significant reduction in memory requirements; however, it has added processes (search and distribution of super k-mers) that can be intensive given the large volume of data. This paper presents a massive parallel processing model in order to enable the efficient use of heterogeneous computation to accelerate the search of super k-mers based on seeds (minimizers or signatures). The model includes three main contributions: a new data structure called CISK for representing t
ABNT, Harvard, Vancouver, APA, etc. styles
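The minimizer-based grouping that the abstract above relies on is easy to state in code: each k-mer is labelled with its lexicographically smallest m-mer, and runs of consecutive k-mers sharing that label form one super k-mer. The k and m values and the toy read below are arbitrary; the paper's parallel CISK data structure is not reproduced.

```python
# Naive minimizer computation and super k-mer grouping for a single read.
def minimizer(kmer, m):
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))

def super_kmers(read, k=7, m=3):
    groups, run, cur = [], [], None
    for i in range(len(read) - k + 1):
        km = read[i:i + k]
        mz = minimizer(km, m)
        if mz != cur and run:                 # minimizer changed: close the run
            groups.append((cur, run))
            run = []
        cur = mz
        run.append(km)
    if run:
        groups.append((cur, run))
    return groups                             # [(minimizer, consecutive k-mers)]

for mz, kms in super_kmers("ACGTACGGTTACG"):
    print(mz, kms)
```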
44

Karp, Martin, Estela Suarez, Jan H. Meinke, et al. "Experience and analysis of scalable high-fidelity computational fluid dynamics on modular supercomputing architectures." International Journal of High Performance Computing Applications, November 28, 2024. http://dx.doi.org/10.1177/10943420241303163.

Full text of the source
Abstract:
The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between a Booster modu
ABNT, Harvard, Vancouver, APA, etc. styles
45

Zhang, Yajie, Ce Yu, Chao Sun, et al. "HLC2: a Highly Efficient Cross-matching Framework for Large Astronomical Catalogues on Heterogeneous Computing Environments." Monthly Notices of the Royal Astronomical Society, January 10, 2023. http://dx.doi.org/10.1093/mnras/stad067.

Full text of the source
Abstract:
Abstract Cross-matching operation, which is to find corresponding data for the same celestial object or region from multiple catalogues, is indispensable to astronomical data analysis and research. Due to the large amount of astronomical catalogues generated by the ongoing and next generation large-scale sky surveys, the time complexity of the cross-matching is increasing dramatically. Heterogeneous computing environments provide a theoretical possibility to accelerate the cross-matching, but the performance advantages of heterogeneous computing resources have not been fully utilized. To meet
ABNT, Harvard, Vancouver, APA, etc. styles
46

Lin, Ning, and Venkata Dinavahi. "Variable Time-Stepping Modular Multilevel Converter Model for Fast and Parallel Transient Simulation of Multiterminal DC Grid." September 1, 2019. https://doi.org/10.5281/zenodo.7685899.

Full text of the source
Abstract:
The efficiency of multiterminal dc (MTDC) grid simulation decreases with an expansion of its scale and the inclusion of accurate component models. Thus, the variable time-stepping scheme is proposed in this paper to expedite the electromagnetic transient computation. A number of criteria are proposed to evaluate the time-step and regulate it dynamically during simulation. Meanwhile, as the accuracy of results is heavily reliant on the switch model in the modular multilevel converter, the nonlinear behavioral model with a greater accuracy is proposed in addition to the classic ideal model, and
ABNT, Harvard, Vancouver, APA, etc. styles