Follow this link to see other types of publications on the topic: GPU-CPU.

Journal articles on the topic "GPU-CPU"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic "GPU-CPU".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication in PDF format and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Zhu, Ziyu, Xiaochun Tang, and Quan Zhao. "A unified schedule policy of distributed machine learning framework for CPU-GPU cluster." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 39, no. 3 (2021): 529–38. http://dx.doi.org/10.1051/jnwpu/20213930529.

Full text
Abstract
With the widespread use of GPU hardware, more and more distributed machine learning applications have begun to use CPU-GPU hybrid cluster resources to improve the efficiency of algorithms. However, existing distributed machine learning scheduling frameworks consider task scheduling only on CPU resources or only on GPU resources. Even when the difference between CPU and GPU resources is taken into account, it is difficult to improve the resource usage of the entire system. In other words, the key challenge in using CPU-GPU clusters for distributed machine learning…
2

Cui, Pengjie, Haotian Liu, Bo Tang, and Ye Yuan. "CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processor." Proceedings of the VLDB Endowment 17, no. 6 (2024): 1405–17. http://dx.doi.org/10.14778/3648160.3648179.

Full text
Abstract
In recent years, many CPU-GPU heterogeneous graph processing systems have been developed in both academia and industry to facilitate large-scale graph processing in various applications, e.g., social networks and biological networks. However, the performance of existing systems can be significantly improved by addressing two prevailing challenges: GPU memory over-subscription and efficient CPU-GPU cooperative processing. In this work, we propose CGgraph, an ultra-fast CPU-GPU graph processing system to address these challenges. In particular, CGgraph overcomes GPU-memory over-subscription by…
3

Lee, Taekhee, and Young J. Kim. "Massively parallel motion planning algorithms under uncertainty using POMDP." International Journal of Robotics Research 35, no. 8 (2015): 928–42. http://dx.doi.org/10.1177/0278364915594856.

Full text
Abstract
We present new parallel algorithms that solve continuous-state partially observable Markov decision process (POMDP) problems using the GPU (gPOMDP) and a hybrid of the GPU and CPU (hPOMDP). We choose the Monte Carlo value iteration (MCVI) method as our base algorithm and parallelize it using the multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to utilize the massive data parallelism available on modern GPUs. Our GPU-based method uses two workload distribution techniques, compute/data interleaving and workload balancing, in order…
4

Yogatama, Bobbi W., Weiwei Gong, and Xiangyao Yu. "Orchestrating data placement and query execution in heterogeneous CPU-GPU DBMS." Proceedings of the VLDB Endowment 15, no. 11 (2022): 2491–503. http://dx.doi.org/10.14778/3551793.3551809.

Full text
Abstract
There has been growing interest in using the GPU to accelerate data analytics due to its massive parallelism and high memory bandwidth. The main constraint of using the GPU for data analytics is the limited capacity of GPU memory. Heterogeneous CPU-GPU query execution is a compelling approach to mitigating the limited GPU memory capacity and PCIe bandwidth. However, the design space of heterogeneous CPU-GPU query execution has not been fully explored. We aim to improve the state-of-the-art CPU-GPU data analytics engine by optimizing data placement and heterogeneous query execution. First, we introduce a…
5

Raju, K., and Niranjan N. Chiplunkar. "Performance Enhancement of CUDA Applications by Overlapping Data Transfer and Kernel Execution." Applied Computer Science 17, no. 3 (2021): 5–18. http://dx.doi.org/10.35784/acs-2021-17.

Full text
Abstract
The CPU-GPU combination is a widely used heterogeneous computing system in which the CPU and GPU have different address spaces. Since the GPU cannot directly access CPU memory, the input data must be available in GPU memory before a GPU function is invoked. On completion of the GPU function, the results of the computation are transferred to CPU memory. CPU-GPU data transfer happens over the PCI-Express bus. The PCI-E bandwidth is much lower than that of GPU memory, so the speed at which data is transferred is limited by the PCI-E bandwidth. Hence, the PCI-E acts as a performance bottleneck…
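The overlap the paper exploits can be approximated with a simple pipeline model. The sketch below is not taken from the paper; the chunk counts and timings are illustrative. It treats chunked host-to-device transfer and kernel execution as a two-stage pipeline, the way CUDA streams are typically used:

```python
def serial_time(t_transfer, t_kernel):
    # No overlap: the kernel waits for the full PCI-E transfer to finish.
    return t_transfer + t_kernel

def overlapped_time(t_transfer, t_kernel, n_chunks):
    # Two-stage pipeline (chunked copy, then kernel), as with CUDA streams:
    # chunk i's kernel runs while chunk i+1 is being copied.
    a = t_transfer / n_chunks          # per-chunk transfer time
    b = t_kernel / n_chunks            # per-chunk kernel time
    return a + (n_chunks - 1) * max(a, b) + b

# With transfer and compute balanced (4 ms each) and 4 chunks,
# overlapping cuts the total from 8 ms down to 5 ms.
print(serial_time(4.0, 4.0))          # 8.0
print(overlapped_time(4.0, 4.0, 4))   # 5.0
```

The model also shows why overlap helps less when one stage dominates: the pipeline's total time is bounded below by the slower of the two stages.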
6

Power, Jason, Joel Hestness, Marc S. Orr, Mark D. Hill, and David A. Wood. "gem5-gpu: A Heterogeneous CPU-GPU Simulator." IEEE Computer Architecture Letters 14, no. 1 (2015): 34–36. http://dx.doi.org/10.1109/lca.2014.2299539.

Full text
7

Abdusalomov, Saidmalikxon Mannop o`g`li. "CPU VA GPU FARQLARI." CENTRAL ASIAN JOURNAL OF EDUCATION AND INNOVATION 2, no. 5 (2023): 168–70. https://doi.org/10.5281/zenodo.7935842.

Full text
8

Liu, Gaogao, Wenbo Yang, Peng Li, et al. "MIMO Radar Parallel Simulation System Based on CPU/GPU Architecture." Sensors 22, no. 1 (2022): 396. http://dx.doi.org/10.3390/s22010396.

Full text
Abstract
The data volume and computational load of MIMO radar are huge; very high-speed computation is necessary for its real-time processing. In this paper, we mainly study the time-division MIMO radar signal processing flow and propose an improved MIMO radar signal processing algorithm that raises the processing speed compared with previous algorithms; on this basis, a parallel simulation system for MIMO radar based on the CPU/GPU architecture is proposed. The outer layer of the framework is coarse-grained, using OpenMP for acceleration on the CPU, and the inner layer uses fine-grained…
9

Zou, Yong Ning, Jue Wang, and Jian Wei Li. "Cutting Display of Industrial CT Volume Data Based on GPU." Advanced Materials Research 271-273 (July 2011): 1096–102. http://dx.doi.org/10.4028/www.scientific.net/amr.271-273.1096.

Full text
Abstract
The rapid development of Graphics Processing Units (GPUs) in recent years in terms of performance and programmability has attracted the attention of those seeking to leverage alternative architectures for better performance than commodity CPUs can provide. This paper presents a new algorithm for cutting display of computed tomography volume data on the GPU. We first introduce the programming model of the GPU and outline the implementation of techniques for oblique-plane cutting display of volume data on both the CPU and GPU. We compare the approaches and present performance results for…
10

Jiang, Ronglin, Shugang Jiang, Yu Zhang, Ying Xu, Lei Xu, and Dandan Zhang. "GPU-Accelerated Parallel FDTD on Distributed Heterogeneous Platform." International Journal of Antennas and Propagation 2014 (2014): 1–8. http://dx.doi.org/10.1155/2014/321081.

Full text
Abstract
This paper introduces a finite-difference time-domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations, with parallelization via the Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached compared to a traditional pure-GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure-CPU, pure-GPU, and CPU + GPU tests. Relative to the pure-CPU calculations…
11

Yogatama, Bobbi, Weiwei Gong, and Xiangyao Yu. "Scaling your Hybrid CPU-GPU DBMS to Multiple GPUs." Proceedings of the VLDB Endowment 17, no. 13 (2024): 4709–22. https://doi.org/10.14778/3704965.3704977.

Full text
Abstract
GPU-accelerated databases have been gaining popularity in recent years due to their massive parallelism and high memory bandwidth. The limited GPU memory capacity, however, is still a major bottleneck for GPU databases. Existing approaches have attempted to address this limitation by using (1) a hybrid CPU-GPU DBMS or (2) a multi-GPU DBMS. We aim to improve prior solutions further by leveraging both hybrid CPU-GPU and multi-GPU DBMSs at the same time. In particular, we explore the design space and optimize data placement and query execution in a hybrid CPU and multi-GPU DBMS. To improve data…
12

Semenenko, Julija, Aliaksei Kolesau, Vadimas Starikovičius, Artūras Mackūnas, and Dmitrij Šešok. "Comparison of GPU and CPU Efficiency While Solving Heat Conduction Problems." Mokslas - Lietuvos ateitis 12 (November 24, 2020): 1–5. http://dx.doi.org/10.3846/mla.2020.13500.

Full text
Abstract
This paper provides an overview of GPU usage for solving different engineering problems, a comparison between CPU and GPU computations, and an overview of the heat conduction problem. The Jacobi iterative algorithm was implemented using Python, the TensorFlow GPU library, and NVIDIA CUDA technology. Numerical experiments were conducted with 6 CPUs and 4 GPUs. The fastest GPU completed the calculations 19 times faster than the slowest CPU. On average, the GPU was 9 to 11 times faster than the CPU. Significant relative speed-up in GPU calculations starts when the matrix contains at least 4002 fl…
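The Jacobi iteration mentioned in the abstract can be sketched in a few lines. The example below is a plain NumPy CPU version (the paper used TensorFlow on GPU); the grid size and boundary values are illustrative:

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi iteration for the 2-D Laplace (steady heat) equation:
    each interior point becomes the average of its four neighbours."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

# Toy grid: left edge held at 100 degrees, everything else starts at 0.
u = np.zeros((50, 50))
u[:, 0] = 100.0
for _ in range(500):
    u = jacobi_step(u)

# Heat has diffused inward: interior values lie strictly between the
# boundary temperatures, while the fixed boundary is unchanged.
assert 0.0 < u[25, 25] < 100.0
```

Every interior point is updated independently from the previous iterate, which is exactly the data parallelism a GPU implementation exploits.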
13

Hu, Peng, Zixiong Zhao, Aofei Ji, et al. "A GPU-Accelerated and LTS-Based Finite Volume Shallow Water Model." Water 14, no. 6 (2022): 922. http://dx.doi.org/10.3390/w14060922.

Full text
Abstract
This paper presents a GPU (Graphics Processing Unit)-accelerated and LTS (Local Time Step)-based finite volume Shallow Water Model (SWM). The model's performance is compared against five other model versions (single-CPU versions with/without LTS, multi-CPU versions with/without LTS, and a GPU version) by simulating three flow scenarios: an idealized dam-break flow, an experimental dam-break flow, and a field-scale scenario of tidal flows. Satisfactory agreement between simulation results and the available measured data/reference solutions (water level, flow velocity) indicates that all six SWM versions…
14

Ai, Xin, Qiange Wang, Chunyu Cao, et al. "NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous Environments." Proceedings of the VLDB Endowment 17, no. 8 (2024): 1995–2008. http://dx.doi.org/10.14778/3659437.3659453.

Full text
Abstract
Graph Neural Networks (GNNs) have shown exceptional performance across a wide range of applications. Current frameworks leverage CPU-GPU heterogeneous environments for GNN model training, incorporating mini-batch and sampling techniques to mitigate GPU memory constraints. In such settings, sample-based GNN training can be divided into three phases: sampling, gathering, and training. Existing GNN systems deploy various task orchestration methods to execute each phase on either the CPU or the GPU. However, through comprehensive experimentation and analysis, we observe that these task orchestration approaches…
15

Gyurjyan, Vardan, and Sebastian Mancilla. "Heterogeneous data-processing optimization with CLARA’s adaptive workflow orchestrator." EPJ Web of Conferences 245 (2020): 05020. http://dx.doi.org/10.1051/epjconf/202024505020.

Full text
Abstract
The hardware landscape used in HEP and NP is changing from homogeneous multi-core systems towards heterogeneous systems with many different computing units, each with its own characteristics. To achieve maximum performance in data processing, the main challenge is to place the right computation on the right hardware. In this paper, we discuss CLAS12 charged-particle tracking workflow orchestration that allows us to utilize both CPU and GPU to improve performance. The tracking algorithm was decomposed into micro-services that are deployed on CPU and GPU processing units, where…
16

Agibalov, Oleg, and Nikolay Ventsov. "On the issue of fuzzy timing estimations of the algorithms running at GPU and CPU architectures." E3S Web of Conferences 135 (2019): 01082. http://dx.doi.org/10.1051/e3sconf/201913501082.

Full text
Abstract
We consider the task of comparing fuzzy estimates of the execution parameters of genetic algorithms implemented on GPU (graphics processing unit) and CPU (central processing unit) architectures. Fuzzy estimates are calculated from the averaged dependences of the genetic algorithms' running time on GPU and CPU architectures on the number of individuals in the populations processed by the algorithm. Analysis of these averaged dependences showed that it is possible to process 10,000 chromosomes on the GPU architecture or…
17

Fortin, Pierre, and Maxime Touche. "Dual tree traversal on integrated GPUs for astrophysical N-body simulations." International Journal of High Performance Computing Applications 33, no. 5 (2019): 960–72. http://dx.doi.org/10.1177/1094342019840806.

Full text
Abstract
In astrophysical N-body simulations, O(N) fast multipole methods (FMMs) with dual tree traversal (DTT) on multi-core CPUs are faster than O(N log N) CPU tree codes but can still be outperformed by GPU ones. In this article, we aim at combining the best algorithm, namely FMM with DTT, with the most powerful hardware currently available, namely GPUs. In the astrophysical context, which requires low accuracies and non-uniform particle distributions, we show that such a combination can be achieved thanks to a hybrid CPU-GPU algorithm on integrated GPUs: while the DTT is performed on the CPU cores, the…
18

Liu, Changyuan. "Study on the Particle Sorting Performance for Reactor Monte Carlo Neutron Transport on Apple Unified Memory GPUs." EPJ Web of Conferences 302 (2024): 04001. http://dx.doi.org/10.1051/epjconf/202430204001.

Full text
Abstract
In simulations of nuclear reactor physics using the Monte Carlo neutron transport method on GPUs, the sorting of particles plays a significant role in calculation performance. Traditionally, CPUs and GPUs are separate devices connected with low data transfer rates and high data transfer latency. Emerging computing chips tend to integrate CPUs and GPUs; one example is the Apple silicon chips with unified memory. Such unified-memory chips have opened doors for new strategies of collaboration between CPUs and GPUs for Monte Carlo neutron transport. Sorting particles on the CPU and transporting on the GPU is…
19

Cao, Wei, Zheng Hua Wang, and Chuan Fu Xu. "An Out-of-Core Method for CFD Simulation in Heterogeneous Environment." Advanced Materials Research 753-755 (August 2013): 2912–15. http://dx.doi.org/10.4028/www.scientific.net/amr.753-755.2912.

Full text
Abstract
In recent years, the highly parallel graphics processing unit (GPU) has rapidly gained maturity as a powerful engine for high-performance computing. However, in most computational fluid dynamics (CFD) simulations, the computational capacity of the CPU is ignored. In this paper, we propose a hybrid parallel programming model to utilize the computational capacity of both CPU and GPU. Considering the memory available on the CPU and GPU, we also propose an out-of-core method to increase the simulation scale on a single node. The experimental results show that the programming model can utilize the computational c…
20

Yang, Min Kyu, and Jae-Seung Jeong. "Optimized Hybrid Central Processing Unit–Graphics Processing Unit Workflow for Accelerating Advanced Encryption Standard Encryption: Performance Evaluation and Computational Modeling." Applied Sciences 15, no. 7 (2025): 3863. https://doi.org/10.3390/app15073863.

Full text
Abstract
This study addresses the growing demand for scalable data encryption by evaluating the performance of AES (Advanced Encryption Standard) encryption and decryption using the CBC (Cipher Block Chaining) and CTR (Counter) modes across various CPU (Central Processing Unit) and GPU (Graphics Processing Unit) hardware models. The objective is to highlight the benefits of GPU acceleration and propose an optimized hybrid CPU-GPU workflow for large-scale data security. Methods include benchmarking encryption performance with provided data, mathematical models, and computational analysis. The results indicate…
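The reason CTR mode suits GPU acceleration is that each block's keystream depends only on its counter value, so all blocks can be processed independently; CBC encryption, by contrast, chains each block to the previous ciphertext. A toy sketch of CTR (SHA-256 stands in for AES here so it runs with the standard library alone; this is not the paper's implementation):

```python
import hashlib

def keystream_block(key: bytes, counter: int) -> bytes:
    # Toy keystream: a real CTR implementation encrypts the counter with
    # AES; SHA-256 is a stand-in so the sketch needs only the stdlib.
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:16]

def ctr_crypt(key: bytes, data: bytes) -> bytes:
    # CTR mode: block i depends only on counter i, so every block could be
    # computed in parallel -- the property that maps well onto a GPU.
    out = bytearray()
    for i in range(0, len(data), 16):
        ks = keystream_block(key, i // 16)
        block = data[i:i + 16]
        out.extend(b ^ k for b, k in zip(block, ks))
    return bytes(out)

msg = b"counter mode parallelizes nicely"
ct = ctr_crypt(b"demo-key", msg)
# XOR with the same keystream is its own inverse, so decryption reuses ctr_crypt.
assert ctr_crypt(b"demo-key", ct) == msg
```

CBC decryption, unlike CBC encryption, is also parallelizable, which is why hybrid pipelines often treat the two directions differently.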
21

Shim, Hyungwook, Myeongju Ko, and Minho Seo. "Decomposition analysis of influencing factors of GPU-centric supercomputing demand: LMDI-based approach." Edelweiss Applied Science and Technology 9, no. 2 (2025): 208–17. https://doi.org/10.55214/25768484.v9i2.4455.

Full text
Abstract
With the introduction of AI technology, the supercomputing industry is transitioning from CPU-centric to GPU-centric, and many countries are making efforts to build new GPU-centric resources. The purpose of this paper is to discover new factors in demand management for the efficient construction and operation of future national supercomputing GPU resources. Reflecting industry characteristics, we decompose the factors affecting existing CPU use into intensity-effect, structure-effect, and production-effect indicators targeting CPU-only and GPU-only resources, and compare and analyze the…
22

Chad Ferrino, Abuda, and Tae Young Choe. "Efficient Deep Learning Job Allocation in Cloud Systems by Predicting Resource Consumptions including GPU and CPU." Tehnički glasnik 19, no. 3 (2025): 461–72. https://doi.org/10.31803/tg-20240112104444.

Full text
Abstract
One objective of GPU scheduling in cloud systems is to minimize the completion times of given deep learning models. This is important for deep learning in cloud environments because deep learning workloads take a long time to finish, and misallocation of these workloads can cause a huge increase in job completion time. The difficulty of GPU scheduling comes from a diverse set of parameters, including model architectures and GPU types. Some model architectures are CPU-intensive rather than GPU-intensive, which creates different hardware requirements when training different models. T…
23

Tang, Wenjie, Wentong Cai, Yiping Yao, Xiao Song, and Feng Zhu. "An alternative approach for collaborative simulation execution on a CPU+GPU hybrid system." SIMULATION 96, no. 3 (2019): 347–61. http://dx.doi.org/10.1177/0037549719885178.

Full text
Abstract
In the past few years, the graphics processing unit (GPU) has been widely used to accelerate time-consuming models in simulations. Since both model computation and simulation management are the main factors that affect the performance of large-scale simulations, accelerating only model computation will limit the potential speedup. Moreover, the models that can be well accelerated by a GPU could be insufficient, especially for simulations with many lightweight models. Traditionally, the parallel discrete event simulation (PDES) method is used for this class of simulation, but most PDES simulators…
24

Hadi, N. A., S. A. Halim, N. S. M. Lazim, and N. Alias. "Performance of CPU GPU Parallel Architecture on Segmentation and Geometrical Features Extraction of Malaysian Herb Leaves." Malaysian Journal of Mathematical Sciences 16, no. 2 (2022): 363–77. http://dx.doi.org/10.47836/mjms.16.2.12.

Full text
Abstract
Image recognition, which includes segmentation of image boundaries, geometrical feature extraction, and classification, is used in the development of a particular image database. The ultimate challenge in this task is that it is computationally expensive. This paper presents a CPU-GPU architecture for the image segmentation and feature extraction processes of 125 images of Malaysian herb leaves. Two GPUs and three kernels are utilized on the CPU-GPU platform using MATLAB software. Each herb image has pixel dimensions of 16161080. The segmentation process uses the Sobel operator, which is then used to extract the…
25

Chen, Lin, Deshi Ye, and Guochuan Zhang. "Online Scheduling of Mixed CPU-GPU Jobs." International Journal of Foundations of Computer Science 25, no. 06 (2014): 745–61. http://dx.doi.org/10.1142/s0129054114500312.

Full text
Abstract
We consider the online scheduling problem in a CPU-GPU cluster. In this problem there are two sets of processors, the CPU processors and the GPU processors. Each job has two distinct processing times, one for the CPU processor and the other for the GPU processor. Once a job is released, a decision must be made immediately about which processor it should be assigned to. The goal is to minimize the makespan, i.e., the largest completion time among all the processors. Such a problem can be seen as an intermediate model between the scheduling problems on identical machines and unrelated machines…
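A minimal greedy baseline for the problem the paper formalizes (an illustration, not the paper's algorithm): each arriving job carries distinct CPU and GPU processing times and is assigned to whichever processor would finish it earliest.

```python
def schedule(jobs, n_cpu, n_gpu):
    """Greedy online rule: place each arriving job on the processor
    (CPU or GPU) that completes it earliest, tracking per-processor load.
    jobs is a list of (cpu_time, gpu_time) pairs; returns the makespan."""
    cpu_loads = [0.0] * n_cpu
    gpu_loads = [0.0] * n_gpu
    for cpu_t, gpu_t in jobs:
        i = min(range(n_cpu), key=lambda k: cpu_loads[k])
        j = min(range(n_gpu), key=lambda k: gpu_loads[k])
        # Compare completion time on the least-loaded CPU vs. GPU.
        if cpu_loads[i] + cpu_t <= gpu_loads[j] + gpu_t:
            cpu_loads[i] += cpu_t
        else:
            gpu_loads[j] += gpu_t
    return max(cpu_loads + gpu_loads)  # makespan

# Two CPUs, one GPU; GPU-friendly jobs go to the GPU, CPU-friendly to the CPUs.
jobs = [(4, 1), (4, 1), (2, 5), (2, 5)]
print(schedule(jobs, 2, 1))  # → 2.0
```

Such greedy rules give simple competitive-ratio baselines against which online algorithms like those in the paper are measured.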
26

Liu, Zhi Yuan, and Xue Zhang Zhao. "Research and Implementation of Image Rotation Based on CUDA." Advanced Materials Research 216 (March 2011): 708–12. http://dx.doi.org/10.4028/www.scientific.net/amr.216.708.

Full text
Abstract
GPU technology releases the CPU from burdensome graphics computing tasks. nVIDIA, the main GPU producer, has added CUDA technology to new GPU models, which enhances GPU functionality greatly and offers a great advantage in computing complex matrices. General algorithms for image rotation and the structure of CUDA are introduced in this paper. An example of rotating an image using HALCON, based on CPU instruction extensions and on CUDA technology, demonstrates the advantage of CUDA by comparing the two results.
27

Tao, Yu-Bo, Hai Lin, and Hu Jun Bao. "From CPU to GPU: GPU-Based Electromagnetic Computing (GPUECO)." Progress In Electromagnetics Research 81 (2008): 1–19. http://dx.doi.org/10.2528/pier07121302.

Full text
28

Ma, Haifeng. "Development of a CPU-GPU heterogeneous platform based on a nonlinear parallel algorithm." Nonlinear Engineering 11, no. 1 (2022): 215–22. http://dx.doi.org/10.1515/nleng-2022-0027.

Full text
Abstract
In order to seek a refined model analysis software platform that balances computational accuracy and computational efficiency, a CPU-GPU heterogeneous platform based on a nonlinear parallel algorithm is developed. A modular design method is adopted to construct the architecture of the structural nonlinear analysis software and to clarify the basic steps of nonlinear finite element analysis, so as to determine the structure of the software system, divide it into modules, and clarify the function, interface, and call relationships of each module. The results show…
29

Silva, Bruno, Luiz Guerreiro Lopes, and Fábio Mendonça. "Multithreaded and GPU-Based Implementations of a Modified Particle Swarm Optimization Algorithm with Application to Solving Large-Scale Systems of Nonlinear Equations." Electronics 14, no. 3 (2025): 584. https://doi.org/10.3390/electronics14030584.

Full text
Abstract
This paper presents a novel Graphics Processing Unit (GPU)-accelerated implementation of a modified Particle Swarm Optimization (PSO) algorithm specifically designed to solve large-scale Systems of Nonlinear Equations (SNEs). The proposed GPU-based parallel version of the PSO algorithm uses the inherent parallelism of modern hardware architectures. Its performance is compared against both sequential and multithreaded Central Processing Unit (CPU) implementations. The primary objective is to evaluate the efficiency and scalability of PSO across different hardware platforms, with a focus on solving…
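The paper's modified PSO is not reproduced here, but a minimal standard PSO applied to a small nonlinear system conveys the idea; the coefficients, bounds, and the two-equation test system below are illustrative assumptions, not taken from the paper.

```python
import random

def pso(f, dim, n_particles=30, iters=200, seed=1):
    # Minimal global-best PSO minimizing f (e.g. the squared residual of a
    # nonlinear system). Each velocity blends inertia, the particle's own
    # best position, and the swarm's best position.
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                # personal best positions
    Pf = [f(x) for x in X]               # personal best values
    g = min(range(n_particles), key=lambda i: Pf[i])
    G, Gf = P[g][:], Pf[g]               # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                V[i][d] = (0.7 * V[i][d]
                           + 1.5 * rng.random() * (P[i][d] - X[i][d])
                           + 1.5 * rng.random() * (G[d] - X[i][d]))
                X[i][d] += V[i][d]
            fx = f(X[i])
            if fx < Pf[i]:
                P[i], Pf[i] = X[i][:], fx
                if fx < Gf:
                    G, Gf = X[i][:], fx
    return G, Gf

# Solve x^2 + y^2 = 1 and x = y by minimizing the squared residual.
res = lambda v: (v[0]**2 + v[1]**2 - 1)**2 + (v[0] - v[1])**2
sol, err = pso(res, 2)
```

Each particle's evaluation is independent within an iteration, which is the per-particle parallelism a GPU version distributes across threads.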
30

Woźniak, Jarosław. "Wykorzystanie CPU i GPU do obliczeń w Matlabie." Journal of Computer Sciences Institute 10 (March 30, 2019): 32–35. http://dx.doi.org/10.35784/jcsi.191.

Full text
Abstract
This article presents selected solutions that use CPUs and GPUs for computations in the Matlab environment. Various methods of performing computations on the CPU and on the GPU were compared. The differences, drawbacks, advantages, and consequences of using the selected computation methods are indicated.
31

Janiak, Adam, Wladyslaw Janiak, and Maciej Lichtenstein. "Tabu Search on GPU." JUCS - Journal of Universal Computer Science 14, no. 14 (2008): 2416–27. https://doi.org/10.3217/jucs-014-14-2416.

Full text
Abstract
Nowadays, personal computers (PCs) are often equipped with powerful multi-core CPUs. However, the processing power of a modern PC does not depend only on the processing power of the CPU and can be increased by proper use of GPGPU, i.e., general-purpose computation using graphics hardware. Modern graphics hardware, initially developed for computer graphics generation, has turned out to be flexible enough for general-purpose computations. In this paper we present the implementation of two optimization algorithms based on the tabu search technique, namely for the traveling salesman problem and the…
32

Yoo, Seohwan, Sunjun Hwang, Hayeon Park, Jin Choi, and Chang-Gun Lee. "Hardware Interrupt-Aware CPU/GPU Scheduling on Heterogeneous Multicore and GPU System." KIISE Transactions on Computing Practices 29, no. 1 (2023): 10–14. http://dx.doi.org/10.5626/ktcp.2022.29.1.10.

Full text
33

Ayush, Bhardwaj, and B. Ramesh K. "Designing a Graphics Processing Unit with advanced Arithmetic Logic Unit Resulting Improved Performance." Research and Applications: Emerging Technologies 6, no. 3 (2024): 38–46. https://doi.org/10.5281/zenodo.12720907.

Full text
Abstract
This paper explores microprocessor intricacies, particularly the central processing unit (CPU) and the graphics processing unit (GPU). The CPU, dubbed a computer's brain, features critical components such as the Control Unit (CU), Arithmetic Logic Unit (ALU), and Memory Unit (MU), orchestrating instruction execution and system resource management. GPUs, by contrast, initially built for graphics rendering, now excel at parallel processing, aiding tasks beyond graphics. The paper compares CPU and GPU architectures, emphasizing their parallel processing capabilities and memory hierarchies. The graphics rendering pipeline's…
34

Wang, Qihan, Zhen Peng, Bin Ren, Jie Chen, and Robert G. Edwards. "MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation." ACM Transactions on Architecture and Code Optimization 19, no. 2 (2022): 1–26. http://dx.doi.org/10.1145/3506705.

Full text
Abstract
The many-body correlation function is a fundamental computation kernel in modern physics computing applications, e.g., hadron contractions in lattice quantum chromodynamics (QCD). This kernel is both computation- and memory-intensive, involving a series of tensor contractions, and thus usually runs on accelerators like GPUs. Existing optimizations of many-body correlation mainly focus on individual tensor contractions (e.g., cuBLAS libraries and others). In contrast, this work discovers a new optimization dimension for many-body correlation by exploring the optimization opportunities among tens…
35

Borcovas, Evaldas, and Gintautas Daunys. "CPU and GPU (CUDA) Template Matching Comparison / CPU ir GPU (CUDA) palyginimas vykdant šablonų atitikties algoritmą." Mokslas – Lietuvos ateitis 6, no. 2 (2014): 129–33. http://dx.doi.org/10.3846/mla.2014.16.

Full text
Abstract
Image processing, computer vision, and other complicated optical information processing algorithms require large resources. It is often desired to execute algorithms in real time, and it is hard to fulfill such requirements with a single CPU. NVidia's CUDA technology enables the programmer to use the GPU resources of the computer. The current research was made with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB DDR3 RAM (CPU I), an NVidia GeForce GT320M CUDA-compatible graphics card (GPU I), an Intel Core i5-2500K 3.3 GHz processor with 4 GB DDR3 RAM (CPU II), and an NVidia GeForce GTX 560 CUDA-compatible…
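Template matching of the kind benchmarked above is embarrassingly parallel: every candidate position is scored independently, which is exactly what the CUDA version exploits. A naive CPU sketch using the sum of absolute differences (SAD) as the score (the metric is an assumption, not necessarily the one used in the paper):

```python
import numpy as np

def match_template(image, tmpl):
    # Slide tmpl over image and score each position by the sum of
    # absolute differences (SAD); the best match has the lowest score.
    # Every (y, x) score is independent, so a GPU can compute them all
    # in parallel; this loop evaluates them one by one.
    H, W = image.shape
    h, w = tmpl.shape
    best, pos = float("inf"), (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            score = np.abs(image[y:y + h, x:x + w] - tmpl).sum()
            if score < best:
                best, pos = score, (y, x)
    return pos

# A bright 2x2 patch hidden in a dark 8x8 image is found exactly.
img = np.zeros((8, 8))
img[3:5, 4:6] = 1.0
assert match_template(img, img[3:5, 4:6]) == (3, 4)
```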
36

Paul, Indrani, Vignesh Ravi, Srilatha Manne, Manish Arora, and Sudhakar Yalamanchili. "Coordinated Energy Management in Heterogeneous Processors." Scientific Programming 22, no. 2 (2014): 93–108. http://dx.doi.org/10.1155/2014/210762.

Full text
Abstract
This paper examines energy management in a heterogeneous processor consisting of an integrated CPU-GPU for high-performance computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need to coordinate energy management across distinct core types, a new and less understood problem. We examine the intra-node CPU-GPU frequency sensitivity of HPC applications on tightly coupled CPU-GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node…
37

Arucu, Muhammet, and Teodor Iliev. "Performance Evaluation of FPGA, GPU, and CPU in FIR Filter Implementation for Semiconductor-Based Systems." Journal of Low Power Electronics and Applications 15, no. 3 (2025): 40. https://doi.org/10.3390/jlpea15030040.

Full text
Abstract
This study presents a comprehensive performance evaluation of field-programmable gate array (FPGA), graphics processing unit (GPU), and central processing unit (CPU) platforms for implementing finite impulse response (FIR) filters in semiconductor-based digital signal processing (DSP) systems. Utilizing a standardized FIR filter designed with the Kaiser window method, we compare computational efficiency, latency, and energy consumption across the ZYNQ XC7Z020 FPGA, Tesla K80 GPU, and an Arm-based CPU, achieving processing times of 0.004 s, 0.008 s, and 0.107 s, respectively, with FPGA power consumption…
38

Wang, Zhe, Yao Shen, and Zhou Lei. "EGA: An Efficient GPU Accelerated Groupby Aggregation Algorithm." Applied Sciences 15, no. 7 (2025): 3693. https://doi.org/10.3390/app15073693.

Texto completo
Resumen
With the exponential growth of big data, efficient groupby aggregation (GA) has become critical for real-time analytics across industries. GA is a key method for extracting valuable information. Current CPU-based solutions (such as large-scale parallel processing platforms) face computational throughput limitations. Since CPU-based platforms struggle to support real-time big data analysis, the GPU is introduced to support real-time GA analysis. Most GPU GA algorithms are based on hashing methods, and these algorithms experience performance degradation when the load factor of the hash table is
39

Campeanu, Gabriel, and Mehrdad Saadatmand. "A Two-Layer Component-Based Allocation for Embedded Systems with GPUs." Designs 3, no. 1 (2019): 6. http://dx.doi.org/10.3390/designs3010006.

Full text
Abstract
Component-based development is a software engineering paradigm that can facilitate the construction of embedded systems and tackle their complexities. Modern embedded systems have increasingly demanding requirements. One way to cope with such a versatile and growing set of requirements is to employ heterogeneous processing power, i.e., CPU–GPU architectures. The new CPU–GPU embedded boards deliver increased performance but also introduce additional complexity and challenges. In this work, we address the component-to-hardware allocation for CPU–GPU embedded systems. The allocation for such …
40

Handa, Pooja, Meenu Kalra, and Rajesh Sachdeva. "A Survey on Green Computing using GPU in Image Processing." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 14, no. 10 (2015): 6135–41. http://dx.doi.org/10.24297/ijct.v14i10.1834.

Full text
Abstract
Green computing is the process of reducing the power consumed by a computer and thereby reducing carbon emissions. The total power consumed by the computer, excluding the monitor, at full computational load is equal to the sum of the power consumed by the GPU in its idle state and the CPU at its full state. Recently, there has been tremendous interest in accelerating general computing applications using a Graphics Processing Unit (GPU). The GPU now provides computing power not only for fast processing of graphics applications, but also for general computationally complex data in …
41

Ding, Li, Zhaomiao Dong, Huagang He, and Qibin Zheng. "A Hybrid GPU and CPU Parallel Computing Method to Accelerate Millimeter-Wave Imaging." Electronics 12, no. 4 (2023): 840. http://dx.doi.org/10.3390/electronics12040840.

Full text
Abstract
The range migration algorithm (RMA) based on Fourier transformation is widely applied in millimeter-wave (MMW) close-range imaging because it requires few operations and introduces little approximation error. However, its interpolation stage is not efficient due to the intensive logic controls involved, which limits its speed on a graphics processing unit (GPU) platform. Therefore, in this paper, we present an acceleration optimization method based on hybrid GPU and central processing unit (CPU) parallel computation for implementing the RMA. The proposed method exploits the strong logic-control capability …
42

GARBA, MICHAEL T., and HORACIO GONZÁLEZ–VÉLEZ. "ASYMPTOTIC PEAK UTILISATION IN HETEROGENEOUS PARALLEL CPU/GPU PIPELINES: A DECENTRALISED QUEUE MONITORING STRATEGY." Parallel Processing Letters 22, no. 02 (2012): 1240008. http://dx.doi.org/10.1142/s0129626412400087.

Full text
Abstract
Widespread heterogeneous parallelism is unavoidable given the emergence of General-Purpose computing on graphics processing units (GPGPU). The characteristics of a Graphics Processing Unit (GPU)—including significant memory transfer latency and complex performance characteristics—demand new approaches to ensuring that all available computational resources are efficiently utilised. This paper considers the simple case of a divisible workload based on widely-used numerical linear algebra routines and the challenges that prevent efficient use of all resources available to a naive SPMD application …
43

Chen, Yong, Hai Jin, Han Jiang, Dechao Xu, Ran Zheng, and Haocheng Liu. "Implementation and Optimization of GPU-Based Static State Security Analysis in Power Systems." Mobile Information Systems 2017 (2017): 1–10. http://dx.doi.org/10.1155/2017/1897476.

Full text
Abstract
Static state security analysis (SSSA) is one of the most important computations for checking whether a power system is in a normal and secure operating state. It is a challenge to satisfy real-time requirements with CPU-based concurrent methods due to the intensive computations. A sensitivity analysis-based method with a graphics processing unit (GPU) is proposed for power systems, which can reduce calculation time by 40% compared to execution on a 4-core CPU. The proposed method involves load flow analysis and sensitivity analysis. In load flow analysis, a multifrontal method for sparse LU factorization …
44

Ngo, Long Thanh, Dzung Dinh Nguyen, Long The Pham, and Cuong Manh Luong. "Speedup of Interval Type 2 Fuzzy Logic Systems Based on GPU for Robot Navigation." Advances in Fuzzy Systems 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/698062.

Full text
Abstract
As the number of rules and the sample rate of type 2 fuzzy logic systems (T2FLSs) increase, the speed of calculation becomes a problem. The T2FLS has a large amount of inherent algorithmic parallelism that modern CPU architectures do not exploit. In the T2FLS, many rules and algorithms can be sped up on a graphics processing unit (GPU) as long as the majority of computations at various stages and components are not dependent on each other. This paper demonstrates how to implement interval type-2 fuzzy logic systems (IT2-FLSs) on the GPU, with experiments on the obstacle avoidance behavior of robots …
45

Echeverribar, Isabel, Mario Morales-Hernández, Pilar Brufau, and Pilar García-Navarro. "Analysis of the performance of a hybrid CPU/GPU 1D2D coupled model for real flood cases." Journal of Hydroinformatics 22, no. 5 (2020): 1198–216. http://dx.doi.org/10.2166/hydro.2020.032.

Full text
Abstract
Coupled 1D2D models emerged as an efficient solution for a two-dimensional (2D) representation of the floodplain combined with a fast one-dimensional (1D) schematization of the main channel. At the same time, high-performance computing (HPC) has appeared as an efficient tool for model acceleration. In this work, a previously validated 1D2D Central Processing Unit (CPU) model is combined with an HPC technique for fast and accurate flood simulation. Due to the speed of 1D schemes, a hybrid CPU/GPU model that runs the 1D main channel on the CPU and accelerates the 2D floodplain with a Graphics Processing Unit (GPU) …
46

Min, Seung Won, Kun Wu, Sitao Huang, et al. "Large graph convolutional network training with GPU-oriented data communication architecture." Proceedings of the VLDB Endowment 14, no. 11 (2021): 2087–100. http://dx.doi.org/10.14778/3476249.3476264.

Full text
Abstract
Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training a GCN requires the minibatch generator traversing graphs and sampling the sparsely located neighboring nodes to obtain their features. Since real-world graphs often exceed the capacity of GPU memory, current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features before sending them to the GPUs. This approach, however, puts tremendous pressure on host memory bandwidth and the CPU. This is because the CPU needs to (1) read sparse features …
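The host-side gather step this abstract describes (the bottleneck the paper's GPU-oriented communication design removes) reduces to an indexed read of the host-resident feature table. A NumPy sketch with made-up graph and batch sizes:

```python
import numpy as np

# Assumed sizes for illustration: 100k-node graph, 64-dim node features.
num_nodes, feat_dim = 100_000, 64
rng = np.random.default_rng(0)
feature_table = rng.random((num_nodes, feat_dim), dtype=np.float32)  # host memory

# The minibatch sampler yields sparsely scattered node IDs; the CPU must
# gather their rows into a dense buffer before the host-to-GPU copy.
sampled = rng.choice(num_nodes, size=1024, replace=False)
minibatch = feature_table[sampled]  # CPU-side gather via fancy indexing
# ...minibatch would then be copied to the GPU; the paper's approach instead
# lets GPU threads fetch the scattered rows directly from host memory.
```

The gather touches 1024 non-contiguous cache lines per batch, which is what stresses host memory bandwidth when sampling rates are high.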
47

Lee, Chien Yu, H. S. Lin, and H. T. Yau. "Using Graphic Hardware to Accelerate Pocketing Tool-Path Generation." Applied Mechanics and Materials 311 (February 2013): 135–40. http://dx.doi.org/10.4028/www.scientific.net/amm.311.135.

Full text
Abstract
In this paper, we propose a new approach to accelerating pocketing tool-path generation using graphics hardware (graphics processing units, GPU). The intersections among tool-path elements can be eliminated with higher efficiency using GPU-based Voronoi diagrams. According to our experimental results, the GPU-based computation was seven to eight times faster than the CPU-based computation. In addition, the difference in tool-path geometry between the CPU-based and GPU-based methods was insignificant. Therefore, the GPU-based method can be efficiently used to accelerate the computation wh…
48

Abramowicz, Kamil, and Przemysław Borczuk. "Comparative analysis of the performance of Unity and Unreal Engine game engines in 3D games." Journal of Computer Sciences Institute 30 (March 20, 2024): 53–60. http://dx.doi.org/10.35784/jcsi.5473.

Full text
Abstract
The article compared the performance of the Unity and Unreal Engine game engines based on tests conducted on two nearly identical games. The research focused on frames per second, CPU usage, RAM, and GPU memory. The results showed that Unity achieved a better average frame rate. Unreal Engine required more RAM and GPU resources. Analyzing CPU load values revealed that on the first system, Unity demanded less CPU usage. However, on the second system, Unreal Engine used over 10 percentage points less CPU. The conclusions from the research partially confirm the hypothesis that Unity requires fewer …
49

Wasiljew, A., and K. Murawski. "A new CUDA-based GPU implementation of the two-dimensional Athena code." Bulletin of the Polish Academy of Sciences: Technical Sciences 61, no. 1 (2013): 239–50. http://dx.doi.org/10.2478/bpasts-2013-0023.

Full text
Abstract
We present a new version of the Athena code, which solves magnetohydrodynamic equations in two-dimensional space. This new implementation, which we have named Athena-GPU, uses the CUDA architecture to allow code execution on a Graphics Processing Unit (GPU). The Athena-GPU code is an unofficial, modified version of the Athena code, which was originally designed for a Central Processing Unit (CPU) architecture. We perform numerical tests based on the original Athena-CPU code and its GPU counterpart to make a performance analysis, which includes execution time, precision differences and accuracy …
50

Tramm, John, Paul Romano, Patrick Shriwise, et al. "Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs." EPJ Web of Conferences 302 (2024): 04010. http://dx.doi.org/10.1051/epjconf/202430204010.

Full text
Abstract
OpenMC is an open source Monte Carlo neutral particle transport application that has recently been ported to GPU using the OpenMP target offloading model. We examine the performance of OpenMC at scale on the Frontier, Polaris, and Aurora supercomputers, demonstrating that performance portability has been achieved by OpenMC across all three major GPU vendors (AMD, NVIDIA, and Intel). OpenMC's GPU performance is compared to both the traditional CPU-based version of OpenMC as well as several other state-of-the-art CPU-based Monte Carlo particle transport applications. We also provide historical c…