To see the other types of publications on this topic, follow the link: NVIDIA GPUs.

Journal articles on the topic 'NVIDIA GPUs'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 journal articles for your research on the topic 'NVIDIA GPUs.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Xu, Kaifeng. "NVIDIA's Research and Development Investment: Impact on Financial Performance and Market Valuation." Advances in Economics, Management and Political Sciences 148, no. 1 (2025): 109–17. https://doi.org/10.54254/2754-1169/2024.ld19178.

Full text
Abstract:
This paper provides a detailed financial analysis of NVIDIA Corporation (NVIDIA), a leading technology firm renowned for its advancements in graphics processing units (GPUs), artificial intelligence (AI), data center solutions, autonomous driving, and professional visualization technologies. The analysis delves into NVIDIA's revenue recognition, research and development (R&D) investments, inventory management strategies, and overarching strategic objectives. Utilizing key financial data from fiscal 2024 and the second quarter of fiscal 2025, this study evaluates NVIDIA's recent performance
APA, Harvard, Vancouver, ISO, and other styles
2

Liu, Junjing. "A Financial Analysis and Valuation of NVIDIA." Advances in Economics, Management and Political Sciences 148, no. 1 (2025): 137–42. https://doi.org/10.54254/2754-1169/2024.ld19183.

Full text
Abstract:
This paper analyzes Nvidia's financial data and its strategic shift towards the AI and data center markets, which the company has recently entered. Nvidia, initially renowned for its GPUs, has now expanded its expertise into AI, computing, and self-driving cars. The calculation of Nvidia's financial ratios for the years 2021-2023 reveals strong liquidity, solvency, and profitability indicators, despite external threats such as export restrictions on American microcircuits in China and increased competition. The company has successfully minimized its reliance on debt and is well-positioned to pro
APA, Harvard, Vancouver, ISO, and other styles
3

Bi, Yujiang, Shun Xu, and Yunheng Ma. "Running Qiskit on ROCm Platform." EPJ Web of Conferences 295 (2024): 11022. http://dx.doi.org/10.1051/epjconf/202429511022.

Full text
Abstract:
Qiskit is one of the common quantum computing frameworks, and its qiskit-aer package can accelerate quantum circuit simulation on NVIDIA GPUs with the help of Thrust. The AMD ROCm framework, similar to CUDA, is a heterogeneous computing framework supporting both NVIDIA and AMD GPUs, and it makes it possible to port Qiskit/Qiskit-Aer from the CUDA platform to its own. We present the porting progress of Qiskit/Qiskit-Aer and preliminary performance tests on both NVIDIA and AMD GPUs. Our results show that Qiskit/Qiskit-Aer can work well on AMD GPUs with the help of ROCm/HIP, and has comparable per
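Much of a CUDA-to-HIP port of this kind amounts to systematically renaming API identifiers. As a minimal sketch in the spirit of AMD's hipify tools (the renaming table below is a small, hypothetical subset for illustration, not the full mapping the authors used):

```python
import re

# Minimal CUDA-to-HIP renaming table, in the spirit of AMD's hipify tools.
# The entries below are a small illustrative subset, not an exhaustive mapping.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Rewrite known CUDA API identifiers to their HIP equivalents."""
    pattern = re.compile("|".join(re.escape(k) for k in CUDA_TO_HIP))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

cuda_src = "#include <cuda_runtime.h>\ncudaMalloc(&p, n); cudaFree(p);"
print(hipify(cuda_src))
```

In practice the mechanical renaming is the easy part; the porting effort reported in papers like this one usually goes into build-system changes and performance tuning for the target GPU.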
APA, Harvard, Vancouver, ISO, and other styles
4

Zhang, Rui, and Lei Hu. "Research on NVIDIA's Development Strategy." International Journal of Global Economics and Management 5, no. 2 (2024): 79–84. https://doi.org/10.62051/ijgem.v5n2.10.

Full text
Abstract:
NVIDIA, as the world's leading supplier of graphics processing units (GPUs) and artificial intelligence (AI) computing hardware, has achieved rapid development and expansion in recent years. This paper studies the evolution of NVIDIA's development strategy and the factors for its success by analyzing NVIDIA's development history, industry environment and competitive landscape. At the same time, this paper also explores the challenges faced by NVIDIA and the direction of its future development.
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Shujie. "Research on Nvidia Investment Strategies and Analysis." Highlights in Business, Economics and Management 24 (January 22, 2024): 2234–40. http://dx.doi.org/10.54097/vzd0m812.

Full text
Abstract:
Against the background of macroeconomic factors such as the global economic downturn and imperfect trade policies, sound investment decisions and risk management are especially important. This research paper takes Nvidia as an investment sample for investors, conducting a comprehensive analysis and complete overview. Based on an analysis of Nvidia's annual reports and market value from 2021 to 2023, the study found that Nvidia has shown remarkable income growth and profitability, solidifying its position as a frontrunner in the AI and semiconductor industry. This shows significant growth potent
APA, Harvard, Vancouver, ISO, and other styles
6

Le, Xi. "The Application of DCF in Company Valuation: Case of NVIDIA." Highlights in Business, Economics and Management 39 (August 8, 2024): 244–51. http://dx.doi.org/10.54097/a50yxz91.

Full text
Abstract:
Company valuation has always been a crucial theme in financial analysis and corporate management. It plays an important role in both corporate strategic adjustment and investment selection. NVIDIA is a leading American technology company dedicated to expanding into various areas including graphics processing units (GPUs), artificial intelligence (AI), autonomous vehicles, data centers, and other products. The stock of NVIDIA has been rising for several years and has attracted much attention from investors. Therefore, this paper combines the DCF model with fundamental analysis to value NVIDIA. Results o
APA, Harvard, Vancouver, ISO, and other styles
7

Bähr, Pascal R., Bruno Lang, Peer Ueberholz, Marton Ady, and Roberto Kersevan. "Development of a hardware-accelerated simulation kernel for ultra-high vacuum with Nvidia RTX GPUs." International Journal of High Performance Computing Applications 36, no. 2 (2021): 141–52. http://dx.doi.org/10.1177/10943420211056654.

Full text
Abstract:
Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing–based simulation unit for Nvidia RTX Graphics Processing Units (GPUs). The GPU simulation kernel was designed with Nvidia’s OptiX 7 API to make use of modern hardware-accelerated ray-tracing units, found in recent RTX series GPUs based on the Turing and Ampere architectures. Even with the challenges posed by switching to 32 bit computations, our kernel ru
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Dong, Hua You Su, Wen Mei, Li Xuan Wang, and Chun Yuan Zhang. "Scalable Parallel Motion Estimation on Muti-GPU System." Applied Mechanics and Materials 347-350 (August 2013): 3708–14. http://dx.doi.org/10.4028/www.scientific.net/amm.347-350.3708.

Full text
Abstract:
With NVIDIA’s parallel computing architecture CUDA, using GPUs to speed up compute-intensive applications has become a research focus in recent years. In this paper, we propose a scalable method for multi-GPU systems to accelerate the motion estimation algorithm, which is the most time-consuming process in video encoding. Based on an analysis of data dependency and the multi-GPU architecture, a parallel computing model and a communication model are designed. We tested our parallel algorithm and analyzed its performance with 10 standard video sequences in different resolutions using 4 NVIDIA GTX460 GPU
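The core kernel being parallelized here is block matching: for each block in the current frame, search a window in the reference frame for the displacement minimizing a cost such as the sum of absolute differences (SAD). A toy sequential sketch (frame layout, block size, and search range are illustrative assumptions, not the paper's parameters):

```python
# Toy full-search block matching with a SAD cost, the kernel that papers like
# this decompose across multiple GPUs. Block size and search range are
# illustrative assumptions.

def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def best_motion_vector(cur, ref, bx, by, bs=4, search=2):
    """Exhaustive search over a small window; returns the (dx, dy) minimizing SAD."""
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= by + dy and by + dy + bs <= h and 0 <= bx + dx and bx + dx + bs <= w:
                cost = sad(cur, ref, bx, by, dx, dy, bs)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# A bright 4x4 patch shifted right by one pixel between reference and current frame.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        ref[y][x] = 9
    for x in range(3, 7):
        cur[y][x] = 9
print(best_motion_vector(cur, ref, 3, 2))  # → (-1, 0)
```

Each candidate block is independent, which is why the search maps naturally onto GPU threads, and why frames can be partitioned by region across multiple GPUs with only boundary data exchanged.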
APA, Harvard, Vancouver, ISO, and other styles
9

Liu, Hui, Bo Yang, and Zhangxin Chen. "Accelerating algebraic multigrid solvers on NVIDIA GPUs." Computers & Mathematics with Applications 70, no. 5 (2015): 1162–81. http://dx.doi.org/10.1016/j.camwa.2015.07.005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Lin, Chun-Yuan, Jin Ye, Che-Lun Hung, Chung-Hung Wang, Min Su, and Jianjun Tan. "Constructing a Bioinformatics Platform with Web and Mobile Services Based on NVIDIA Jetson TK1." International Journal of Grid and High Performance Computing 7, no. 4 (2015): 57–73. http://dx.doi.org/10.4018/ijghpc.2015100105.

Full text
Abstract:
Current high-end graphics processing units (abbreviated as GPUs), such as NVIDIA's Tesla, Fermi, and Kepler series cards, which contain up to thousands of cores per chip, are widely used in high-performance computing fields. These GPU cards (called desktop GPUs) must be installed in personal computers/servers with desktop CPUs; moreover, the cost and power consumption of constructing a high-performance computing platform with these desktop CPUs and GPUs are high. NVIDIA releases the Tegra K1, called Jetson TK1, which contains 4 ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU) and is an embedded board
APA, Harvard, Vancouver, ISO, and other styles
11

Koszczał, Grzegorz, Mariusz Matuszek, and Paweł Czarnul. "Comparison and analysis of software and hardware energy measurement methods for a CPU+GPU system and selected parallel applications." Computer Science and Information Systems, no. 00 (2025): 23. https://doi.org/10.2298/csis240722023k.

Full text
Abstract:
In this paper, the authors extend their previous research on power-capped optimization of performance-energy metrics of deep neural network training workloads. A professional power meter, the Yokogawa WT-310E, is used, as well as the Intel RAPL and Nvidia NVML interfaces, to examine the power consumption of a much more comprehensive set of multi-GPU and multi-CPU workloads, including selected kernels from the NAS Parallel Benchmarks for CPUs and GPUs as well as Horovod-Python Xception deep neural network training using several GPUs. A comparison and discussion of results obtained by both power measurement met
APA, Harvard, Vancouver, ISO, and other styles
12

Song, Yifan. "NVIDIA's Market Strategy and Innovation: The Fusion of Technological Leadership and Brand Building." Advances in Economics, Management and Political Sciences 137, no. 1 (2024): 143–50. https://doi.org/10.54254/2754-1169/2024.18671.

Full text
Abstract:
With the rapid development of artificial intelligence (AI) and digital currencies, the global demand for computing power has surged sharply, creating an urgent need for advanced technological solutions. Graphics processing units (GPUs), as a key measure of a company's computing capability, present unprecedented opportunities for manufacturers to innovate and expand their market reach. However, these opportunities are accompanied by significant challenges, including intense competition and the need for continuous technological advancement. In this rapidly evolving landscape, standing out has be
APA, Harvard, Vancouver, ISO, and other styles
13

Špeťko, Matej, Ondřej Vysocký, Branislav Jansík, and Lubomír Říha. "DGX-A100 Face to Face DGX-2—Performance, Power and Thermal Behavior Evaluation." Energies 14, no. 2 (2021): 376. http://dx.doi.org/10.3390/en14020376.

Full text
Abstract:
Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, bringing top performance and energy-efficiency. We present performance, power consumption, and thermal behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere microarchitecture GPUs. The results are compared against the previous generation of the server, Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of floating-point computing units including Tensor Cores. Furthermore, thermal stability was investigated. In addi
APA, Harvard, Vancouver, ISO, and other styles
14

Kim, Youngtae, and Gyuhyeon Hwang. "Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs." Journal of KIISE 42, no. 12 (2015): 1467–73. http://dx.doi.org/10.5626/jok.2015.42.12.1467.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Dhanuskodi, Gobikrishna, Sudeshna Guha, Vidhya Krishnan, et al. "Creating the First Confidential GPUs." Communications of the ACM 67, no. 1 (2023): 60–67. http://dx.doi.org/10.1145/3626827.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Xu, Jingheng, Guangwen Yang, Haohuan Fu, et al. "Optimizing Finite Volume Method Solvers on Nvidia GPUs." IEEE Transactions on Parallel and Distributed Systems 30, no. 12 (2019): 2790–805. http://dx.doi.org/10.1109/tpds.2019.2926084.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Li, Jinghai, Yun Zhang, Wei Ge, et al. "Lattice Boltzmann simulation on Nvidia and AMD GPUs." Chinese Science Bulletin 54, no. 20 (2009): 3177–84. http://dx.doi.org/10.1360/972009-1347.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Gloster, Andrew, Lennon Ó Náraigh, and Khang Ee Pang. "cuPentBatch—A batched pentadiagonal solver for NVIDIA GPUs." Computer Physics Communications 241 (August 2019): 113–21. http://dx.doi.org/10.1016/j.cpc.2019.03.016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Huang, Xuanteng, Xianwei Zhang, Panfei Yang, and Nong Xiao. "Benchmarking GPU Tensor Cores on General Matrix Multiplication Kernels through CUTLASS." Applied Sciences 13, no. 24 (2023): 13022. http://dx.doi.org/10.3390/app132413022.

Full text
Abstract:
GPUs have been broadly used to accelerate big data analytics, scientific computing and machine intelligence. Particularly, matrix multiplication and convolution are two principal operations that use a large proportion of steps in modern data analysis and deep neural networks. These performance-critical operations are often offloaded to the GPU to obtain substantial improvements in end-to-end latency. In addition, multifarious workload characteristics and complicated processing phases in big data demand a customizable yet performant operator library. To this end, GPU vendors, including NVIDIA a
APA, Harvard, Vancouver, ISO, and other styles
20

Dhanuskodi, Gobikrishna, Sudeshna Guha, Vidhya Krishnan, et al. "Creating the First Confidential GPUs." Queue 21, no. 4 (2023): 68–93. http://dx.doi.org/10.1145/3623393.3623391.

Full text
Abstract:
Today's datacenter GPU has a long and storied 3D graphics heritage. In the 1990s, graphics chips for PCs and consoles had fixed pipelines for geometry, rasterization, and pixels using integer and fixed-point arithmetic. In 1999, NVIDIA invented the modern GPU, which put a set of programmable cores at the heart of the chip, enabling rich 3D scene generation with great efficiency. It did not take long for developers and researchers to realize 'I could run compute on those parallel cores, and it would be blazing fast.' In 2004, Ian Buck created Brook at Stanford, the first compute library for GPU
APA, Harvard, Vancouver, ISO, and other styles
21

Leinhauser, Matthew, René Widera, Sergei Bastrakov, Alexander Debus, Michael Bussmann, and Sunita Chandrasekaran. "Metrics and Design of an Instruction Roofline Model for AMD GPUs." ACM Transactions on Parallel Computing 9, no. 1 (2022): 1–14. http://dx.doi.org/10.1145/3505285.

Full text
Abstract:
Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD (CPU-GPU) architectures, which means moving away from the traditional CPU and NVIDIA-GPU systems. Due to the current limitations of profiling tools for AMD GPUs, this shift leaves a void in how to measure application performance on AMD GPUs. In this article, we design an instruction roofline model for AMD GPUs using AMD’s ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application’s perf
APA, Harvard, Vancouver, ISO, and other styles
22

FUJIMOTO, NORIYUKI. "DENSE MATRIX-VECTOR MULTIPLICATION ON THE CUDA ARCHITECTURE." Parallel Processing Letters 18, no. 04 (2008): 511–30. http://dx.doi.org/10.1142/s0129626408003545.

Full text
Abstract:
Recently GPUs have acquired the ability to perform fast general purpose computation by running thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experiments are conducted on a PC with GeForce 8800GTX and 2.0 GHz Intel Xeon E5335 CPU. The results show that the proposed algorithm runs a maximum of 11.19 times faster than NVIDIA's BLAS library CUBLAS 1.1 on the GPU and 35.15 times faster than the Intel Math Kernel Library 9.1 on a single core x86 with SSE3 SIMD instructions. The performance of Jacobi'
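The decomposition underlying GPU dense matrix-vector products is that each output element is an independent dot product, so output rows can be assigned to parallel threads. A minimal sketch of that row-wise decomposition, with the GPU threads simulated by a plain loop (this illustrates the general idea, not the paper's specific algorithm):

```python
# Row-parallel dense matrix-vector product: each output row is an independent
# dot product. On CUDA, each loop iteration below would be one GPU thread;
# here the "threads" run sequentially for illustration.

def matvec_rowwise(A, x):
    n_rows = len(A)
    y = [0.0] * n_rows
    for row in range(n_rows):          # conceptually: one thread per row
        acc = 0.0
        for j, a in enumerate(A[row]):
            acc += a * x[j]
        y[row] = acc
    return y

print(matvec_rowwise([[1, 2], [3, 4]], [1, 1]))  # → [3.0, 7.0]
```

Real CUDA kernels such as the one in this paper refine this scheme with coalesced memory accesses and shared-memory staging, which is where the reported speedups over naive implementations come from.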
APA, Harvard, Vancouver, ISO, and other styles
23

Bocci, Andrea. "CMS High Level Trigger performance comparison on CPUs and GPUs." Journal of Physics: Conference Series 2438, no. 1 (2023): 012016. http://dx.doi.org/10.1088/1742-6596/2438/1/012016.

Full text
Abstract:
At the start of the upcoming LHC Run-3, CMS will deploy a heterogeneous High Level Trigger (HLT) farm composed of x86 CPUs and NVIDIA GPUs. In order to guarantee that the HLT can run on machines without any GPU accelerators - for example as part of the large scale Monte Carlo production running on the grid, or when individual developers need to optimise specific triggers - the HLT reconstruction has been implemented both for NVIDIA GPUs and for traditional CPUs. This contribution will describe how the CMS software used online and offline (CMSSW) can transparently switch between the tw
APA, Harvard, Vancouver, ISO, and other styles
24

Myasishchev, A., S. Lienkov, V. Dzhulii, and I. Muliar. "USING GPU NVIDIA FOR LINEAR ALGEBRA PROBLEMS." Collection of scientific works of the Military Institute of Kyiv National Taras Shevchenko University, no. 64 (2019): 144–57. http://dx.doi.org/10.17721/2519-481x/2019/64-14.

Full text
Abstract:
Research goals and objectives: the purpose of the article is to study the feasibility of using graphics processors in solving linear equation systems and calculating matrix multiplication, as compared with conventional multi-core processors. The peculiarities of using the MAGMA and CUBLAS libraries with various graphics processors are considered. A performance comparison is made between the Tesla C2075 and GeForce GTX 480 GPUs and a six-core AMD processor. Subject of research: the software is developed based on the MAGMA and CUBLAS libraries for the NVIDIA Tesla C2075 and GeForce
APA, Harvard, Vancouver, ISO, and other styles
25

Al-Kharusi, Ibrahim, and David W. Walker. "Locality properties of 3D data orderings with application to parallel molecular dynamics simulations." International Journal of High Performance Computing Applications 33, no. 5 (2019): 998–1018. http://dx.doi.org/10.1177/1094342019846282.

Full text
Abstract:
Application performance on graphical processing units (GPUs), in terms of execution speed and memory usage, depends on the efficient use of hierarchical memory. It is expected that enhancing data locality in molecular dynamic simulations will lower the cost of data movement across the GPU memory hierarchy. The work presented in this article analyses the spatial data locality and data reuse characteristics for row-major, Hilbert and Morton orderings and the impact these have on the performance of molecular dynamics simulations. A simple cache model is presented, and this is found to give result
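Morton (Z-order) ordering, one of the orderings the article analyzes, works by interleaving the bits of the coordinates so that points close in 3D space tend to be close in the resulting 1D ordering, which improves spatial locality in hierarchical memories. A small sketch (the bit width is an illustrative assumption):

```python
# Morton (Z-order) encoding for 3D integer coordinates: interleave the bits of
# x, y, z so that nearby points in space map to nearby 1D indices. The bit
# width is an illustrative assumption.

def part_bits(v, bits=10):
    """Spread the low `bits` bits of v so consecutive bits land 3 positions apart."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out

def morton3d(x, y, z, bits=10):
    """Interleaved index: x contributes bits 0,3,6,...; y bits 1,4,7,...; z bits 2,5,8,..."""
    return part_bits(x, bits) | (part_bits(y, bits) << 1) | (part_bits(z, bits) << 2)

print(morton3d(1, 1, 1))  # → 7 (all three low bits set, adjacent in the ordering)
```

Hilbert ordering improves on this further by avoiding the long jumps at Z-pattern boundaries, at the cost of a more complex index computation, which is the locality trade-off the article quantifies for molecular dynamics data.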
APA, Harvard, Vancouver, ISO, and other styles
26

Tabani, Hamid, Fabio Mazzocchetti, Pedro Benedicte, Jaume Abella, and Francisco J. Cazorla. "Performance Analysis and Optimization Opportunities for NVIDIA Automotive GPUs." Journal of Parallel and Distributed Computing 152 (June 2021): 21–32. http://dx.doi.org/10.1016/j.jpdc.2021.02.008.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Ying, Lu Peng, Bin Li, Jih-Kwon Peir, and Jianmin Chen. "Performance and Power Comparisons between NVIDIA and ATI GPUS." International Journal of Computer Science and Information Technology 6, no. 6 (2014): 1–22. http://dx.doi.org/10.5121/ijcsit.2014.6601.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Lashgar, Ahmad, and Amirali Baniasadi. "Efficient implementation of OpenACC cache directive on NVIDIA GPUs." International Journal of High Performance Computing and Networking 13, no. 1 (2019): 35. http://dx.doi.org/10.1504/ijhpcn.2019.097047.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Baniasadi, Amirali, and Ahmad Lashgar. "Efficient implementation of OpenACC cache directive on NVIDIA GPUs." International Journal of High Performance Computing and Networking 13, no. 1 (2019): 35. http://dx.doi.org/10.1504/ijhpcn.2019.10018085.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Marak, Laszlo. "Implementing the Multi-Layer Perceptron Algorithm on NVidia GPUs." International Journal of Engineering & Technology 13, no. 2 (2024): 398–408. https://doi.org/10.14419/r2hvcq88.

Full text
Abstract:
With the adoption of machine learning algorithms for image processing tasks and the ever-growing need for embedded device applications, developers use several methods to optimize the computational efficiency of their applications. Optimizing algorithms can be challenging, and developers must apply non-trivial strategies to exploit the computational resources of computer architectures more efficiently. In this article we describe an efficient GPU implementation of the Multi-Layer Perceptron (MLP) algorithm. The MLP is a basic algorithm for machine learning and artificial intellig
APA, Harvard, Vancouver, ISO, and other styles
31

Ernst, Dominik, Georg Hager, Jonas Thies, and Gerhard Wellein. "Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs." International Journal of High Performance Computing Applications 35, no. 1 (2020): 5–19. http://dx.doi.org/10.1177/1094342020965661.

Full text
Abstract:
General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. NVIDIA’s current CUBLAS implementation delivers only a fraction of the potential performance as indicated by the roofline model in this case. We describe the challenges and key characteristics of an implementation that can achieve close to optimal performance. We further evaluate different strategies of parallelization and thr
APA, Harvard, Vancouver, ISO, and other styles
32

Chilingaryan, Suren, Andrei Shkarin, Roman Shkarin, Matthias Vogelgesang, and Sergey Tsapko. "Benchmark for FFT Libraries." Applied Mechanics and Materials 756 (April 2015): 673–77. http://dx.doi.org/10.4028/www.scientific.net/amm.756.673.

Full text
Abstract:
There are various vendors of FFT libraries, but no software is available for automatically benchmarking them on all available devices. In this article, an application that allows easy measurement of the performance and precision of various FFT libraries on the available GPUs and CPUs is presented. This application has been used to find the fastest FFT library for the NVIDIA GTX TESLA and NVIDIA GTX TITAN. The obtained results showed that the best implementation is provided by the cuFFT library developed by NVIDIA.
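The essence of such a benchmark is running several FFT implementations on the same input and recording both runtime and agreement between results. A minimal self-contained sketch, comparing a naive DFT against a radix-2 FFT as stand-ins for real libraries like cuFFT (the harness shape is an illustrative assumption, not the authors' tool):

```python
# A tiny FFT benchmarking harness: run two implementations on the same input,
# time each, and check that they agree. The naive DFT and radix-2 FFT below
# are illustrative stand-ins for real libraries such as cuFFT or FFTW.
import cmath
import time

def dft(xs):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(xs)
    return [sum(xs[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def fft(xs):
    """Recursive radix-2 Cooley-Tukey FFT; len(xs) must be a power of two."""
    n = len(xs)
    if n == 1:
        return list(xs)
    even, odd = fft(xs[0::2]), fft(xs[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

def bench(fn, xs):
    """Return (result, elapsed seconds) for one run of fn on xs."""
    t0 = time.perf_counter()
    out = fn(xs)
    return out, time.perf_counter() - t0

signal = [complex(i % 5) for i in range(256)]
for name, fn in [("dft", dft), ("fft", fft)]:
    out, secs = bench(fn, signal)
    print(f"{name}: {secs:.4f}s")
```

A production benchmark additionally repeats each run to smooth out timing noise and reports precision against a high-accuracy reference, which is how the article separates "fastest" from "most precise."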
APA, Harvard, Vancouver, ISO, and other styles
33

Kommera, Pranay Reddy, Vinay Ramakrishnaiah, Christine Sweeney, Jeffrey Donatelli, and Petrus H. Zwart. "GPU-accelerated multitiered iterative phasing algorithm for fluctuation X-ray scattering." Journal of Applied Crystallography 54, no. 4 (2021): 1179–88. http://dx.doi.org/10.1107/s1600576721005744.

Full text
Abstract:
The multitiered iterative phasing (MTIP) algorithm is used to determine the biological structures of macromolecules from fluctuation scattering data. It is an iterative algorithm that reconstructs the electron density of the sample by matching the computed fluctuation X-ray scattering data to the external observations, and by simultaneously enforcing constraints in real and Fourier space. This paper presents the first ever MTIP algorithm acceleration efforts on contemporary graphics processing units (GPUs). The Compute Unified Device Architecture (CUDA) programming model is used to accelerate
APA, Harvard, Vancouver, ISO, and other styles
34

DeTar, Carleton, Steven Gottlieb, Ruizi Li, and Doug Toussaint. "MILC Code Performance on High End CPU and GPU Supercomputer Clusters." EPJ Web of Conferences 175 (2018): 02009. http://dx.doi.org/10.1051/epjconf/201817502009.

Full text
Abstract:
With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentr
APA, Harvard, Vancouver, ISO, and other styles
35

Beceiro, Bieito, Jorge González-Domínguez, Laura Morán-Fernández, Veronica Bolon-Canedo, and Juan Touriño. "CUDA acceleration of MI-based feature selection methods." Journal of Parallel and Distributed Computing 190 (August 5, 2024): 104901. https://doi.org/10.1016/j.jpdc.2024.104901.

Full text
Abstract:
Feature selection algorithms are necessary nowadays for machine learning as they are capable of removing irrelevant and redundant information to reduce the dimensionality of the data and improve the quality of subsequent analyses. The problem with current feature selection approaches is that they are computationally expensive when processing large datasets. This work presents parallel implementations for Nvidia GPUs of three highly-used feature selection methods based on the Mutual Information (MI) metric: mRMR, JMI and DISR. Publicly available code includes not only CUDA implementations of th
APA, Harvard, Vancouver, ISO, and other styles
36

Gilman, Guin, and Robert J. Walls. "Characterizing concurrency mechanisms for NVIDIA GPUs under deep learning workloads." Performance Evaluation 151 (November 2021): 102234. http://dx.doi.org/10.1016/j.peva.2021.102234.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Jorda, Marc, Pedro Valero-Lara, and Antonio J. Pena. "Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs." IEEE Access 7 (2019): 70461–73. http://dx.doi.org/10.1109/access.2019.2918851.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

White, Jack, Karel Adámek, Jayanta Roy, Sofia Dimoudi, Scott M. Ransom, and Wesley Armour. "Bits Missing: Finding Exotic Pulsars Using bfloat16 on NVIDIA GPUs." Astrophysical Journal Supplement Series 265, no. 1 (2023): 13. http://dx.doi.org/10.3847/1538-4365/acb351.

Full text
Abstract:
The Fourier domain acceleration search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy data sets. This paper quantifies the sensitivity impact of reducing numerical precision in the graphics processing unit (GPU)-accelerated FDAS pipeline of the AstroAccelerate (AA) software package. The prior implementation used IEEE-754 single-precision in the entire binary pulsar detection pipeline, spending a large fraction of the runtime computing GPU-accelerated fast Fourier transforms. AA has been modified to use bfloat16 (and IEEE-754 double-precisi
APA, Harvard, Vancouver, ISO, and other styles
39

White, Jack, Karel Adámek, Jayanta Roy, Scott M. Ransom, and Wesley Armour. "Pulscan: Binary Pulsar Detection Using Unmatched Filters on NVIDIA GPUs." Astrophysical Journal Supplement Series 279, no. 1 (2025): 8. https://doi.org/10.3847/1538-4365/adc89e.

Full text
Abstract:
The Fourier domain acceleration search (FDAS) and Fourier domain jerk search (FDJS) are proven matched-filtering techniques for detecting binary pulsar signatures in time-domain radio astronomy data sets. Next-generation radio telescopes such as the SPOTLIGHT project at the Giant Metrewave Radio Telescope (GMRT) produce data at rates that mandate real-time processing, as storage of the entire captured data set for subsequent offline processing is infeasible. The computational demands of FDAS and FDJS make them challenging to implement in real-time detection pipelines, requiring costly
APA, Harvard, Vancouver, ISO, and other styles
40

Gilman, Guin, and Robert J. Walls. "Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads (Extended Abstract)." ACM SIGMETRICS Performance Evaluation Review 49, no. 3 (2022): 32–34. http://dx.doi.org/10.1145/3529113.3529124.

Full text
Abstract:
Hazelwood et al. observed that at Facebook data centers, variations in user activity (e.g. due to diurnal load) resulted in low-utilization periods with large pools of idle resources [4]. To make use of these resources, they proposed using machine learning training tasks. Analogous low-utilization periods have also been observed at the scale of individual GPUs when using both GPU-based inference [1] and training [6]. The proposed solution to this latter problem was colocating additional inference or training tasks on a single GPU. We go a step further than these previous studies by considering t
APA, Harvard, Vancouver, ISO, and other styles
41

Chu, Chen, Jian Wang, Sen Ke Hou, Qi Lv, Guo Qiang Ma, and Xiao Yong Ji. "A Comparative Study of Color Space Conversion on Homogeneous and Heterogeneous Multicore." Applied Mechanics and Materials 519-520 (February 2014): 724–28. http://dx.doi.org/10.4028/www.scientific.net/amm.519-520.724.

Full text
Abstract:
Color space conversion (CSC) is an important kernel in image and video processing applications, including video compression. As a matrix operation, CSC consumes up to 40% of the processing time of a highly optimized decoder. Therefore, techniques that efficiently implement this conversion are desired. Multicore processors provide an opportunity to increase the performance of CSC by exploiting data parallelism. In this paper, we present three novel approaches for efficient implementation of color space conversion suitable for homogeneous and heterogeneous multicore. We compare the
APA, Harvard, Vancouver, ISO, and other styles
42

Sorokin, Maksym V. "Parallelization of numerical solutions of shallow water equations by the finite volume method for implementation on multiprocessor systems and graphics processors." Environmental safety and natural resources 46, no. 2 (2023): 163–93. http://dx.doi.org/10.32347/2411-4049.2023.2.163-193.

Full text
Abstract:
An overview of approaches to parallelization of grid-based numerical methods for solving shallow water equations for multiprocessor systems and graphics processors is presented. A multithreaded approach for shared-memory computing systems implemented on the basis of the OpenMP programming interface and a geometric decomposition approach with message-passing using the MPI library for distributed-memory computers are described. Multithreading for programming GPUs based on the OpenACC software interface is considered. For the COASTOX-UN system of two-dimensional modeling of hydrodynamics, sedimen
APA, Harvard, Vancouver, ISO, and other styles
43

Blyth, Simon. "Meeting the challenge of JUNO simulation with Opticks: GPU optical photon acceleration via NVIDIA® OptiX™." EPJ Web of Conferences 245 (2020): 11003. http://dx.doi.org/10.1051/epjconf/202024511003.

Full text
Abstract:
Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500 times single threaded Geant4 with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented within CUDA OptiX programs based on the Geant4 implementa
44

Zhang, Kaili. "Analyzing NVIDIA’s Stock Market Reaction Following the Launch of ChatGPT." SHS Web of Conferences 218 (2025): 01034. https://doi.org/10.1051/shsconf/202521801034.

Full text
Abstract:
This study employs an event study methodology to thoroughly analyze the short-term and long-term impact of ChatGPT’s launch on NVIDIA’s stock price. The findings reveal that the initial release of ChatGPT significantly boosted market enthusiasm for investing in NVIDIA, driven by its central role in AI computing infrastructure (e.g., surging demand for GPUs), which propelled short-term stock price gains. However, in the long run, NVIDIA’s stock performance is constrained by multiple factors, including intensified industry competition (e.g., technological catch-up by rivals like AMD and Intel),
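The event-study methodology the abstract applies can be sketched with the standard market model: fit alpha and beta over an estimation window, then measure abnormal returns in the event window. The numbers below are illustrative, not NVIDIA data:

```python
def market_model(stock, market):
    """Fit r_stock = alpha + beta * r_market by ordinary least squares
    over the estimation window."""
    n = len(market)
    mx = sum(market) / n
    my = sum(stock) / n
    beta = sum((x - mx) * (y - my) for x, y in zip(market, stock)) / \
           sum((x - mx) ** 2 for x in market)
    alpha = my - beta * mx
    return alpha, beta

def abnormal_returns(stock, market, alpha, beta):
    """AR_t = actual return minus the return the market model predicts."""
    return [y - (alpha + beta * x) for x, y in zip(market, stock)]

# Estimation window: the stock moves exactly 1.5x the market, so beta = 1.5.
est_mkt = [0.01, -0.02, 0.015, 0.005, -0.01]
est_stk = [1.5 * x for x in est_mkt]
alpha, beta = market_model(est_stk, est_mkt)

# Event window: a +3% surprise on the event day shows up as a +3% AR.
evt_mkt = [0.0, 0.01]
evt_stk = [0.03, 1.5 * 0.01]
ar = abnormal_returns(evt_stk, evt_mkt, alpha, beta)
car = sum(ar)  # cumulative abnormal return over the event window
```

The cumulative abnormal return (CAR) is the quantity event studies test for significance against the null of zero abnormal return.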
45

Ortega-Arranz, Hector, Yuri Torres, Arturo Gonzalez-Escribano, and Diego R. Llanos. "Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria." Journal of Supercomputing 70, no. 2 (2014): 786–98. http://dx.doi.org/10.1007/s11227-014-1212-z.

Full text
46

Karami, Ali, Farshad Khunjush, and Seyyed Ali Mirsoleimani. "A statistical performance analyzer framework for OpenCL kernels on Nvidia GPUs." Journal of Supercomputing 71, no. 8 (2014): 2900–2921. http://dx.doi.org/10.1007/s11227-014-1338-z.

Full text
47

Vázquez, F., J. J. Fernández, and E. M. Garzón. "A new approach for sparse matrix vector product on NVIDIA GPUs." Concurrency and Computation: Practice and Experience 23, no. 8 (2010): 815–26. http://dx.doi.org/10.1002/cpe.1658.

Full text
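The ELLPACK-style storage behind GPU sparse matrix-vector products of this kind can be sketched as follows. This is a generic ELLPACK-R illustration (padded value/column arrays plus an explicit per-row length), not the paper's exact variant:

```python
def to_ellpack(rows):
    """Pack a sparse matrix (one {col: val} dict per row) into ELLPACK-R:
    values and column indices padded to the longest row, plus a per-row
    length array so each GPU thread can stop at its row's real end."""
    max_len = max(len(r) for r in rows)
    vals = [[0.0] * max_len for _ in rows]
    cols = [[0] * max_len for _ in rows]
    rl = [len(r) for r in rows]
    for i, r in enumerate(rows):
        for k, (c, v) in enumerate(sorted(r.items())):
            vals[i][k] = v
            cols[i][k] = c
    return vals, cols, rl

def spmv_ellpack(vals, cols, rl, x):
    """y = A @ x; on a GPU one thread computes one row, and storing the
    padded arrays column-major makes the loads coalesced."""
    return [sum(vals[i][k] * x[cols[i][k]] for k in range(rl[i]))
            for i in range(len(vals))]

# 3x3 example:  [[2,0,1],[0,3,0],[4,0,5]] @ [1,1,1] = [3,3,9]
rows = [{0: 2.0, 2: 1.0}, {1: 3.0}, {0: 4.0, 2: 5.0}]
vals, cols, rl = to_ellpack(rows)
y = spmv_ellpack(vals, cols, rl, [1.0, 1.0, 1.0])  # -> [3.0, 3.0, 9.0]
```

The per-row length array is what distinguishes ELLPACK-R from plain ELLPACK: threads skip the zero padding instead of multiplying through it.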
48

Cao, Kai, Qizhong Wu, Lingling Wang, et al. "GPU-HADVPPM4HIP V1.0: using the heterogeneous-compute interface for portability (HIP) to speed up the piecewise parabolic method in the CAMx (v6.10) air quality model on China's domestic GPU-like accelerator." Geoscientific Model Development 17, no. 17 (2024): 6887–901. http://dx.doi.org/10.5194/gmd-17-6887-2024.

Full text
Abstract:
Abstract. Graphics processing units (GPUs) are becoming a compelling acceleration strategy for geoscience numerical models due to their powerful computing performance. In this study, AMD's heterogeneous-compute interface for portability (HIP) was implemented to port the GPU acceleration version of the piecewise parabolic method (PPM) solver (GPU-HADVPPM) from NVIDIA GPUs to China's domestic GPU-like accelerators like GPU-HADVPPM4HIP. Further, it introduced the multi-level hybrid parallelism scheme to improve the total computational performance of the HIP version of the CAMx (Comprehensive Air
49

Blyth, Simon. "Opticks : GPU Optical Photon Simulation for Particle Physics using NVIDIA® OptiX™." EPJ Web of Conferences 214 (2019): 02027. http://dx.doi.org/10.1051/epjconf/201921402027.

Full text
Abstract:
Opticks is an open source project that integrates the NVIDIA OptiX GPU ray tracing engine with Geant4 toolkit based simulations. Massive parallelism brings drastic performance improvements with optical photon simulation speedup expected to exceed 1000 times Geant4 with workstation GPUs. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented as CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties as well as inverse cumulative distribution functions for reemission are interleaved
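The inverse cumulative distribution functions for reemission that the abstract mentions can be sketched with inverse-transform sampling from a tabulated spectrum. This is a generic illustration, not Opticks code; GPU implementations typically bake the inverse CDF into a texture for constant-time lookups, whereas the sketch binary-searches:

```python
import bisect
import random

def build_cdf(intensity):
    """Turn a tabulated emission spectrum into a normalized cumulative
    distribution suitable for inverse-transform sampling."""
    cdf = []
    total = 0.0
    for w in intensity:
        total += w
        cdf.append(total)
    return [c / total for c in cdf]

def sample_wavelength(wavelengths, cdf, u):
    """Map a uniform u in [0, 1) to a wavelength bin via the inverse CDF."""
    return wavelengths[bisect.bisect_left(cdf, u)]

# Toy scintillator spectrum peaked at 430 nm (illustrative numbers).
wl = [400.0, 430.0, 460.0, 490.0]
inten = [1.0, 5.0, 3.0, 1.0]
cdf = build_cdf(inten)
rng = random.Random(0)
samples = [sample_wavelength(wl, cdf, rng.random()) for _ in range(10000)]
# The 430 nm bin carries 50% of the weight, so about half the samples land there.
```

Precomputing the inverse mapping is what lets each GPU thread draw a reemission wavelength with a single memory fetch instead of a search.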
50

Horie, Satoru, and Alex Fukunaga. "Block-Parallel IDA* for GPUs." Proceedings of the International Symposium on Combinatorial Search 8, no. 1 (2021): 134–38. http://dx.doi.org/10.1609/socs.v8i1.18440.

Full text
Abstract:
We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We show that straightforward thread-based parallelization techniques which were previously proposed for massively parallel SIMD processors perform poorly due to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to fast shared memory) rather than a thread. On the 15-puzzle, BPIDA* on an NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 compared to a highly optimized sequential IDA* implementation on a
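The sequential IDA* that BPIDA* parallelizes can be sketched as repeated cost-bounded depth-first searches. The toy one-dimensional state space below is an assumption for illustration; the paper's domain is the 15-puzzle, and BPIDA* hands the subtrees below a fixed depth to GPU thread blocks instead of searching them sequentially:

```python
def ida_star(start, goal, neighbors, h):
    """Iterative-Deepening A*: depth-first searches bounded by f = g + h,
    where the bound grows each iteration to the smallest f-value that
    exceeded the previous bound."""
    bound = h(start)
    path = [start]

    def dfs(node, g, bound):
        f = g + h(node)
        if f > bound:
            return f          # report the f-value that exceeded the bound
        if node == goal:
            return True
        minimum = float("inf")
        for nxt in neighbors(node):
            if nxt in path:   # avoid cycles along the current path
                continue
            path.append(nxt)
            t = dfs(nxt, g + 1, bound)
            if t is True:
                return True
            minimum = min(minimum, t)
            path.pop()
        return minimum

    while True:
        t = dfs(start, 0, bound)
        if t is True:
            return path
        if t == float("inf"):
            return None       # search space exhausted, no solution
        bound = t

# Toy state space: positions on a line with unit moves and an exact heuristic.
goal = 5
result = ida_star(0, goal,
                  neighbors=lambda n: [n - 1, n + 1],
                  h=lambda n: abs(goal - n))
```

Because each cost-bounded iteration revisits the tree independently, distributing its subtrees to thread blocks preserves the sequential search order within each block while keeping the GPU's warps busy.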