Follow this link to see other types of publications on the topic: NVIDIA GPGPUs.

Journal articles on the topic "NVIDIA GPGPUs"

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles

Select a source type:

Check the 50 best journal articles on the topic "NVIDIA GPGPUs".

An "Add to bibliography" button is available next to every work in the list. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a ".pdf" file and read its abstract online, whenever these details are available in the metadata.

Browse journal articles from a wide variety of disciplines and compile your bibliography correctly.

1

Xu, Kaifeng. "NVIDIAs Research and Development Investment: Impact on Financial Performance and Market Valuation." Advances in Economics, Management and Political Sciences 148, no. 1 (2025): 109–17. https://doi.org/10.54254/2754-1169/2024.ld19178.

Full text available
Abstract:
This paper provides a detailed financial analysis of NVIDIA Corporation (NVIDIA), a leading technology firm renowned for its advancements in graphics processing units (GPUs), artificial intelligence (AI), data center solutions, autonomous driving, and professional visualization technologies. The analysis delves into NVIDIA's revenue recognition, research and development (R&D) investments, inventory management strategies, and overarching strategic objectives. Utilizing key financial data from fiscal 2024 and the second quarter of fiscal 2025, this study evaluates NVIDIA's recent performance
APA, Harvard, Vancouver, ISO, and other styles
2

Liu, Junjing. "A Financial Analysis and Valuation of NVIDIA." Advances in Economics, Management and Political Sciences 148, no. 1 (2025): 137–42. https://doi.org/10.54254/2754-1169/2024.ld19183.

Full text available
Abstract:
This paper analyzes Nvidia's financial data and its strategic shift towards the AI and data center markets, which the company has recently entered. Nvidia, initially renowned for its GPUs, has now expanded its expertise into AI, computing, and self-driving cars. The calculation of Nvidia's financial ratios for the years 2021-2023 reveals strong liquidity, solvency, and profitability indicators, despite external threats such as export restrictions on American microcircuits in China and increased competition. The company has successfully minimized its reliance on debt and is well-positioned to pro
APA, Harvard, Vancouver, ISO, and other styles
3

Bi, Yujiang, Shun Xu, and Yunheng Ma. "Running Qiskit on ROCm Platform." EPJ Web of Conferences 295 (2024): 11022. http://dx.doi.org/10.1051/epjconf/202429511022.

Full text available
Abstract:
Qiskit is one of the common quantum computing frameworks, and its qiskit-aer package can accelerate quantum circuit simulation on NVIDIA GPUs with the help of THRUST. The AMD ROCm framework, a heterogeneous computing framework similar to CUDA that supports both NVIDIA and AMD GPUs, makes it possible to port Qiskit/Qiskit-Aer from the CUDA platform to its own. We present the porting progress of Qiskit/Qiskit-Aer and preliminary performance tests on both NVIDIA and AMD GPUs. Our results show that Qiskit/Qiskit-Aer can work well on AMD GPUs with the help of ROCm/HIP, and has comparable per
APA, Harvard, Vancouver, ISO, and other styles
4

Zhang, Rui, and Lei Hu. "Research on NVIDIA's Development Strategy." International Journal of Global Economics and Management 5, no. 2 (2024): 79–84. https://doi.org/10.62051/ijgem.v5n2.10.

Full text available
Abstract:
NVIDIA, as the world's leading supplier of graphics processing units (GPUs) and artificial intelligence (AI) computing hardware, has achieved rapid development and expansion in recent years. This paper studies the evolution of NVIDIA's development strategy and the factors for its success by analyzing NVIDIA's development history, industry environment and competitive landscape. At the same time, this paper also explores the challenges faced by NVIDIA and the direction of its future development.
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Shujie. "Research on Nvidia Investment Strategies and Analysis." Highlights in Business, Economics and Management 24 (January 22, 2024): 2234–40. http://dx.doi.org/10.54097/vzd0m812.

Full text available
Abstract:
Against the background of macroeconomic factors such as the global economic downturn and imperfect trade policies, sound investment decisions and risk management are all the more important. This research paper takes Nvidia as an investment sample and conducts a comprehensive analysis and complete overview for investors. Based on an analysis of Nvidia's annual reports and market value from 2021 to 2023, the study found that Nvidia has shown remarkable income growth and profitability, solidifying its position as a frontrunner in the AI and semiconductor industry. This shows significant growth potent
APA, Harvard, Vancouver, ISO, and other styles
6

Le, Xi. "The Application of DCF in Company Valuation: Case of NVIDIA." Highlights in Business, Economics and Management 39 (August 8, 2024): 244–51. http://dx.doi.org/10.54097/a50yxz91.

Full text available
Abstract:
Company valuation has always been a crucial theme in financial analysis and corporate management. It plays an important role in both corporate strategic adjustment and investment selection. NVIDIA is a leading American technology company dedicated to expanding into various areas including graphics processing units (GPUs), artificial intelligence (AI), autonomous vehicles, data centers, and other products. The stock of NVIDIA has been rising for several years and attracts much attention from investors. Therefore, this paper combines the DCF model with fundamental analysis to value NVIDIA. Results o
APA, Harvard, Vancouver, ISO, and other styles
7

Bähr, Pascal R., Bruno Lang, Peer Ueberholz, Marton Ady, and Roberto Kersevan. "Development of a hardware-accelerated simulation kernel for ultra-high vacuum with Nvidia RTX GPUs." International Journal of High Performance Computing Applications 36, no. 2 (2021): 141–52. http://dx.doi.org/10.1177/10943420211056654.

Full text available
Abstract:
Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing–based simulation unit for Nvidia RTX Graphics Processing Units (GPUs). The GPU simulation kernel was designed with Nvidia’s OptiX 7 API to make use of modern hardware-accelerated ray-tracing units, found in recent RTX series GPUs based on the Turing and Ampere architectures. Even with the challenges posed by switching to 32 bit computations, our kernel ru
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Dong, Hua You Su, Wen Mei, Li Xuan Wang, and Chun Yuan Zhang. "Scalable Parallel Motion Estimation on Muti-GPU System." Applied Mechanics and Materials 347-350 (August 2013): 3708–14. http://dx.doi.org/10.4028/www.scientific.net/amm.347-350.3708.

Full text available
Abstract:
With NVIDIA’s parallel computing architecture CUDA, using GPU to speed up compute-intensive applications has become a research focus in recent years. In this paper, we proposed a scalable method for multi-GPU system to accelerate motion estimation algorithm, which is the most time consuming process in video encoding. Based on the analysis of data dependency and multi-GPU architecture, a parallel computing model and a communication model are designed. We tested our parallel algorithm and analyzed the performance with 10 standard video sequences in different resolutions using 4 NVIDIA GTX460 GPU
APA, Harvard, Vancouver, ISO, and other styles
9

Liu, Hui, Bo Yang, and Zhangxin Chen. "Accelerating algebraic multigrid solvers on NVIDIA GPUs." Computers & Mathematics with Applications 70, no. 5 (2015): 1162–81. http://dx.doi.org/10.1016/j.camwa.2015.07.005.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
10

Lin, Chun-Yuan, Jin Ye, Che-Lun Hung, Chung-Hung Wang, Min Su, and Jianjun Tan. "Constructing a Bioinformatics Platform with Web and Mobile Services Based on NVIDIA Jetson TK1." International Journal of Grid and High Performance Computing 7, no. 4 (2015): 57–73. http://dx.doi.org/10.4018/ijghpc.2015100105.

Full text available
Abstract:
Current high-end graphics processing units (GPUs), such as the NVIDIA Tesla, Fermi, and Kepler series cards, which contain up to a thousand cores per chip, are widely used in high-performance computing fields. These GPU cards (called desktop GPUs) must be installed in personal computers/servers with desktop CPUs; moreover, the cost and power consumption of constructing a high-performance computing platform with these desktop CPUs and GPUs are high. NVIDIA released the Tegra K1, called Jetson TK1, which contains 4 ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU) and is an embedded board
APA, Harvard, Vancouver, ISO, and other styles
11

Koszczał, Grzegorz, Mariusz Matuszek, and Paweł Czarnul. "Comparison and analysis of software and hardware energy measurement methods for a CPU+GPU system and selected parallel applications." Computer Science and Information Systems, no. 00 (2025): 23. https://doi.org/10.2298/csis240722023k.

Full text available
Abstract:
In this paper, the authors extend their previous research on power-capped optimization of performance-energy metrics for deep neural network training workloads. A professional power meter, a Yokogawa WT-310E, is used, as well as the Intel RAPL and Nvidia NVML interfaces, to examine the power consumption of a much more comprehensive set of multi-GPU and multi-CPU workloads, including selected kernels from the NAS Parallel Benchmarks for CPUs and GPUs as well as Horovod-Python Xception deep neural network training using several GPUs. A comparison and discussion of results obtained by both power measurement met
APA, Harvard, Vancouver, ISO, and other styles
12

Song, Yifan. "NVIDIA's Market Strategy and Innovation: The Fusion of Technological Leadership and Brand Building." Advances in Economics, Management and Political Sciences 137, no. 1 (2024): 143–50. https://doi.org/10.54254/2754-1169/2024.18671.

Full text available
Abstract:
With the rapid development of artificial intelligence (AI) and digital currencies, the global demand for computing power has surged sharply, creating an urgent need for advanced technological solutions. Graphics processing units (GPUs), as a key measure of a company's computing capability, present unprecedented opportunities for manufacturers to innovate and expand their market reach. However, these opportunities are accompanied by significant challenges, including intense competition and the need for continuous technological advancement. In this rapidly evolving landscape, standing out has be
APA, Harvard, Vancouver, ISO, and other styles
13

Špeťko, Matej, Ondřej Vysocký, Branislav Jansík, and Lubomír Říha. "DGX-A100 Face to Face DGX-2—Performance, Power and Thermal Behavior Evaluation." Energies 14, no. 2 (2021): 376. http://dx.doi.org/10.3390/en14020376.

Full text available
Abstract:
Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, bringing top performance and energy-efficiency. We present performance, power consumption, and thermal behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere microarchitecture GPUs. The results are compared against the previous generation of the server, Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of floating-point computing units including Tensor Cores. Furthermore, thermal stability was investigated. In addi
APA, Harvard, Vancouver, ISO, and other styles
14

Kim, Youngtae, and Gyuhyeon Hwang. "Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs." Journal of KIISE 42, no. 12 (2015): 1467–73. http://dx.doi.org/10.5626/jok.2015.42.12.1467.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
15

Dhanuskodi, Gobikrishna, Sudeshna Guha, Vidhya Krishnan, et al. "Creating the First Confidential GPUs." Communications of the ACM 67, no. 1 (2023): 60–67. http://dx.doi.org/10.1145/3626827.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
16

Xu, Jingheng, Guangwen Yang, Haohuan Fu, et al. "Optimizing Finite Volume Method Solvers on Nvidia GPUs." IEEE Transactions on Parallel and Distributed Systems 30, no. 12 (2019): 2790–805. http://dx.doi.org/10.1109/tpds.2019.2926084.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
17

Li, Jinghai, Yun Zhang, Wei Ge, et al. "Lattice Boltzmann simulation on Nvidia and AMD GPUs." Chinese Science Bulletin 54, no. 20 (2009): 3177–84. http://dx.doi.org/10.1360/972009-1347.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
18

Gloster, Andrew, Lennon Ó Náraigh, and Khang Ee Pang. "cuPentBatch—A batched pentadiagonal solver for NVIDIA GPUs." Computer Physics Communications 241 (August 2019): 113–21. http://dx.doi.org/10.1016/j.cpc.2019.03.016.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
19

Huang, Xuanteng, Xianwei Zhang, Panfei Yang, and Nong Xiao. "Benchmarking GPU Tensor Cores on General Matrix Multiplication Kernels through CUTLASS." Applied Sciences 13, no. 24 (2023): 13022. http://dx.doi.org/10.3390/app132413022.

Full text available
Abstract:
GPUs have been broadly used to accelerate big data analytics, scientific computing and machine intelligence. Particularly, matrix multiplication and convolution are two principal operations that use a large proportion of steps in modern data analysis and deep neural networks. These performance-critical operations are often offloaded to the GPU to obtain substantial improvements in end-to-end latency. In addition, multifarious workload characteristics and complicated processing phases in big data demand a customizable yet performant operator library. To this end, GPU vendors, including NVIDIA a
APA, Harvard, Vancouver, ISO, and other styles
20

Dhanuskodi, Gobikrishna, Sudeshna Guha, Vidhya Krishnan, et al. "Creating the First Confidential GPUs." Queue 21, no. 4 (2023): 68–93. http://dx.doi.org/10.1145/3623393.3623391.

Full text available
Abstract:
Today's datacenter GPU has a long and storied 3D graphics heritage. In the 1990s, graphics chips for PCs and consoles had fixed pipelines for geometry, rasterization, and pixels using integer and fixed-point arithmetic. In 1999, NVIDIA invented the modern GPU, which put a set of programmable cores at the heart of the chip, enabling rich 3D scene generation with great efficiency. It did not take long for developers and researchers to realize 'I could run compute on those parallel cores, and it would be blazing fast.' In 2004, Ian Buck created Brook at Stanford, the first compute library for GPU
APA, Harvard, Vancouver, ISO, and other styles
21

Leinhauser, Matthew, René Widera, Sergei Bastrakov, Alexander Debus, Michael Bussmann, and Sunita Chandrasekaran. "Metrics and Design of an Instruction Roofline Model for AMD GPUs." ACM Transactions on Parallel Computing 9, no. 1 (2022): 1–14. http://dx.doi.org/10.1145/3505285.

Full text available
Abstract:
Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD (CPU-GPU) architectures, which means moving away from the traditional CPU and NVIDIA-GPU systems. Due to the current limitations of profiling tools for AMD GPUs, this shift leaves a void in how to measure application performance on AMD GPUs. In this article, we design an instruction roofline model for AMD GPUs using AMD’s ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application’s perf
APA, Harvard, Vancouver, ISO, and other styles
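The roofline model referenced in the entry above caps attainable throughput at the lower of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch with illustrative numbers (not measurements from the paper):

```python
def roofline(peak_gflops, bw_gbs, intensity_flops_per_byte):
    """Attainable performance (GFLOP/s) under the classic roofline model."""
    return min(peak_gflops, bw_gbs * intensity_flops_per_byte)

# Hypothetical GPU: 10 TFLOP/s peak, 900 GB/s memory bandwidth.
peak, bw = 10_000.0, 900.0
# A streaming kernel at 0.25 FLOP/byte is bandwidth-bound ...
print(roofline(peak, bw, 0.25))  # 225.0
# ... while a compute-heavy kernel at 50 FLOP/byte hits the compute ceiling.
print(roofline(peak, bw, 50.0))  # 10000.0
```

The instruction roofline of the paper replaces FLOPs with instructions, but the min-of-two-ceilings shape is the same.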
22

FUJIMOTO, NORIYUKI. "DENSE MATRIX-VECTOR MULTIPLICATION ON THE CUDA ARCHITECTURE." Parallel Processing Letters 18, no. 04 (2008): 511–30. http://dx.doi.org/10.1142/s0129626408003545.

Full text available
Abstract:
Recently GPUs have acquired the ability to perform fast general purpose computation by running thousands of threads concurrently. This paper presents a new algorithm for dense matrix-vector multiplication on the NVIDIA CUDA architecture. The experiments are conducted on a PC with GeForce 8800GTX and 2.0 GHz Intel Xeon E5335 CPU. The results show that the proposed algorithm runs a maximum of 11.19 times faster than NVIDIA's BLAS library CUBLAS 1.1 on the GPU and 35.15 times faster than the Intel Math Kernel Library 9.1 on a single core x86 with SSE3 SIMD instructions. The performance of Jacobi'
APA, Harvard, Vancouver, ISO, and other styles
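The paper's CUDA algorithm is not reproduced here; for reference, the operation it accelerates is the dense matrix–vector product, sketched below in plain Python. The comment on the CUDA mapping describes a typical decomposition, an assumption rather than this paper's specific scheme:

```python
def matvec(A, x):
    # y[i] = sum_j A[i][j] * x[j].  On CUDA, each row's dot product is
    # typically assigned to a thread (or warp) so all rows run in parallel;
    # this serial loop is only a reference for the same arithmetic.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 2], [3, 4]]
x = [10, 1]
print(matvec(A, x))  # [12, 34]
```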
23

Bocci, Andrea. "CMS High Level Trigger performance comparison on CPUs and GPUs." Journal of Physics: Conference Series 2438, no. 1 (2023): 012016. http://dx.doi.org/10.1088/1742-6596/2438/1/012016.

Full text available
Abstract:
Abstract At the start of the upcoming LHC Run-3, CMS will deploy a heterogeneous High Level Trigger (HLT) farm composed of x86 CPUs and NVIDIA GPUs. In order to guarantee that the HLT can run on machines without any GPU accelerators - for example as part of the large scale Monte Carlo production running on the grid, or when individual developers need to optimise specific triggers - the HLT reconstruction has been implemented both for NVIDIA GPUs and for traditional CPUs. This contribution will describe how the CMS software used online and offline (CMSSW) can transparently switch between the tw
APA, Harvard, Vancouver, ISO, and other styles
24

Myasishchev, A., S. Lienkov, V. Dzhulii, and I. Muliar. "USING GPU NVIDIA FOR LINEAR ALGEBRA PROLEMS." Collection of scientific works of the Military Institute of Kyiv National Taras Shevchenko University, no. 64 (2019): 144–57. http://dx.doi.org/10.17721/2519-481x/2019/64-14.

Full text available
Abstract:
Research goals and objectives: the purpose of the article is to study the feasibility of graphics processors using in solving linear equations systems and calculating matrix multiplication as compared with conventional multi-core processors. The peculiarities of the MAGMA and CUBLAS libraries use for various graphics processors are considered. A performance comparison is made between the Tesla C2075 and GeForce GTX 480 GPUs and a six-core AMD processor. Subject of research: the software is developed basing on the MAGMA and CUBLAS libraries for the purpose of the NVIDIA Tesla C2075 and GeForce
APA, Harvard, Vancouver, ISO, and other styles
25

Al-Kharusi, Ibrahim, and David W. Walker. "Locality properties of 3D data orderings with application to parallel molecular dynamics simulations." International Journal of High Performance Computing Applications 33, no. 5 (2019): 998–1018. http://dx.doi.org/10.1177/1094342019846282.

Full text available
Abstract:
Application performance on graphical processing units (GPUs), in terms of execution speed and memory usage, depends on the efficient use of hierarchical memory. It is expected that enhancing data locality in molecular dynamic simulations will lower the cost of data movement across the GPU memory hierarchy. The work presented in this article analyses the spatial data locality and data reuse characteristics for row-major, Hilbert and Morton orderings and the impact these have on the performance of molecular dynamics simulations. A simple cache model is presented, and this is found to give result
APA, Harvard, Vancouver, ISO, and other styles
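Morton (Z-order) indexing, one of the orderings compared in the entry above, interleaves coordinate bits so that spatially close cells tend to land close together in memory. A minimal 3D sketch (10 bits per axis is an arbitrary choice for illustration):

```python
def interleave3(v):
    # Spread the low 10 bits of v so there are two zero bits between each.
    r = 0
    for i in range(10):
        r |= ((v >> i) & 1) << (3 * i)
    return r

def morton3d(x, y, z):
    """Morton (Z-order) index: interleave the bits of x, y, and z."""
    return interleave3(x) | (interleave3(y) << 1) | (interleave3(z) << 2)

print(morton3d(1, 0, 0))  # 1
print(morton3d(0, 1, 0))  # 2
print(morton3d(1, 1, 1))  # 7
print(morton3d(2, 0, 0))  # 8
```

Neighbouring cells such as (1,0,0) and (1,1,1) get nearby indices, which is the locality property the article analyses.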
26

Tabani, Hamid, Fabio Mazzocchetti, Pedro Benedicte, Jaume Abella, and Francisco J. Cazorla. "Performance Analysis and Optimization Opportunities for NVIDIA Automotive GPUs." Journal of Parallel and Distributed Computing 152 (June 2021): 21–32. http://dx.doi.org/10.1016/j.jpdc.2021.02.008.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Ying, Lu Peng, Bin Li, Jih-Kwon Peir, and Jianmin Chen. "Performance and Power Comparisons between NVIDIA and ATI GPUS." International Journal of Computer Science and Information Technology 6, no. 6 (2014): 1–22. http://dx.doi.org/10.5121/ijcsit.2014.6601.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
28

Lashgar, Ahmad, and Amirali Baniasadi. "Efficient implementation of OpenACC cache directive on NVIDIA GPUs." International Journal of High Performance Computing and Networking 13, no. 1 (2019): 35. http://dx.doi.org/10.1504/ijhpcn.2019.097047.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
29

Baniasadi, Amirali, and Ahmad Lashgar. "Efficient implementation of OpenACC cache directive on NVIDIA GPUs." International Journal of High Performance Computing and Networking 13, no. 1 (2019): 35. http://dx.doi.org/10.1504/ijhpcn.2019.10018085.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
30

Marak, Laszlo. "Implementing the Multi-Layer Perceptron Algorithm on NVidia GPUs." International Journal of Engineering & Technology 13, no. 2 (2024): 398–408. https://doi.org/10.14419/r2hvcq88.

Full text available
Abstract:
With the adoption of machine learning algorithms for image processing tasks and the ever-growing need for embedded device applications, developers use several methods to optimize the computational efficiency of their applications. Optimizing algorithms can be challenging, and developers must apply non-trivial strategies to exploit the computational resources of computer architectures more efficiently. In this article we describe an efficient GPU implementation of the Multi-Layer Perceptron (MLP) algorithm. The MLP is a basic algorithm for machine learning and artificial intellig
APA, Harvard, Vancouver, ISO, and other styles
31

Ernst, Dominik, Georg Hager, Jonas Thies, and Gerhard Wellein. "Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs." International Journal of High Performance Computing Applications 35, no. 1 (2020): 5–19. http://dx.doi.org/10.1177/1094342020965661.

Full text available
Abstract:
General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. NVIDIA’s current CUBLAS implementation delivers only a fraction of the potential performance as indicated by the roofline model in this case. We describe the challenges and key characteristics of an implementation that can achieve close to optimal performance. We further evaluate different strategies of parallelization and thr
APA, Harvard, Vancouver, ISO, and other styles
32

Chilingaryan, Suren, Andrei Shkarin, Roman Shkarin, Matthias Vogelgesang, and Sergey Tsapko. "Benchmark for FFT Libraries." Applied Mechanics and Materials 756 (April 2015): 673–77. http://dx.doi.org/10.4028/www.scientific.net/amm.756.673.

Full text available
Abstract:
There are various vendors of FFT libraries, but no software is available for automatically benchmarking them on all available devices. This article presents an application that makes it easy to measure the performance and precision of various FFT libraries on the available GPUs and CPUs. The application has been used to find the fastest FFT library for the NVIDIA GTX TESLA and NVIDIA GTX TITAN. The obtained results show that the best implementation is provided by the cuFFT library developed by NVIDIA.
APA, Harvard, Vancouver, ISO, and other styles
33

Kommera, Pranay Reddy, Vinay Ramakrishnaiah, Christine Sweeney, Jeffrey Donatelli, and Petrus H. Zwart. "GPU-accelerated multitiered iterative phasing algorithm for fluctuation X-ray scattering." Journal of Applied Crystallography 54, no. 4 (2021): 1179–88. http://dx.doi.org/10.1107/s1600576721005744.

Full text available
Abstract:
The multitiered iterative phasing (MTIP) algorithm is used to determine the biological structures of macromolecules from fluctuation scattering data. It is an iterative algorithm that reconstructs the electron density of the sample by matching the computed fluctuation X-ray scattering data to the external observations, and by simultaneously enforcing constraints in real and Fourier space. This paper presents the first ever MTIP algorithm acceleration efforts on contemporary graphics processing units (GPUs). The Compute Unified Device Architecture (CUDA) programming model is used to accelerate
APA, Harvard, Vancouver, ISO, and other styles
34

DeTar, Carleton, Steven Gottlieb, Ruizi Li, and Doug Toussaint. "MILC Code Performance on High End CPU and GPU Supercomputer Clusters." EPJ Web of Conferences 175 (2018): 02009. http://dx.doi.org/10.1051/epjconf/201817502009.

Full text available
Abstract:
With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentr
APA, Harvard, Vancouver, ISO, and other styles
35

Beceiro, Bieito, Jorge González-Domínguez, Laura Morán-Fernández, Veronica Bolon-Canedo, and Juan Touriño. "CUDA acceleration of MI-based feature selection methods." Journal of Parallel and Distributed Computing 190 (August 5, 2024): 104901. https://doi.org/10.1016/j.jpdc.2024.104901.

Full text available
Abstract:
Feature selection algorithms are necessary nowadays for machine learning as they are capable of removing irrelevant and redundant information to reduce the dimensionality of the data and improve the quality of subsequent analyses. The problem with current feature selection approaches is that they are computationally expensive when processing large datasets. This work presents parallel implementations for Nvidia GPUs of three highly-used feature selection methods based on the Mutual Information (MI) metric: mRMR, JMI and DISR. Publicly available code includes not only CUDA implementations of th
APA, Harvard, Vancouver, ISO, and other styles
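The Mutual Information metric underlying mRMR, JMI, and DISR can be estimated directly from empirical counts. A small plain-Python sketch (the paper's CUDA kernels are not shown here, only the quantity they compute):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from two paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A feature identical to a balanced binary label carries 1 bit ...
print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0
# ... while an independent feature carries none.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

Feature selection methods rank features by such MI scores (against the label, and against already-selected features), which is what makes them expensive on large datasets and attractive to parallelize.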
36

Gilman, Guin, and Robert J. Walls. "Characterizing concurrency mechanisms for NVIDIA GPUs under deep learning workloads." Performance Evaluation 151 (November 2021): 102234. http://dx.doi.org/10.1016/j.peva.2021.102234.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
37

Jorda, Marc, Pedro Valero-Lara, and Antonio J. Pena. "Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs." IEEE Access 7 (2019): 70461–73. http://dx.doi.org/10.1109/access.2019.2918851.

Full text available
APA, Harvard, Vancouver, ISO, and other styles
38

White, Jack, Karel Adámek, Jayanta Roy, Sofia Dimoudi, Scott M. Ransom, and Wesley Armour. "Bits Missing: Finding Exotic Pulsars Using bfloat16 on NVIDIA GPUs." Astrophysical Journal Supplement Series 265, no. 1 (2023): 13. http://dx.doi.org/10.3847/1538-4365/acb351.

Full text available
Abstract:
Abstract The Fourier domain acceleration search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy data sets. This paper quantifies the sensitivity impact of reducing numerical precision in the graphics processing unit (GPU)-accelerated FDAS pipeline of the AstroAccelerate (AA) software package. The prior implementation used IEEE-754 single-precision in the entire binary pulsar detection pipeline, spending a large fraction of the runtime computing GPU-accelerated fast Fourier transforms. AA has been modified to use bfloat16 (and IEEE-754 double-precisi
APA, Harvard, Vancouver, ISO, and other styles
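bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, which is why it trades precision for range and speed. A sketch of the conversion by rounding a float32 bit pattern (illustrative only, not the AstroAccelerate implementation):

```python
import struct

def to_bfloat16(x):
    """Round a float to bfloat16 precision (round-to-nearest-even),
    returned as a regular float for inspection."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep the sign, the 8 exponent bits, and the top 7 mantissa bits.
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    (y,) = struct.unpack("<f", struct.pack("<I", rounded))
    return y

print(to_bfloat16(1.0))      # 1.0
print(to_bfloat16(3.14159))  # 3.140625
```

The second result shows the precision loss the paper quantifies: only about 2-3 decimal digits survive the truncated mantissa.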
39

White, Jack, Karel Adámek, Jayanta Roy, Scott M. Ransom, and Wesley Armour. "Pulscan: Binary Pulsar Detection Using Unmatched Filters on NVIDIA GPUs." Astrophysical Journal Supplement Series 279, no. 1 (2025): 8. https://doi.org/10.3847/1538-4365/adc89e.

Full text available
Abstract:
Abstract The Fourier domain acceleration search (FDAS) and Fourier domain jerk search (FDJS) are proven matched-filtering techniques for detecting binary pulsar signatures in time-domain radio astronomy data sets. Next-generation radio telescopes such as the SPOTLIGHT project at the Giant Metrewave Radio Telescope (GMRT) produce data at rates that mandate real-time processing, as storage of the entire captured data set for subsequent offline processing is infeasible. The computational demands of FDAS and FDJS make them challenging to implement in real-time detection pipelines, requiring costly
APA, Harvard, Vancouver, ISO, and other styles
40

Gilman, Guin, and Robert J. Walls. "Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads (Extended Abstract)." ACM SIGMETRICS Performance Evaluation Review 49, no. 3 (2022): 32–34. http://dx.doi.org/10.1145/3529113.3529124.

Full text available
Abstract:
Hazelwood et al. observed that at Facebook data centers, variations in user activity (e.g. due to diurnal load) resulted in low-utilization periods with large pools of idle resources [4]. To make use of these resources, they proposed using machine learning training tasks. Analogous low-utilization periods have also been observed at the scale of individual GPUs when using both GPU-based inference [1] and training [6]. The proposed solution to this latter problem was colocating additional inference or training tasks on a single GPU. We go a step further than these previous studies by considering t
APA, Harvard, Vancouver, ISO, and other styles
41

Chu, Chen, Jian Wang, Sen Ke Hou, Qi Lv, Guo Qiang Ma, and Xiao Yong Ji. "A Comparative Study of Color Space Conversion on Homogeneous and Heterogeneous Multicore." Applied Mechanics and Materials 519-520 (February 2014): 724–28. http://dx.doi.org/10.4028/www.scientific.net/amm.519-520.724.

Full text available
Abstract:
Color space conversion (CSC) is an important kernel in the area of image and video processing applications, including video compression. As a matrix operation, it consumes up to 40% of the processing time of a highly optimized decoder. Therefore, techniques which efficiently implement this conversion are desired. Multicore processors provide an opportunity to increase the performance of CSC by exploiting data parallelism. In this paper, we present three novel approaches for the efficient implementation of color space conversion suitable for homogeneous and heterogeneous multicore. We compare the
APA, Harvard, Vancouver, ISO, and other styles
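Color space conversion is a small per-pixel matrix multiply plus offsets, which is exactly why it exposes so much data parallelism. A sketch of the full-range BT.601 RGB to YCbCr transform (the paper does not state which conversion it targets, so BT.601 is an assumption):

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr: a 3x3 matrix multiply plus offsets."""
    clip = lambda v: max(0, min(255, int(v + 0.5)))
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return clip(y), clip(cb), clip(cr)

print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)  white
print(rgb_to_ycbcr(255, 0, 0))      # (76, 85, 255)    pure red
```

Each pixel is independent, so on a multicore CPU or a GPU the image can simply be partitioned across threads.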
42

Sorokin, Maksym V. "Parallelization of numerical solutions of shallow water equations by the finite volume method for implementation on multiprocessor systems and graphics processors." Environmental safety and natural resources 46, no. 2 (2023): 163–93. http://dx.doi.org/10.32347/2411-4049.2023.2.163-193.

Full text available
Abstract:
An overview of approaches to parallelization of grid-based numerical methods for solving shallow water equations for multiprocessor systems and graphics processors is presented. A multithreaded approach for shared-memory computing systems implemented on the basis of the OpenMP programming interface and a geometric decomposition approach with message-passing using the MPI library for distributed-memory computers are described. Multithreading for programming GPUs based on the OpenACC software interface is considered. For the COASTOX-UN system of two-dimensional modeling of hydrodynamics, sedimen
APA, Harvard, Vancouver, ISO, and other styles
43

Blyth, Simon. "Meeting the challenge of JUNO simulation with Opticks: GPU optical photon acceleration via NVIDIA® OptiXTM." EPJ Web of Conferences 245 (2020): 11003. http://dx.doi.org/10.1051/epjconf/202024511003.

Full text source
Abstract:
Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500 over single-threaded Geant4, with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented within CUDA OptiX programs based on the Geant4 implementa
APA, Harvard, Vancouver, ISO and other styles
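The competing scattering and absorption processes the abstract lists are conventionally simulated by sampling an exponential free path per process and letting the shortest one win. A toy single-photon sketch of that standard Monte Carlo technique (illustrative only, not Opticks code, and sequential rather than GPU-parallel):

```python
import math
import random

def propagate_step(scatter_len, absorb_len, rng):
    """Sample one photon step: draw an exponential free path for each
    competing process; the process with the shorter path occurs first."""
    d_scat = -scatter_len * math.log(1.0 - rng.random())
    d_abs = -absorb_len * math.log(1.0 - rng.random())
    return ("absorbed", d_abs) if d_abs < d_scat else ("scattered", d_scat)

rng = random.Random(42)
outcomes = [propagate_step(1.0, 10.0, rng)[0] for _ in range(10_000)]
frac_scattered = outcomes.count("scattered") / len(outcomes)
# Expected scattering fraction: (1/1) / (1/1 + 1/10) = 10/11, about 0.91.
```

Since each photon's random walk is independent, mapping one photon to one GPU thread is natural, which is the source of the massive speedups reported.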
44

Zhang, Kaili. "Analyzing NVIDIA’s Stock Market Reaction Following the Launch of ChatGPT." SHS Web of Conferences 218 (2025): 01034. https://doi.org/10.1051/shsconf/202521801034.

Full text source
Abstract:
This study employs an event study methodology to thoroughly analyze the short-term and long-term impact of ChatGPT’s launch on NVIDIA’s stock price. The findings reveal that the initial release of ChatGPT significantly boosted market enthusiasm for investing in NVIDIA, driven by its central role in AI computing infrastructure (e.g., surging demand for GPUs), which propelled short-term stock price gains. However, in the long run, NVIDIA’s stock performance is constrained by multiple factors, including intensified industry competition (e.g., technological catch-up by rivals like AMD and Intel),
APA, Harvard, Vancouver, ISO and other styles
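An event study of this kind measures abnormal returns: the stock's actual return minus what a market model (alpha, beta against an index) would predict. A toy sketch with made-up numbers (the alpha, beta, and returns below are purely illustrative, not figures from the paper):

```python
def abnormal_returns(stock_rets, market_rets, alpha, beta):
    """AR_t = R_t - (alpha + beta * R_m,t); CAR is their sum
    over the event window."""
    ars = [r - (alpha + beta * m) for r, m in zip(stock_rets, market_rets)]
    return ars, sum(ars)

# Hypothetical three-day event-window returns (daily fractions).
stock = [0.040, 0.025, -0.010]
market = [0.010, 0.005, 0.000]
ars, car = abnormal_returns(stock, market, alpha=0.0, beta=1.5)
# Day 1: 0.040 - 1.5*0.010 = 0.025; CAR over the window = 0.0325.
```

A positive cumulative abnormal return (CAR) around the launch date is the quantitative signature of the "market enthusiasm" the abstract describes.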
45

Ortega-Arranz, Hector, Yuri Torres, Arturo Gonzalez-Escribano, and Diego R. Llanos. "Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria." Journal of Supercomputing 70, no. 2 (2014): 786–98. http://dx.doi.org/10.1007/s11227-014-1212-z.

Full text source
APA, Harvard, Vancouver, ISO and other styles
46

Karami, Ali, Farshad Khunjush, and Seyyed Ali Mirsoleimani. "A statistical performance analyzer framework for OpenCL kernels on Nvidia GPUs." Journal of Supercomputing 71, no. 8 (2014): 2900–2921. http://dx.doi.org/10.1007/s11227-014-1338-z.

Full text source
APA, Harvard, Vancouver, ISO and other styles
47

Vázquez, F., J. J. Fernández, and E. M. Garzón. "A new approach for sparse matrix vector product on NVIDIA GPUs." Concurrency and Computation: Practice and Experience 23, no. 8 (2010): 815–26. http://dx.doi.org/10.1002/cpe.1658.

Full text source
APA, Harvard, Vancouver, ISO and other styles
48

Cao, Kai, Qizhong Wu, Lingling Wang, et al. "GPU-HADVPPM4HIP V1.0: using the heterogeneous-compute interface for portability (HIP) to speed up the piecewise parabolic method in the CAMx (v6.10) air quality model on China's domestic GPU-like accelerator." Geoscientific Model Development 17, no. 17 (2024): 6887–901. http://dx.doi.org/10.5194/gmd-17-6887-2024.

Full text source
Abstract:
Graphics processing units (GPUs) are becoming a compelling acceleration strategy for geoscience numerical models due to their powerful computing performance. In this study, AMD's heterogeneous-compute interface for portability (HIP) was implemented to port the GPU acceleration version of the piecewise parabolic method (PPM) solver (GPU-HADVPPM) from NVIDIA GPUs to China's domestic GPU-like accelerators, as GPU-HADVPPM4HIP. Further, a multi-level hybrid parallelism scheme was introduced to improve the total computational performance of the HIP version of the CAMx (Comprehensive Air
APA, Harvard, Vancouver, ISO and other styles
49

Blyth, Simon. "Opticks: GPU Optical Photon Simulation for Particle Physics using NVIDIA® OptiX™." EPJ Web of Conferences 214 (2019): 02027. http://dx.doi.org/10.1051/epjconf/201921402027.

Full text source
Abstract:
Opticks is an open source project that integrates the NVIDIA OptiX GPU ray tracing engine with Geant4 toolkit based simulations. Massive parallelism brings drastic performance improvements, with optical photon simulation speedup expected to exceed 1000 times Geant4 on workstation GPUs. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented as CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties as well as inverse cumulative distribution functions for reemission are interleaved
APA, Harvard, Vancouver, ISO and other styles
50

Horie, Satoru, and Alex Fukunaga. "Block-Parallel IDA* for GPUs." Proceedings of the International Symposium on Combinatorial Search 8, no. 1 (2021): 134–38. http://dx.doi.org/10.1609/socs.v8i1.18440.

Full text source
Abstract:
We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We show that straightforward thread-based parallelization techniques, previously proposed for massively parallel SIMD processors, perform poorly due to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to fast shared memory) rather than to a thread. On the 15-puzzle, BPIDA* on an NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 compared to a highly optimized sequential IDA* implementation on a
APA, Harvard, Vancouver, ISO and other styles
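Sequential IDA*, the baseline the abstract compares against, repeats a depth-first search bounded by f = g + h, raising the bound each iteration to the smallest f-value that exceeded it. A compact CPU sketch on a toy weighted graph (BPIDA* itself maps each subtree to a GPU thread block, which this sequential sketch does not attempt):

```python
def ida_star(start, goal, neighbors, h):
    """Iterative-Deepening A*: cost-bounded DFS, with the bound raised
    to the minimum f-value that overflowed the previous iteration."""
    def dfs(node, g, bound, path):
        f = g + h(node)
        if f > bound:
            return f, None            # overflowed: report f for next bound
        if node == goal:
            return f, list(path)      # solution path found
        minimum = float("inf")
        for nxt, cost in neighbors(node):
            if nxt in path:           # avoid cycles on the current path
                continue
            path.append(nxt)
            t, found = dfs(nxt, g + cost, bound, path)
            path.pop()
            if found is not None:
                return t, found
            minimum = min(minimum, t)
        return minimum, None

    bound = h(start)
    while True:
        bound, found = dfs(start, 0, bound, [start])
        if found is not None:
            return found
        if bound == float("inf"):
            return None               # goal unreachable

# Toy graph; with h = 0, IDA* degrades to iterative-deepening DFS.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1)], "C": []}
path = ida_star("A", "C", lambda n: graph[n], lambda n: 0)
```

The recursive DFS here is exactly the part that diverges badly when one GPU thread per node is used, motivating BPIDA*'s block-per-subtree assignment.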