Journal articles: 'NVIDIA'

1

Санжаров, В. В., В. А. Фролов, and В. А. Галактионов. "ИССЛЕДОВАНИЕ ТЕХНОЛОГИИ Nvidia RTX." Программирование, no. 4 (2020): 65–72. http://dx.doi.org/10.31857/s0132347420030061.

2

Nangla, Siddhante. "GPU Programming using NVIDIA CUDA." International Journal for Research in Applied Science and Engineering Technology 6, no. 6 (June 30, 2018): 79–84. http://dx.doi.org/10.22214/ijraset.2018.6016.

3

Sanzharov, V. V., V. A. Frolov, and V. A. Galaktionov. "Survey of Nvidia RTX Technology." Programming and Computer Software 46, no. 4 (July 2020): 297–304. http://dx.doi.org/10.1134/s0361768820030068.

4

Lin, Chun-Yuan, Jin Ye, Che-Lun Hung, Chung-Hung Wang, Min Su, and Jianjun Tan. "Constructing a Bioinformatics Platform with Web and Mobile Services Based on NVIDIA Jetson TK1." International Journal of Grid and High Performance Computing 7, no. 4 (October 2015): 57–73. http://dx.doi.org/10.4018/ijghpc.2015100105.

Abstract:

Current high-end graphics processing units (GPUs), such as the NVIDIA Tesla, Fermi, and Kepler series cards, which contain up to a thousand cores per chip, are widely used in high-performance computing. These GPU cards (called desktop GPUs) must be installed in personal computers or servers with desktop CPUs; moreover, the cost and power consumption of building a high-performance computing platform from these desktop CPUs and GPUs are high. NVIDIA has released the Jetson TK1, an embedded board based on the Tegra K1 that contains 4 ARM Cortex-A15 CPU cores and 192 CUDA cores (Kepler GPU) and offers low cost, low power consumption, and high applicability for embedded applications. The NVIDIA Jetson TK1 has thus become a new research direction. Hence, in this paper, a bioinformatics platform was constructed based on the NVIDIA Jetson TK1. The ClustalWtk and MCCtk tools, for sequence alignment and compound comparison respectively, were designed on this platform. Moreover, web and mobile services with user-friendly interfaces were provided for these two tools. The experimental results showed that the cost-performance ratio of the NVIDIA Jetson TK1 is higher than that of the Intel XEON E5-2650 CPU and the NVIDIA Tesla K20m GPU card.

5

Fasi, Massimiliano, Nicholas J. Higham, Mantas Mikaitis, and Srikara Pranesh. "Numerical behavior of NVIDIA tensor cores." PeerJ Computer Science 7 (February 10, 2021): e330. http://dx.doi.org/10.7717/peerj-cs.330.

Abstract:

We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: (1) accurately simulate NVIDIA tensor cores on conventional hardware; (2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and (3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensor cores as well as similar accelerators from other vendors, as they become available. Moreover, we identify a non-monotonicity issue affecting floating point multi-operand adders if the intermediate results are not normalized after each step.
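Why the precision of intermediate results matters can be illustrated on a CPU. The following is a minimal Python sketch, not the authors' test suite and not an emulation of tensor core hardware: it accumulates the same half-precision products once in a binary16 accumulator and once in a binary32 accumulator. The helper names `round_to` and `dot` are hypothetical, and rounding is done through the standard `struct` module.

```python
import struct

def round_to(fmt, x):
    """Round a Python float to IEEE 754 binary16 ('e') or binary32 ('f')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def dot(a, b, acc_fmt):
    """Dot product that rounds the accumulator to acc_fmt after every addition."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two binary16 values is exact in double precision,
        # so the only rounding here is the one applied to the running sum.
        acc = round_to(acc_fmt, acc + x * y)
    return acc

n = 4096
a = [round_to("e", 0.1)] * n   # 0.1 rounded to binary16
b = [1.0] * n

half_sum = dot(a, b, "e")      # binary16 accumulator
single_sum = dot(a, b, "f")    # binary32 accumulator
exact = sum(x * y for x, y in zip(a, b))  # double precision reference

print(half_sum, single_sum, exact)
```

In this sketch the binary16 accumulator stops growing once each addend is smaller than half a unit in the last place of the running sum, while the binary32 accumulator follows the reference sum. That gap is the kind of behavior the experiments in the abstract are designed to probe on real hardware.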

6

Peng, Tao, Dingnan Zhang, Don Lahiru Nirmal Hettiarachchi, and John Loomis. "An Evaluation of Embedded GPU Systems for Visual SLAM Algorithms." Electronic Imaging 2020, no. 6 (January 26, 2020): 325–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.6.iriacv-074.

Abstract:

Simultaneous Localization and Mapping (SLAM) solves the computational problem of estimating the location of a robot and the map of its environment. SLAM is widely used in navigation, odometry, and mobile robot mapping. However, the performance and efficiency of small industrial mobile robots and unmanned aerial vehicles (UAVs) are highly constrained by battery capacity. Therefore, a mobile robot, especially a UAV, requires low power consumption while maintaining high performance. This paper presents holistic and quantitative performance evaluations of embedded computing devices based on the Nvidia Jetson platform. Evaluations are based on the execution of two state-of-the-art Visual SLAM algorithms, ORB-SLAM2 and OpenVSLAM, on Nvidia Jetson Nano, Nvidia Jetson TX2, and Nvidia Jetson Xavier.

7

Chilingaryan, Suren, Andrei Shkarin, Roman Shkarin, Matthias Vogelgesang, and Sergey Tsapko. "Benchmark for FFT Libraries." Applied Mechanics and Materials 756 (April 2015): 673–77. http://dx.doi.org/10.4028/www.scientific.net/amm.756.673.

Abstract:

There are various vendors of FFT libraries, but no software is available for benchmarking them automatically on all available devices. This article presents an application that makes it easy to measure the performance and precision of various FFT libraries on the available GPUs and CPUs. The application was used to find the fastest FFT library for the NVIDIA GTX TESLA and NVIDIA GTX TITAN. The results show that the best implementation is provided by the cuFFT library developed by NVIDIA.

8

Zhu, Li, and Yi Min Yang. "Real-Time Multitasking Video Encoding Processing System of Multicore." Applied Mechanics and Materials 66-68 (July 2011): 2074–79. http://dx.doi.org/10.4028/www.scientific.net/amm.66-68.2074.

Abstract:

This paper presents an optimization based on the processor series produced by NVIDIA, such as GeForce, Tegra, and Nexus, and discusses the future development of video image processors. It reviews the most popular current DSP optimization techniques and objectives in order to optimize the designs of the methods available in existing papers. Building on NVIDIA's product series, it discusses the CUDA GPU architecture specifically, surveys the hardware and algorithms of the most popular current video encoding equipment, and draws on practical technology to improve the transmission and encoding of multimedia data.

9

Blyth, Simon. "Meeting the challenge of JUNO simulation with Opticks: GPU optical photon acceleration via NVIDIA® OptiX™." EPJ Web of Conferences 245 (2020): 11003. http://dx.doi.org/10.1051/epjconf/202024511003.

Abstract:

Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500 times single threaded Geant4 with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented within CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties as well as inverse cumulative distribution functions for reemission are interleaved into GPU textures providing fast interpolated property lookup or wavelength generation. Major recent developments enable Opticks to benefit from ray trace dedicated RT cores available in NVIDIA RTX series GPUs. Results of extensive validation tests are presented.
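The idea of storing an inverse cumulative distribution function in a table and sampling it with interpolated lookups can be sketched on the CPU. This is an illustrative Python sketch only, not Opticks code and not a GPU texture: the function names are hypothetical, the spectrum values are toy numbers invented for the example, and the CDF is inverted piecewise linearly as a simplification.

```python
import bisect

def normalized_cdf(xs, density):
    """Trapezoidal cumulative distribution of a tabulated, strictly positive density."""
    cdf = [0.0]
    for i in range(1, len(xs)):
        cdf.append(cdf[-1] + 0.5 * (density[i] + density[i - 1]) * (xs[i] - xs[i - 1]))
    total = cdf[-1]
    return [c / total for c in cdf]

def inverse_cdf_table(xs, cdf, n_entries):
    """Tabulate x(u) at n_entries evenly spaced u in [0, 1], much like a 1D texture."""
    table = []
    for k in range(n_entries):
        u = k / (n_entries - 1)
        # Locate the CDF segment containing u, then invert it linearly.
        i = min(bisect.bisect_right(cdf, u) - 1, len(xs) - 2)
        t = (u - cdf[i]) / (cdf[i + 1] - cdf[i])
        table.append((1.0 - t) * xs[i] + t * xs[i + 1])
    return table

def lookup(table, u):
    """Linearly interpolated table lookup for u in [0, 1]."""
    pos = u * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return (1.0 - frac) * table[i] + frac * table[i + 1]

# Toy values for illustration only, not measured data.
wavelengths_nm = [380.0, 420.0, 460.0, 500.0, 540.0]
relative_intensity = [0.2, 1.0, 0.6, 0.3, 0.1]

cdf = normalized_cdf(wavelengths_nm, relative_intensity)
table = inverse_cdf_table(wavelengths_nm, cdf, 64)
median_wavelength = lookup(table, 0.5)
print(median_wavelength)
```

Generating a wavelength then amounts to one interpolated lookup with a uniform random number, for example `lookup(table, random.random())`. That is why a hardware-interpolated texture is a natural home for such a table.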

10

McCarthy, Dylan, and Jürgen P. Schulze. "Distributed VR Rendering Using NVIDIA OptiX." Electronic Imaging 2017, no. 3 (January 29, 2017): 36–41. http://dx.doi.org/10.2352/issn.2470-1173.2017.3.ervr-095.

11

Philippidis, Alex. "NVIDIA Powers Up in Drug Discovery." GEN Edge 2, no. 1 (January 1, 2020): 277–83. http://dx.doi.org/10.1089/genedge.2.1.48.

12

Burgess, John. "RTX on—The NVIDIA Turing GPU." IEEE Micro 40, no. 2 (March 1, 2020): 36–44. http://dx.doi.org/10.1109/mm.2020.2971677.

13

Pogorilyy, S. D., D. Yu Vitel, and O. A. Vereshchynsky. "Новітні архітектури відеоадаптерів. Технологія GPGPU. Частина 2." Реєстрація, зберігання і обробка даних 15, no. 1 (April 4, 2013): 71–81. http://dx.doi.org/10.35681/1560-9189.2013.15.1.103367.

Abstract:

The basic principles of working with shared and distributed memory in NVidia CUDA technology are examined in detail. Thread interaction patterns and the problems of global synchronization are described. A comparative analysis is made of the main technologies used in the GPGPU approach: Nvidia CUDA, OpenCL, and Direct Compute.

14

Adrian, Yosef, Rachel Caroline Lesmana, and Sudimanto. "Analisis Performa pada Video Graphic Array (VGA) Nvidia GTX 950M DDR3 dan Nvidia GTX 950M GDDR5." Media Informatika 20, no. 2 (July 31, 2021): 91–96. http://dx.doi.org/10.37595/mediainfo.v20i2.74.

Abstract:

A Video Graphics Array (VGA) card processes graphics data or digital signals on a computer and then transfers the resulting graphics signal to the monitor. The VGA cards used for the benchmark are the NVIDIA GTX 950M GDDR5 and the NVIDIA GTX 950M DDR3. These cards are popular because of their affordable price and high performance; they have a 128-bit data bus width and come in DDR3 and GDDR5 variants. Specification and performance data were taken from the sites Jagat Review (Gatot Tri [9]) and Notebookcheck (Otshoff [8]), and the benchmark results obtained from these data are then analyzed. Based on the specification and benchmarking data obtained for the VGA cards, the difference in Double Data Rate (DDR) type of the Video Random Access Memory (VRAM) has a large effect on rendering and displaying images. Put simply, a VGA card has components that work together: the GPU chip, the VRAM size, and the DDR type. Therefore, the difference in DDR type on a VGA card is an important factor in the performance of a graphics card.

15

Blyth, Simon. "Integration of JUNO simulation framework with Opticks: GPU accelerated optical propagation via NVIDIA® OptiX™." EPJ Web of Conferences 251 (2021): 03009. http://dx.doi.org/10.1051/epjconf/202125103009.

Abstract:

Opticks is an open source project that accelerates optical photon simulation by integrating NVIDIA GPU ray tracing, accessed via NVIDIA OptiX, with Geant4 toolkit based simulations. A single NVIDIA Turing architecture GPU has been measured to provide optical photon simulation speedup factors exceeding 1500 times single threaded Geant4 with a full JUNO analytic GPU geometry automatically translated from the Geant4 geometry. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented within CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties as well as inverse cumulative distribution functions for reemission are interleaved into GPU textures providing fast interpolated property lookup or wavelength generation. In this work we describe major recent developments to facilitate integration of Opticks with the JUNO simulation framework, including on-GPU collection efficiency hit culling, which substantially reduces both the CPU memory needed for photon hits and copying overheads. Progress with the migration of Opticks to the all-new NVIDIA OptiX 7 API is also described.

16

Rasyad, Muhammad Abdullah, Favian Dewanta, and Sri Astuti. "All-in-one computation vs computational-offloading approaches: a performance evaluation of object detection strategies on android mobile devices." JURNAL INFOTEL 13, no. 4 (December 9, 2021): 216–22. http://dx.doi.org/10.20895/infotel.v13i4.700.

Abstract:

Object detection gives a computer the ability to classify objects in an image or video. However, high-specification devices are needed for good performance. One way to make low-specification devices perform better is to offload the computation from the low-specification device to another device with better specifications. This paper investigates the performance of two object detection strategies: all-in-one computation on an Android mobile phone versus Android mobile phone computation with computational offloading to an Nvidia Jetson Nano. The experiment performs video surveillance from the Android mobile phone in two scenarios: all-in-one object detection on a single Android device, and object detection decoupled between an Android device and an Nvidia Jetson Nano. The Android application sends video input for object detection over the RTSP/RTMP streaming protocol to the Nvidia Jetson Nano, which acts as an RTSP/RTMP server. The object detection output is then sent back to the Android device to be displayed to the user. The results show that the Huawei Y7 Pro Android device, which on its own reaches an average of 1.82 FPS and an average computing time of 552 ms, improves significantly when working with the Nvidia Jetson Nano: the average FPS becomes ten and the average computing time becomes 95 ms. Decoupling object detection between an Android device and an Nvidia Jetson Nano using the system presented in this paper therefore improves detection speed.
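The reported figures can be cross-checked with simple arithmetic. The sketch below assumes frames are processed one at a time, so that throughput is the reciprocal of per-frame latency; that is an assumption made here, not something the abstract states, and `fps_from_latency_ms` is a hypothetical helper.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second if each frame takes latency_ms and frames are processed sequentially."""
    return 1000.0 / latency_ms

on_device_ms = 552.0   # average per-frame time on the Android phone (from the abstract)
offloaded_ms = 95.0    # average per-frame time with the Jetson Nano (from the abstract)

on_device_fps = fps_from_latency_ms(on_device_ms)
offloaded_fps = fps_from_latency_ms(offloaded_ms)
speedup = on_device_ms / offloaded_ms

print(f"{on_device_fps:.2f} FPS on device, {offloaded_fps:.2f} FPS offloaded, {speedup:.2f}x faster")
```

Under that assumption the two latencies correspond to roughly 1.8 and 10.5 frames per second, which is consistent with the reported averages of 1.82 and ten.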

17

Радојчин, Милош. "КОМПАРАТИВНА АНАЛИЗА ТЕХНОЛОГИЈА ЗА МАШИНСКО УЧЕЊЕ НА ИВИЦИ ПРИМЕНОМ NVIDIA JETSON TX2 УРЕЂАЈА." Zbornik radova Fakulteta tehničkih nauka u Novom Sadu 37, no. 02 (February 3, 2022): 250–53. http://dx.doi.org/10.24867/16be23radojcin.

Abstract:

This paper presents the theoretical foundations of machine learning at the edge and of data stream processing, the meaning of the term Machine Learning Operations, and a description of the Nvidia Jetson TX2 device. Technologies for machine learning at the edge are then analyzed and compared. Some of these technologies are applied to a demographic analytics solution using the Nvidia Jetson TX2 device.

18

Myasishchev, A., S. Lienkov, V. Dzhulii, and I. Muliar. "USING GPU NVIDIA FOR LINEAR ALGEBRA PROBLEMS." Collection of scientific works of the Military Institute of Kyiv National Taras Shevchenko University, no. 64 (2019): 144–57. http://dx.doi.org/10.17721/2519-481x/2019/64-14.

Abstract:

Research goals and objectives: the purpose of the article is to study the feasibility of using graphics processors to solve systems of linear equations and compute matrix products, compared with conventional multi-core processors. The peculiarities of using the MAGMA and CUBLAS libraries on various graphics processors are considered. A performance comparison is made between the Tesla C2075 and GeForce GTX 480 GPUs and a six-core AMD processor. Subject of research: software was developed on the basis of the MAGMA and CUBLAS libraries to study the performance of the NVIDIA Tesla C2075 and GeForce GTX 480 GPUs in solving systems of linear equations and computing matrix products. Research methods used: libraries were used to parallelize the solution of linear algebra problems. For GPUs these are MAGMA and CUBLAS; for multi-core processors, ScaLAPACK and ATLAS. To study execution speed, methods and algorithms for parallelizing computational procedures similar to those in these libraries were used. A software module was developed for solving systems of linear equations and computing matrix products on parallel systems. Results of the research: for double-precision numbers, the performance of the GeForce GTX 480 and Tesla C2075 GPUs is approximately 3.5 and 6.3 times higher, respectively, than that of the AMD CPU. For single-precision numbers, the GeForce GTX 480 is 1.3 times faster than the Tesla C2075. To achieve maximum performance on an NVIDIA CUDA GPU, the MAGMA or CUBLAS libraries should be used, which accelerate the calculations by about 6.4 times compared with the traditional programming method. When solving systems of equations on a 6-core CPU with the ScaLAPACK and ATLAS libraries, the maximum acceleration achieved is 3.24 times relative to a single core, instead of the theoretical 6-fold acceleration. Therefore, processors with a large number of cores cannot be used efficiently with the considered libraries. It is demonstrated that the advantage of the GPU over the CPU increases with the number of equations.
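The gap between the measured 3.24-fold and the theoretical 6-fold acceleration can be expressed as a parallel efficiency. As a further illustration, Amdahl's law gives the serial fraction that would explain such a gap; applying Amdahl's model here is an assumption of this sketch, not an analysis taken from the article, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / cores

def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: speedup with a fixed serial fraction of the work."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

def amdahl_serial_fraction(speedup, cores):
    """Serial fraction s that solves speedup = 1 / (s + (1 - s) / cores)."""
    return (cores / speedup - 1.0) / (cores - 1.0)

cores = 6
measured_speedup = 3.24   # from the abstract

efficiency = parallel_efficiency(measured_speedup, cores)
serial_fraction = amdahl_serial_fraction(measured_speedup, cores)

print(f"efficiency = {efficiency:.2f}, implied serial fraction = {serial_fraction:.3f}")
```

With the numbers from the abstract this gives an efficiency of 0.54 and, under Amdahl's model, an implied serial fraction of about 0.17.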

19

Elster, Anne C., and Tor A. Haugdahl. "Nvidia Hopper GPU and Grace CPU Highlights." Computing in Science & Engineering 24, no. 2 (March 1, 2022): 95–100. http://dx.doi.org/10.1109/mcse.2022.3163817.

20

Lee, Newton. "The NVIDIA® championships at QuakeCon 2005." Computers in Entertainment 3, no. 3 (July 2005): 5. http://dx.doi.org/10.1145/1077246.1077265.

21

Liu, Hui, Bo Yang, and Zhangxin Chen. "Accelerating algebraic multigrid solvers on NVIDIA GPUs." Computers & Mathematics with Applications 70, no. 5 (September 2015): 1162–81. http://dx.doi.org/10.1016/j.camwa.2015.07.005.

22

Mamri, Ayoub, Mohamed Abouzahir, Mustapha Ramzi, and Rachid Latif. "ORB-SLAM accelerated on heterogeneous parallel architectures." E3S Web of Conferences 229 (2021): 01055. http://dx.doi.org/10.1051/e3sconf/202122901055.

Abstract:

The SLAM algorithm lets a robot map the desired environment while localizing itself in space. It is among the more efficient approaches and is widely adopted in ongoing research on autonomous vehicle navigation and robotic applications. However, no complete end-to-end hardware implementation has been adopted yet. Our work aims at a hardware/software optimization of a computationally expensive functional block of monocular ORB-SLAM2. We implement the proposed optimization on an FPGA-based heterogeneous embedded architecture, which shows attractive results. To this end, we carry out a comparative study with other heterogeneous architectures, including a powerful embedded GPGPU (NVIDIA Tegra TX1) and a high-end GPU (NVIDIA GeForce 920MX). The implementation uses high-level synthesis-based OpenCL for the FPGA and CUDA for the NVIDIA boards.

23

Ayoub, Naeem, and Peter Schneider-Kamp. "Real-Time On-Board Deep Learning Fault Detection for Autonomous UAV Inspections." Electronics 10, no. 9 (May 5, 2021): 1091. http://dx.doi.org/10.3390/electronics10091091.

Abstract:

Inspection of high-voltage power lines using unmanned aerial vehicles is an emerging technological alternative to traditional methods. In the Drones4Energy project, we work toward building an autonomous vision-based beyond-visual-line-of-sight (BVLOS) power line inspection system. In this paper, we present a deep learning-based autonomous vision system to detect faults in power line components. We trained a YOLOv4-tiny architecture-based deep neural network, as it showed prominent results for detecting components with high accuracy. For running such deep learning models in a real-time environment, different single-board devices such as the Raspberry Pi 4, Nvidia Jetson Nano, Nvidia Jetson TX2, and Nvidia Jetson AGX Xavier were used for the experimental evaluation. Our experimental results demonstrated that the proposed approach can be effective and efficient for fully automatic real-time on-board visual power line inspection.

24

Cardellini, Valeria, Salvatore Filippone, and Damian W. I. Rouson. "Design Patterns for Sparse-Matrix Computations on Hybrid CPU/GPU Platforms." Scientific Programming 22, no. 1 (2014): 1–19. http://dx.doi.org/10.1155/2014/469753.

Abstract:

We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs which converge onto a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare the throughput of the PSBLAS sparse matrix–vector multiplication on two platforms exploiting the GPU with that obtained by a CPU-only PSBLAS implementation. Our experiments exhibit encouraging results regarding the comparison between CPU and GPU executions in double precision, obtaining a speedup of up to 35.35 on NVIDIA GTX 285 with respect to AMD Athlon 7750, and up to 10.15 on NVIDIA Tesla C2050 with respect to Intel Xeon X5650.

25

Yang, Zhang, Chen Wen Bo, Bai Qi Feng, and Lian Li. "Test and Analysis GPU-Accelerated in Molecular Dynamics Simulation." Applied Mechanics and Materials 380-384 (August 2013): 1652–55. http://dx.doi.org/10.4028/www.scientific.net/amm.380-384.1652.

Abstract:

GPU computing is the use of a graphics processing unit together with a CPU to accelerate large-scale scientific and engineering applications, such as molecular simulation. This paper uses the NVIDIA Tesla C2050, the NVIDIA GTX580, and NAMD 2.9 to simulate three different molecular systems: Beta2, SET9, and Ubiquitin. We compared and analyzed the results of the simulation experiments and conclude that different molecular systems obtain different speedups. The computing time with four GPUs is nearly half that with one GPU, especially for macromolecular systems. Furthermore, in terms of GPU memory utilization, the larger the protein system, the higher the GPU memory use. The performance of the NVIDIA GTX580 is only half that of the NVIDIA C2050. The NVIDIA Tesla C2050 can handle even larger system simulations.

26

Zhao, Muyuan. "NVIDIA's Investment Feasibility and Weighted SWOT Model." Highlights in Business, Economics and Management 3 (January 20, 2023): 227–36. http://dx.doi.org/10.54097/hbem.v3i.4749.

Abstract:

As one of the leading enterprises in the semiconductor industry, Nvidia has many advantages and opportunities in its development, but it also faces many threats and challenges. In financial analysis, investors' research on a company's financial statements helps them fully understand the company's operating conditions. Taking Nvidia as an example, this paper proposes an improved weighted SWOT model by analyzing the company's annual report, authoritative financial data, relative valuation, local government policies, media news, and other factors. The output of the model indicates whether the company is worth investing in. The conclusion of this paper is that the negative feedback coefficient in the model shows that Nvidia is not a good investment target in the short term. After verification, the model can provide investors with more rigorous and scientific analysis methods to avoid impulsive investment.

27

Špeťko, Matej, Ondřej Vysocký, Branislav Jansík, and Lubomír Říha. "DGX-A100 Face to Face DGX-2—Performance, Power and Thermal Behavior Evaluation." Energies 14, no. 2 (January 12, 2021): 376. http://dx.doi.org/10.3390/en14020376.

Abstract:

Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, bringing top performance and energy-efficiency. We present performance, power consumption, and thermal behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere microarchitecture GPUs. The results are compared against the previous generation of the server, Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of floating-point computing units including Tensor Cores. Furthermore, thermal stability was investigated. In addition, Dynamic Frequency and Voltage Scaling (DVFS) analysis was performed to determine the best energy-efficient configuration of the GPUs executing workloads of various arithmetical intensities. Under the energy-optimal configuration the A100 GPU reaches efficiency of 51 GFLOPS/W for double-precision workload and 91 GFLOPS/W for tensor core double precision workload, which makes the A100 the most energy-efficient server accelerator for scientific simulations in the market.

28

Serpa, Matheus S., Eduardo HM Cruz, Matthias Diener, Arthur M. Krause, Philippe OA Navaux, Jairo Panetta, Albert Farrés, Claudia Rosas, and Mauricio Hanzich. "Optimization strategies for geophysics models on manycore systems." International Journal of High Performance Computing Applications 33, no. 3 (January 17, 2019): 473–86. http://dx.doi.org/10.1177/1094342018824150.

Abstract:

Many software mechanisms for geophysics exploration in oil and gas industries are based on wave propagation simulation. To perform such simulations, state-of-the-art high-performance computing architectures are employed, generating results faster and with more accuracy at each generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand the impact of each change applied to the software in order to improve performance as much as possible. In this article, we propose several optimization strategies for a wave propagation model for six architectures: Intel Broadwell, Intel Haswell, Intel Knights Landing, Intel Knights Corner, NVIDIA Pascal, and NVIDIA Kepler. We focus on improving the cache memory usage, vectorization, load balancing, portability, and locality in the memory hierarchy. We analyze the hardware impact of the optimizations, providing insights into how each strategy can improve performance. The results show that NVIDIA Pascal outperforms the other considered architectures by up to 8.5×.

29

Ye, Yutong, Hongyin Zhu, Chaoying Zhang, and Binghai Wen. "Efficient graphic processing unit implementation of the chemical-potential multiphase lattice Boltzmann method." International Journal of High Performance Computing Applications 35, no. 1 (October 27, 2020): 78–96. http://dx.doi.org/10.1177/1094342020968272.

Abstract:

The chemical-potential multiphase lattice Boltzmann method (CP-LBM) has the advantages of satisfying thermodynamic consistency and Galilean invariance; it achieves a very large density ratio and easily expresses surface wettability. Compared with the traditional central difference scheme, the CP-LBM uses the Thomas algorithm to calculate the differences in multiphase simulations, which significantly improves the calculation accuracy but increases the calculation complexity. In this study, we designed and implemented a parallel algorithm for the chemical-potential model on a graphic processing unit (GPU). Several strategies were used to optimize the GPU algorithm, such as coalesced access, instruction throughput, thread organization, memory access, and loop unrolling. Compared with a dual-Xeon 5117 CPU server, our methods achieved a 95-times speedup on an NVIDIA RTX 2080Ti GPU and a 106-times speedup on an NVIDIA Tesla P100 GPU. When the algorithm was extended to an environment with dual NVIDIA Tesla P100 GPUs, a 189-times speedup was achieved and the workload of each GPU reached 96%.

30

Choquette, Jack, Wishwesh Gandhi, Olivier Giroux, Nick Stam, and Ronny Krashinsky. "NVIDIA A100 Tensor Core GPU: Performance and Innovation." IEEE Micro 41, no. 2 (March 1, 2021): 29–35. http://dx.doi.org/10.1109/mm.2021.3061394.

31

Philippidis, Alex. "Learning Curve: Absci, NVIDIA Partner on Antibody Design." GEN Edge 4, no. 1 (January 1, 2022): 237–44. http://dx.doi.org/10.1089/genedge.4.1.39.

32

Edwards, C. "Adapting to the waves [nVidia 3D graphics processors]." IEE Review 50, no. 11 (November 1, 2004): 40–43. http://dx.doi.org/10.1049/ir:20041104.

33

Kozlov, I. M. "Using NVIDIA multicore graphics processors in image decoding." Pattern Recognition and Image Analysis 24, no. 3 (September 2014): 425–30. http://dx.doi.org/10.1134/s1054661814030110.

34

Popov, S. E. "Improved phase unwrapping algorithm based on NVIDIA CUDA." Programming and Computer Software 43, no. 1 (January 2017): 24–36. http://dx.doi.org/10.1134/s0361768817010054.

35

Xu, Jingheng, Guangwen Yang, Haohuan Fu, Wayne Luk, Lin Gan, Wen Shi, Wei Xue, Chao Yang, Yong Jiang, and Conghui He. "Optimizing Finite Volume Method Solvers on Nvidia GPUs." IEEE Transactions on Parallel and Distributed Systems 30, no. 12 (December 1, 2019): 2790–805. http://dx.doi.org/10.1109/tpds.2019.2926084.

36

李, 静海, 云. 张, 蔚. 葛, 骥. 徐, 曦鹏李, 博. 李, 险峰何, 健. 王, 小伟王, and 飞国陈. "Lattice Boltzmann simulation on Nvidia and AMD GPUs." Chinese Science Bulletin 54, no. 20 (October 1, 2009): 3177–84. http://dx.doi.org/10.1360/972009-1347.

37

Lindholm, Erik, John Nickolls, Stuart Oberman, and John Montrym. "NVIDIA Tesla: A Unified Graphics and Computing Architecture." IEEE Micro 28, no. 2 (March 2008): 39–55. http://dx.doi.org/10.1109/mm.2008.31.

38

Gloster, Andrew, Lennon Ó Náraigh, and Khang Ee Pang. "cuPentBatch—A batched pentadiagonal solver for NVIDIA GPUs." Computer Physics Communications 241 (August 2019): 113–21. http://dx.doi.org/10.1016/j.cpc.2019.03.016.

39

Borisov, A. N., and E. V. Myasnikov. "The implementation of ”Kuznyechik” encryption algorithm using NVIDIA CUDA technology." Information Technology and Nanotechnology, no. 2416 (2019): 308–13. http://dx.doi.org/10.18287/1613-0073-2019-2416-308-313.

Abstract:

In this paper, we discuss various options for implementing the ”Kuznyechik” block encryption algorithm using the NVIDIA CUDA technology. We use lookup tables as a basis for the implementation. In experiments, we study the influence of the size of the block of threads and the location of lookup tables on the encryption speed. We show that the best results are obtained when the lookup tables are stored in the global memory. The peak encryption speed reaches 30.83 Gbps on the NVIDIA GeForce GTX 1070 graphics processor.

40

Kim, Youngtae, and Gyuhyeon Hwang. "Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs." Journal of KIISE 42, no. 12 (December 15, 2015): 1467–73. http://dx.doi.org/10.5626/jok.2015.42.12.1467.

41

Tyler, Neil. "Broadcom Overtakes Qualcomm in Q2 2020." New Electronics 53, no. 15 (September 8, 2020): 9. http://dx.doi.org/10.12968/s0047-9624(22)61370-1.

42

Bähr, Pascal R., Bruno Lang, Peer Ueberholz, Marton Ady, and Roberto Kersevan. "Development of a hardware-accelerated simulation kernel for ultra-high vacuum with Nvidia RTX GPUs." International Journal of High Performance Computing Applications 36, no. 2 (December 11, 2021): 141–52. http://dx.doi.org/10.1177/10943420211056654.

Abstract:

Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing–based simulation unit for Nvidia RTX Graphics Processing Units (GPUs). The GPU simulation kernel was designed with Nvidia’s OptiX 7 API to make use of modern hardware-accelerated ray-tracing units, found in recent RTX series GPUs based on the Turing and Ampere architectures. Even with the challenges posed by switching to 32-bit computations, our kernel runs much faster than on comparable CPUs at the expense of a marginal drop in calculation precision.

43

Emmart, Niall, and Charles Weems. "Search-Based Automatic Code Generation for Multiprecision Modular Exponentiation on Multiple Generations of GPU." Parallel Processing Letters 23, no. 04 (December 2013): 1340009. http://dx.doi.org/10.1142/s0129626413400094.

Abstract:

Multiprecision modular exponentiation has a variety of uses, including cryptography, prime testing and computational number theory. It is also a very costly operation to compute. GPU parallelism can be used to accelerate these computations, but to use the GPU efficiently, a problem must involve many simultaneous exponentiation operations. Handling a large number of TLS/SSL encrypted sessions in a data center is an important problem that fits this profile. We are developing a framework that enables generation of highly efficient implementations of exponentiation operations for different NVIDIA GPU architectures and problem instances. One of the challenges in generating such code is that NVIDIA's PTX is not a true assembly language, but is instead a virtual instruction set that is compiled and optimized in different ways for different generations of GPU hardware. Thus, the same PTX code runs with different levels of efficiency on different machines. And as the precision of the computations changes, each architecture has its own break-even points where a different algorithm or parallelization strategy must be employed. To make the code efficient for a given problem instance and architecture thus requires searching a multidimensional space of algorithms and configurations, by generating PTX code for each combination, executing it, validating the numerical result, and evaluating its performance. Our framework automates much of this process, and produces exponentiation code that is up to six times faster than the best known hand-coded implementations for the NVIDIA GTX 580. Our goal for the framework is to enable users to relatively quickly find the best configuration for each new GPU architecture. However, in migrating to the GTX 680, which has three times as many cores as the GTX 580, we found that the best performance our system could achieve was significantly less than for the GTX 580. 
The decrease was traced to a radical shift in the NVIDIA architecture that greatly reduces the storage resources for each core. Further analysis and feasibility simulations indicate that it should be possible, through changes in our code generators to adapt for different storage models, to take greater advantage of the parallelism on the GTX 680. That will add a new dimension to our search space, but will also give our framework greater flexibility for dealing with future architectures.

44

Мясіщев, О. А. "Ефективність використання GPU NVIDIA при вирішенні систем лінійних рівнянь." Збірник наукових праць Військового інституту Київського національного університету імені Тараса Шевченка, Вип. № 38 (2012): 76–81.

46

Tabani, Hamid, Fabio Mazzocchetti, Pedro Benedicte, Jaume Abella, and Francisco J. Cazorla. "Performance Analysis and Optimization Opportunities for NVIDIA Automotive GPUs." Journal of Parallel and Distributed Computing 152 (June 2021): 21–32. http://dx.doi.org/10.1016/j.jpdc.2021.02.008.

47

Zhang, Ying, Lu Peng, Bin Li, Jih-Kwon Peir, and Jianmin Chen. "Performance and Power Comparisons between NVIDIA and ATI GPUS." International Journal of Computer Science and Information Technology 6, no. 6 (December 31, 2014): 1–22. http://dx.doi.org/10.5121/ijcsit.2014.6601.

48

Bućan, Dušan. "Mašinsko učenje na ivici upotrebom NVIDIA Jetson TX2 uređaja." Zbornik radova Fakulteta tehničkih nauka u Novom Sadu 36, no. 11 (November 9, 2021): 2017–20. http://dx.doi.org/10.24867/15be41bucan.

Abstract:

This paper describes the application of convolutional neural networks in computer vision, as well as data-stream processing techniques and a system architecture for real-time demographic analytics using machine learning at the edge. The described system has been implemented, and possible directions for improving and extending it are presented.

49

Schneider, Evan, Brant Robertson, Alexander Kuhn, Christopher Lux, and Marc Nienhaus. "NVIDIA IndeX accelerated computing for visualizing Cholla's galactic winds." Parallel Computing 107 (October 2021): 102809. http://dx.doi.org/10.1016/j.parco.2021.102809.

50

Zhang, Shicheng, Laixian Zhang, Huayan Sun, and Huichao Guo. "Photoelectric Target Detection Algorithm Based on NVIDIA Jeston Nano." Sensors 22, no. 18 (September 17, 2022): 7053. http://dx.doi.org/10.3390/s22187053.

Abstract:

This paper proposes a photoelectric target detection algorithm for NVIDIA Jetson Nano embedded devices, exploiting the characteristics of active and passive differential images of lasers after denoising. An adaptive threshold segmentation method was developed based on the statistical characteristics of photoelectric target echo light intensity, which effectively improves detection of the target area. The proposed method’s effectiveness is compared and analyzed against a typical lightweight network that was knowledge-distilled by ResNet18 on target region detection tasks. Furthermore, TensorRT technology was applied to accelerate inference and deploy on hardware platforms the lightweight network Shuffv2_x0_5. The experimental results demonstrate that the developed method’s accuracy rate reaches 97.15%, the false alarm rate is 4.87%, and the detection rate can reach 29 frames per second for an image resolution of 640 × 480 pixels.
