Dissertations / Theses on the topic 'NVIDIA'

Consult the top 50 dissertations / theses for your research on the topic 'NVIDIA.'

1

Gameiro, Pedro Miguel Rodrigues. "Equity research - NVIDIA Corporation." Master's thesis, Instituto Superior de Economia e Gestão, 2018. http://hdl.handle.net/10400.5/16970.

Abstract:
Master's in Finance
This project reflects an evaluation of NVIDIA Corporation, a semiconductor company, and was prepared as the final work project of ISEG's Master in Finance. The report was written in accordance with the recommendations of the CFA Institute. NVIDIA is at a singular moment compared with its peers, with 40% annual revenue growth and a 334.46% increase in its share valuation over the last two years. Not only is NVIDIA delivering interesting financial performance, it is also entering emerging markets such as autonomous cars and cryptocurrencies, which makes it a very interesting case study. A fascination with technology, and with gaming specifically, was another reason this company was chosen. The report was developed using public information available up to June 30, 2018; no information or event after that date has been considered. The price target of $303.67 was obtained with the Discounted Cash Flow method. A relative valuation was attempted, but given NVIDIA's unique situation, no close peers met the criteria used. The valuation supports a BUY recommendation, albeit with medium risk: NVIDIA is consolidated in its main market, gaming, but there is some uncertainty regarding markets such as cryptocurrency and autonomous cars.

2

Zajíc, Jiří. "Překladač jazyka C# do jazyka Nvidia CUDA." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236439.

Abstract:
This master's thesis focuses on GPU-accelerated computation on NVidia graphics cards. CUDA technology is used and brought to the .NET platform. The problem is solved with a compiler from the C# programming language to the NVidia CUDA language that relies on C# attributes and preserves the semantics of the operations. The application is implemented in C# and uses the open-source NRefactory library.

3

Santos, Paulo Carlos Ferreira dos. "Extração de informações de desempenho em GPUs NVIDIA." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-02042013-090806/.

Abstract:
The recent growth in the use of Graphics Processing Units (GPUs) in performance-oriented scientific applications has created the need to optimize the programs that run on them. Performance models are a suitable tool for this task, and they in turn benefit from tools that extract performance information from GPUs. This work covers the creation of a microbenchmark generator for PTX instructions that also retrieves information about the GPU hardware characteristics. The microbenchmark results were validated using a simplified model, which yielded errors between 6.11% and 16.32% on five test kernels. We also identify the sources of imprecision in the microbenchmark results. The tool was used to analyze the performance profile of the instructions and to identify groups with similar behavior. We also tested how GPU pipeline performance depends on the sequence of instructions executed, and verified the compiler's optimization for this case. We conclude that using microbenchmarks with PTX instructions is feasible and proved effective for building performance models and for detailed analysis of instruction behavior.

4

Krivoklatský, Filip. "Návrh vestavaného systému inteligentného vidění na platformě NVIDIA." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400627.

Abstract:
This diploma thesis deals with the design of an embedded computer vision system and the transfer of an existing computer vision application for 3D object detection from Windows to the designed embedded system running Linux. The thesis focuses on the design of a communication interface for system control and camera video transfer over a local network with video compression. The detection algorithm is then enhanced by moving computationally expensive functions to the GPU using CUDA technology. Finally, a user application with a graphical interface is designed for controlling the system from the Windows platform.

5

Savioli, Nicolo'. "Parallelization of the algorithm WHAM with NVIDIA CUDA." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/6377/.

Abstract:
The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in Molecular Dynamics simulations. WHAM works in post-processing, in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling adds a bias to the potential energy of the system in order to force it to sample a specific region of configurational space. N independent simulations are performed in order to sample the whole region of interest. The WHAM algorithm is then used to estimate the original system energy from the N atomic trajectories. WHAM was parallelized with CUDA, a language for programming the GPUs of NVIDIA graphics cards, which have a parallel architecture. The parallel implementation can noticeably speed up WHAM execution compared with previous serial CPU implementations. However, the WHAM CPU code presents some timing bottlenecks at very high numbers of interactions. The algorithm was written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfactory, with performance increasing when the model was executed on graphics cards with higher compute capability. Nonetheless, the GPUs used to test the algorithm are quite old and not designed for scientific calculations. A further performance increase is likely if the algorithm is executed on clusters of GPUs with a high level of computational efficiency. The thesis is organized as follows: I first describe the mathematical formulation of Umbrella Sampling and the WHAM algorithm, with their applications in the study of ionic channels and in Molecular Docking (Chapter 1); then I present the CUDA architectures used to implement the model (Chapter 2); finally, I present the results obtained on model systems (Chapter 3).

6

Ikeda, Patricia Akemi. "Um estudo do uso eficiente de programas em placas gráficas." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25042012-212956/.

Abstract:
Initially designed for graphics processing, graphics cards (GPUs) have evolved into high-performance, general-purpose parallel coprocessors. Given the enormous potential they offer to many research and commercial areas, NVIDIA stands out as a pioneer for launching the CUDA architecture (compatible with several of its cards), an environment that harnesses this computational power while making programming easier. To exploit the full capacity of the GPU, certain practices must be followed. One of them is to keep the hardware as busy as possible. This work proposes a practical and extensible tool that helps the programmer choose the best configuration to achieve this goal.

7

Rivera-Polanco, Diego Alejandro. "COLLECTIVE COMMUNICATION AND BARRIER SYNCHRONIZATION ON NVIDIA CUDA GPU." Lexington, Ky. : [University of Kentucky Libraries], 2009. http://hdl.handle.net/10225/1158.

Abstract:
Thesis (M.S.)--University of Kentucky, 2009.
Title from document title page (viewed on May 18, 2010). Document formatted into pages; contains: ix, 88 p. : ill. Includes abstract and vita. Includes bibliographical references (p. 86-87).
8

Harvey, Jesse Patrick. "GPU acceleration of object classification algorithms using NVIDIA CUDA /." Online version of thesis, 2009. http://hdl.handle.net/1850/10894.

9

Lerchundi, Osa Gorka. "Fast Implementation of Two Hash Algorithms on nVidia CUDA GPU." Thesis, Norwegian University of Science and Technology, Department of Telematics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9817.

Abstract:
User needs increase as time passes. We started with room-sized computers, where punched cards served the function that machine code does today, and we have now reached a point where the number of processors in our graphics processing unit is not enough for our requirements. A change in the evolution of computing is looming: we are in a transition in which sequential computation is losing ground to distributed computation. This trend is not new merely because easily accessible GPUs have arrived; long before, it was used in projects such as SETI@Home, fightAIDS@Home and ClimatePrediction, which were shouting from the rooftops about what was to come. Grid computing was its formal name. Until now it was linked only to systems distributed over a network, but as the technology evolves it will take on a different meaning. nVidia, with CUDA, has been one of the first companies to make this kind of software package noteworthy. Rather than a proof of concept, it is a real tool, one in which the transition is most strongly expressed and in which the true artist is the programmer who uses it and achieves performance increases. As with many innovations, a worldwide community has grown up around this software package, each member doing their bit. It is noteworthy that many software developments appeared after the release of CUDA, such as the cracking of the hitherto insurmountable WPA. The same could be said of the Sony-Toshiba-IBM (STI) alliance, which has a great community and great software (IBM is the company in charge of maintenance). It is not as accessible as nVidia, but IBM is powerful enough to enter the home-made supercomputing market. In this case, after IBM released the PS3 SDK, a notable application exploiting the benefits of parallel computing was created: Folding@Home. Its purpose is, inter alia, to find the cure for cancer.
To sum up, this is only the beginning, and this thesis assesses the possibility of using this technology to accelerate cryptographic hash algorithms. BLUE MIDNIGHT WISH (the hash algorithm under study) is adapted to parallel-capable code in order to produce empirical measurements that can be compared with the current sequential implementations. It answers questions that have not been answered so far. BLUE MIDNIGHT WISH is a candidate hash function for the next NIST standard, SHA-3, designed by Professor Danilo Gligoroski from NTNU and Vlastimil Klima, an independent cryptographer from the Czech Republic. So far, in terms of speed, BLUE MIDNIGHT WISH is at the top of the charts (generally in second place, right behind EDON-R, another hash function from Professor Danilo Gligoroski). One part of the work in this thesis was to investigate whether faster processing of Blue Midnight Wish can be achieved when the computations are distributed among the cores of a CUDA device. My numerous experiments give a clear answer: NO. Although the answer is negative, it still has significant scientific value. My work confirms the view of a part of the cryptographic community that doubts cryptographic primitives will benefit from being executed in parallel on many cores in one CPU. Indeed, my experiments show that the communication costs between cores in CUDA outweigh, by a large margin, the computational costs inside one core (processor) unit.

10

Virk, Bikram. "Implementing method of moments on a GPGPU using Nvidia CUDA." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33980.

Abstract:
This thesis concentrates on the algorithmic aspects of the Method of Moments (MoM) and Locally Corrected Nyström (LCN) numerical methods in electromagnetics. The data dependency in each step of the algorithm is analyzed in order to implement a parallel version that can harness the processing power of a General Purpose Graphics Processing Unit (GPGPU). The GPGPU programming model provided by NVIDIA's Compute Unified Device Architecture (CUDA) is described, introducing the software tools that enable C code to be implemented on the GPGPU. Various optimizations, such as a partial update at every iteration, inter-block synchronization and the use of shared memory, yield an overall speedup of approximately 10. The study also brings out the strengths and weaknesses of implementing different methods, such as Crout's LU decomposition and triangular matrix inversion, on a GPGPU architecture. The results suggest future directions of study into different algorithms and their effectiveness in a parallel processor environment. The performance data collected show how different features of the GPGPU architecture can be enhanced to yield higher speedup.

11

Subramoniapillai, Ajeetha Saktheesh. "Architectural Analysis and Performance Characterization of NVIDIA GPUs using Microbenchmarking." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1344623484.

12

Sreenibha, Reddy Byreddy. "Performance Metrics Analysis of GamingAnywhere with GPU accelerated Nvidia CUDA." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16846.

Abstract:
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and GPU is the main focus of this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and the enhancement of GamingAnywhere. The thesis concentrates on the virtualization of the GPU and suggests that accelerating this unit using NVIDIA CUDA is the key to better performance when using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study is conducted to evaluate the feasibility and performance of GPU solutions by VMware in cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different game resolutions are considered in our empirical research, and our results show that the frame rate and bitrate increased at the different resolutions with the use of the NVIDIA CUDA enhanced GPU.

13

Nejadfard, Kian. "Context-aware automated refactoring for unified memory allocation in NVIDIA CUDA programs." Cleveland State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=csu1624622944458295.

14

Zaahid, Mohammed. "Performance Metrics Analysis of GamingAnywhere with GPU acceletayed NVIDIA CUDA using gVirtuS." Thesis, Blekinge Tekniska Högskola, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16852.

Abstract:
The modern world has opened the gates to many advancements in cloud computing, particularly in the field of cloud gaming. The most recent development in this area is the open-source cloud gaming system GamingAnywhere. The relationship between the CPU and GPU is the main focus of this thesis. Graphics Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and the enhancement of GamingAnywhere. The thesis concentrates on the virtualization of the GPU and suggests that accelerating this unit using NVIDIA CUDA is the key to better performance when using GamingAnywhere. After extensive research, gVirtuS was chosen as the technique for employing NVIDIA CUDA. An experimental study is conducted to evaluate the feasibility and performance of GPU solutions by VMware in cloud gaming scenarios provided by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different game resolutions are considered in our empirical research, and our results show that the frame rate and bitrate increased at the different resolutions with the use of the NVIDIA CUDA enhanced GPU.

15

Graves, Russell Edward. "High performance password cracking by implementing rainbow tables on nVidia graphics cards (IseCrack)." [Ames, Iowa : Iowa State University], 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1461850.

16

Ясменко, В. Р., and К. В. Донець. "Аналіз можливостей візуалізатора NVIDIA Iray у порівнянні з візуалізатором V-Ray програми 3DS Max." Thesis, КНУТД, 2016. https://er.knutd.edu.ua/handle/123456789/4360.

17

Bourque, Donald. "CUDA-Accelerated ORB-SLAM for UAVs." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-theses/882.

Abstract:
The use of cameras and computer vision algorithms to provide state estimation for robotic systems has become increasingly popular, particularly for small mobile robots and unmanned aerial vehicles (UAVs). These algorithms extract information from the camera images and perform simultaneous localization and mapping (SLAM) to provide state estimation for path planning, obstacle avoidance, or 3D reconstruction of the environment. High resolution cameras have become inexpensive and are a lightweight and smaller alternative to laser scanners. UAVs often have monocular camera or stereo camera setups since payload and size impose the greatest restrictions on their flight time and maneuverability. This thesis explores ORB-SLAM, a popular Visual SLAM method that is appropriate for UAVs. Visual SLAM is computationally expensive and normally offloaded to computers in research environments. However, large UAVs with greater payload capacity may carry the necessary hardware for performing the algorithms. The inclusion of general-purpose GPUs on many of the newer single board computers allows for the potential of GPU-accelerated computation within a small board profile. For this reason, an NVidia Jetson board containing an NVidia Pascal GPU was used. CUDA, NVidia's parallel computing platform, was used to accelerate monocular ORB-SLAM, achieving onboard Visual SLAM on a small UAV.

18

Shaker, Alfred M. "COMPARISON OF THE PERFORMANCE OF NVIDIA ACCELERATORS WITH SIMD AND ASSOCIATIVE PROCESSORS ON REAL-TIME APPLICATIONS." Kent State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=kent1501084051233453.

19

Buenaflor, Jr Romeo C. "Using State-of-the-Art GPGPU's for Molecular Simulation : Optimizing Massively Parrallelized N-Body programs using NVIDIA Tesla." Thesis, Uppsala University, Department of Information Technology, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-110895.

Abstract:
Computation and simulation as tools for discovering new knowledge are still marred by problems that are intractable, combinatorial, or simply plagued by the so-called curse of dimensionality. The algorithm used for molecular simulation of polyelectrolytes is one of the areas in computational chemistry and science that suffers from the curse of dimensionality. Many of these problems, though, have been claimed to be solvable with the advent of more sophisticated and powerful computers and related technologies. This paper attempts to substantiate that claim. A state-of-the-art NVIDIA Tesla C870 is used to massively parallelize the algorithm for the molecular simulation of polyelectrolytes. In particular, the paper attempts to optimize the portion of the code that computes electrostatic interactions using the new technology. It is shown that tapping this new line of technology offers a great advantage in winning the war against the curse of dimensionality.

20

Surineni, Sruthikesh. "Performance/Accuracy Trade-offs of Floating-point Arithmetic on Nvidia GPUs| From a Characterization to an Auto-tuner." Thesis, University of Missouri - Columbia, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13850754.

Abstract:
Floating-point computations produce approximate results, possibly leading to inaccuracy and reproducibility problems. Existing work addresses two issues: first, the design of high-precision floating-point representations, and second, the study of methods to support a trade-off between accuracy and performance of central processing unit (CPU) applications. However, a comprehensive study of trade-offs between accuracy and performance on modern graphics processing units (GPUs) is missing. This thesis covers the use of different floating-point precisions (i.e., single and double floating-point precision) in the IEEE 754 standard, the GNU Multiple Precision Arithmetic Library (GMP), and composite floating-point precision on a GPU, using a variety of synthetic and real-world benchmark applications. First, we analyze the support for single and double precision floating-point arithmetic on the considered GPU architectures, and we characterize the latencies of all floating-point instructions on the GPU. Second, a study is presented of the performance/accuracy trade-offs related to the use of different arithmetic precisions for addition, multiplication, division, and the natural exponential function. Third, an analysis is given of the combined use of different arithmetic operations in three benchmark applications characterized by different instruction mixes and arithmetic intensities. As a result of this analysis, a novel auto-tuner was designed to select the arithmetic precision of a GPU program that leads to a better performance and accuracy trade-off, depending on the arithmetic operations and math functions used in the program and the degree of multithreading of the code.

21

Bronda, Samuel. "Hluboké neuronové sítě pro prostředí superpočítače." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400885.

Abstract:
The main contribution of this work is the optimization of hardware configuration for neural network computation. The theoretical part describes neural networks, deep learning frameworks and hardware options. The next part of the thesis deals with the implementation of performance tests, which apply the Inception V3 and ResNet models. The network models are run on various graphics cards and computing hardware. The output of the thesis is an implemented Inception V3 model that is used to examine graphics cards in terms of performance, computation time and efficiency. The ResNet model is used in a section that examines other influences on neural network computation, such as the disk used, the operating memory, and so on. Each practical part contains a discussion explaining the findings of that part. In the power consumption measurements, a mismatch between the manufacturer's declared values and the measured values was identified.

22

Araújo, João Manuel da Silva. "Paralelização de algoritmos de Filtragem baseados em XPATH/XML com recurso a GPUs." Master's thesis, FCT - UNL, 2009. http://hdl.handle.net/10362/2530.

Abstract:
Master's dissertation in Informatics Engineering
This dissertation studies the feasibility of using GPUs for parallel processing applied to notification filtering algorithms in a publish/subscribe system. To this end, experimental results were compared between the sequential version (on CPUs) and the parallel version of a filtering algorithm chosen as a reference. The analysis sought to provide evidence for assessing whether any gains from exploiting GPUs are sufficient to compensate for the greater complexity of the process.

23

Shi, Bobo. "Implementation and Performance Analysis of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor and NVIDIA GPU Accelerator." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1462793739.

24

Cammareri, Costantino Davide. "Sistema di misura dei consumi di unità di calcolo low-power per applicazioni scientifiche presso il centro INFN-CNAF." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15718/.

Abstract:
This thesis work, carried out at CNAF (Centro Nazionale per lo sviluppo delle Tecnologie Informatiche e Telematiche), is part of the INFN COSA (Computing on SoC Architecture) project. Its goal is to test the power consumption and performance of low-power System on Chip (SoC) computing systems, which are emerging as computing units on which to run and test scientific applications. For this purpose, a current meter was built from an Arduino UNO and a current-to-voltage transducer whose analog output depends linearly on the input current supplied by the power supply of the SoC under test. Using its built-in 10-bit ADC, the Arduino converts the analog voltage output by the transducer into digital voltage values with an uncertainty of 5 mV and a time duration of 1 ms. The measurements were made using a current-voltage calibration line obtained in the thesis work, which made it possible to convert the voltage readings stored in the Arduino's memory, expressed as ADC levels, into current readings. The thesis sets out some of the reasons why the scientific community is increasingly interested in choosing SoC-based architectures, rather than traditional ones, as computing units. It illustrates all the steps that led to the calibration line with which measurements were made, for some applications provided by CNAF, on an NVIDIA Jetson low-power board equipped with a Tegra K1 SoC. The results of the current meter were then compared with those measured with a high-precision digital multimeter available at CNAF.
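As a rough illustration of the measurement chain this abstract describes (ADC level, then voltage, then current via a linear calibration), here is a minimal Python sketch. It is not taken from the thesis: the 5 V reference is an assumption (it is the Arduino UNO default and matches the roughly 5 mV step quoted above), and the function names and the calibration coefficients `slope` and `intercept` are hypothetical placeholders.

```python
# Sketch only: V_REF and the calibration coefficients are assumptions,
# not values reported in the thesis.
ADC_BITS = 10               # built-in Arduino UNO ADC resolution (from the abstract)
V_REF = 5.0                 # assumed default 5 V analog reference
ADC_LEVELS = 2 ** ADC_BITS  # 1024 discrete levels


def adc_level_to_volts(level: int) -> float:
    """Convert a raw ADC level (0..1023) to the voltage it represents."""
    if not 0 <= level < ADC_LEVELS:
        raise ValueError(f"ADC level out of range: {level}")
    return level * V_REF / ADC_LEVELS


def volts_to_amps(volts: float, slope: float, intercept: float) -> float:
    """Apply a linear calibration line, I = slope * V + intercept."""
    return slope * volts + intercept


if __name__ == "__main__":
    # One ADC step is about 4.9 mV under the assumed 5 V reference.
    print(f"ADC step: {adc_level_to_volts(1) * 1000:.2f} mV")
    # Hypothetical calibration coefficients, for illustration only.
    print(f"Current: {volts_to_amps(adc_level_to_volts(600), 2.0, -5.0):.3f} A")
```

In practice the slope and intercept would come from fitting the calibration measurements, which the abstract mentions but does not report.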
APA, Harvard, Vancouver, ISO, and other styles
25

Nottingham, Alastair. "GPF : a framework for general packet classification on GPU co-processors." Thesis, Rhodes University, 2012. http://hdl.handle.net/10962/d1006662.

Full text
Abstract:
This thesis explores the design and experimental implementation of GPF, a novel protocol-independent, multi-match packet classification framework. This framework is targeted and optimised for flexible, efficient execution on NVIDIA GPU platforms through the CUDA API, but should not be difficult to port to other platforms, such as OpenCL, in the future. GPF was conceived and developed in order to accelerate classification of large packet capture files, such as those collected by Network Telescopes. It uses a multiphase SIMD classification process which exploits both the parallelism of packet sets and the redundancy in filter programs, in order to classify packet captures against multiple filters at extremely high rates. The resultant framework - comprised of classification, compilation and buffering components - efficiently leverages GPU resources to classify arbitrary protocols, and return multiple filter results for each packet. The classification functions described were verified and evaluated by testing an experimental prototype implementation against several filter programs, of varying complexity, on devices from three GPU platform generations. In addition to the significant speedup achieved in processing results, analysis indicates that the prototype classification functions perform predictably, and scale linearly with respect to both packet count and filter complexity. Furthermore, classification throughput (packets/s) remained essentially constant regardless of the underlying packet data, and thus the effective data rate when classifying a particular filter was heavily influenced by the average size of packets in the processed capture. For example: in the trivial case of classifying all IPv4 packets ranging in size from 70 bytes to 1KB, the observed data rate achieved by the GPU classification kernels ranged from 60Gbps to 900Gbps on a GTX 275, and from 220Gbps to 3.3Tbps on a GTX 480. 
In the less trivial case of identifying all ARP, TCP, UDP and ICMP packets for both IPv4 and IPv6 protocols, the effective data rates ranged from 15Gbps to 220Gbps (GTX 275), and from 50Gbps to 740Gbps (GTX 480), for 70B and 1KB packets respectively.
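The relationship between a constant packet throughput and a size-dependent data rate can be checked with simple arithmetic. The sketch below is not from the thesis; the function names are illustrative, and it assumes that "1KB" means 1024 bytes. Starting from the reported 60 Gbps at 70-byte packets on the GTX 275, it derives the implied packets per second and the data rate that the same throughput would give at 1 KB packets.

```python
def packets_per_second(data_rate_bps: float, packet_bytes: int) -> float:
    """Packet throughput implied by a data rate at a fixed packet size."""
    return data_rate_bps / (packet_bytes * 8)


def data_rate_bps(pps: float, packet_bytes: int) -> float:
    """Data rate implied by a packet throughput at a fixed packet size."""
    return pps * packet_bytes * 8


# Reported figure: 60 Gbps for 70-byte packets on a GTX 275.
pps_gtx275 = packets_per_second(60e9, 70)

# Same packet throughput applied to 1 KB packets (assumed to be 1024 bytes).
rate_1kb = data_rate_bps(pps_gtx275, 1024)

print(f"{pps_gtx275 / 1e6:.1f} Mpackets/s, {rate_1kb / 1e9:.0f} Gbps at 1 KB")
```

The result lands close to the 900 Gbps quoted for 1 KB packets, which is what a roughly constant packets-per-second rate would predict.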
APA, Harvard, Vancouver, ISO, and other styles
26

Fuksa, Tomáš. "Paralelizace výpočtů pro zpracování obrazu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-219371.

Full text
Abstract:
This work deals with parallel computing on modern processors: multi-core CPUs and GPUs. The goal is to learn about computing on these devices, which are suitable for parallelization, to define their advantages and disadvantages, to test their properties on examples, and to select appropriate tools for implementing a library for parallel image processing. This library will be used for vanishing point estimation in a path-finding mobile robot.
APA, Harvard, Vancouver, ISO, and other styles
27

Mašek, Jan. "Dynamický částicový systém jako účinný nástroj pro statistické vzorkování." Doctoral thesis, Vysoké učení technické v Brně. Fakulta stavební, 2018. http://www.nusl.cz/ntk/nusl-390276.

Full text
Abstract:
The presented doctoral thesis aims to develop a new efficient tool for optimizing the uniformity of point samples. One use case for these point sets is as optimized sets of integration points in statistical analyses of computer models using Monte Carlo type integration. It is well known that the pursuit of uniformly distributed sets of integration points is the only possible way of decreasing the error of estimation of an integral over an unknown function. The work includes a survey of currently used criteria for evaluating and/or optimizing the uniformity of point sets. A critical evaluation of their properties is presented, leading to suggestions for improving the spatial and statistical uniformity of the resulting samples. A refined variant of the general formulation of the phi optimization criterion has been derived by incorporating the periodically repeated design domain along with a scale-independent behavior of the criterion. Based on a physical analogy between a set of sampling points and a dynamical system of mutually repelling particles, a hyper-dimensional N-body system has been selected as the driver of the developed optimization tool. Because the simulation of such a dynamical system is known to be computationally intensive, an efficient solution using the massively parallel GPGPU platform Nvidia CUDA has been developed. An intensive study of the properties of this complex architecture proved necessary to fully exploit the possible solution speedup.
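The idea of treating sampling points as mutually repelling particles in a periodically repeated domain can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for the example: the 1/r repulsion law, the explicit step update, and the name `repulsion_step` are hypothetical and are not the thesis's formulation. The all-pairs force sum is the O(N²) part that makes such simulations a natural fit for a GPU.

```python
import numpy as np


def repulsion_step(points: np.ndarray, step: float = 1e-3,
                   eps: float = 1e-12) -> np.ndarray:
    """Move each point away from all others in a periodic unit hypercube.

    points has shape (n, dim) with coordinates in [0, 1).
    The force law (magnitude proportional to 1/r) is a hypothetical choice.
    """
    # Pairwise displacement vectors p_i - p_j, shape (n, n, dim).
    diff = points[:, None, :] - points[None, :, :]
    # Minimum-image convention: use the nearest periodic copy of each point.
    diff -= np.round(diff)
    dist2 = np.sum(diff ** 2, axis=-1) + eps
    np.fill_diagonal(dist2, np.inf)  # no self-interaction
    # Sum of repulsive contributions from every other point.
    force = np.sum(diff / dist2[:, :, None], axis=1)
    # Wrap the updated coordinates back into the unit hypercube.
    return (points + step * force) % 1.0


inside = repulsion_step(np.array([[0.4], [0.6]]))
across_boundary = repulsion_step(np.array([[0.1], [0.9]]))
```

In the second call the two points are closest through the periodic boundary, so they are pushed apart across it rather than toward the middle of the domain.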
APA, Harvard, Vancouver, ISO, and other styles
28

Senthil, Kumar Nithin. "Designing optimized MPI+NCCL hybrid collective communication routines for dense many-GPU clusters." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1619132252608831.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Bartosch, Nadine. "Correspondence-based pairwise depth estimation with parallel acceleration." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-34372.

Full text
Abstract:
This report covers the implementation and evaluation of a stereo vision correspondence-based depth estimation algorithm on a GPU. The results and feedback are used for a multi-view camera system in combination with Jetson TK1 devices for parallelized image processing; the aim of this system is to estimate the depth of the scenery in front of it. The performance of the algorithm plays the key role. Alongside the implementation, the objective of this study is to investigate the advantages of parallel acceleration, among them the differences from execution on a CPU, which are significant for all the functions; the overheads particular to a GPU application, such as memory transfer from the CPU to the GPU and vice versa; and the challenges of real-time and concurrent execution. The study was conducted with the aid of CUDA on three NVIDIA GPUs with different characteristics, and with knowledge gained through an extensive literature study of different depth estimation algorithms, stereo vision and correspondence, and CUDA in general. Using the full set of components of the algorithm and expecting (near) real-time execution is unrealistic in this setup and implementation; the slowing factors include the semi-global matching. Investigating alternatives shows that disparity maps of a certain accuracy are also achieved by local methods such as the Hamming distance alone, combined with a filter that refines the results. Furthermore, it is demonstrated that the kernel launch configuration and the use of GPU memory types such as shared memory are crucial for GPU implementations and affect the performance of the algorithm. Concurrency, however, proves to be a more complicated task, especially in the form desired.
For future work and refinement of the algorithm, it is therefore recommended to invest more time in further optimization possibilities with regard to shared memory and in integrating the algorithm into the actual pipeline.
APA, Harvard, Vancouver, ISO, and other styles
30

Karri, Venkata Praveen. "Effective and Accelerated Informative Frame Filtering in Colonoscopy Videos Using Graphic Processing Units." Thesis, University of North Texas, 2010. https://digital.library.unt.edu/ark:/67531/metadc31536/.

Full text
Abstract:
Colonoscopy is an endoscopic technique that allows a physician to inspect the mucosa of the human colon. Previous methods and software solutions to detect informative frames in a colonoscopy video (a process called informative frame filtering or IFF) have been largely ineffective in (1) covering the proper definition of an informative frame in the broadest sense and (2) striking an optimal balance between accuracy and speed of classification in both real-time and non-real-time medical procedures. In my thesis, I propose a more effective method and faster software solutions for IFF. The method is more effective because it introduces a heuristic algorithm (derived from experimental analysis of typical colon features) for classification, which contributed to a 5-10% boost in various performance metrics for IFF. The software modules are faster because they incorporate parallel-processing-oriented coding techniques on modern microprocessors. Two IFF modules were created, one for post-procedure use and the other for real-time use. Code optimizations through NVIDIA CUDA for GPU processing and/or CPU multi-threading concepts embedded in two significant microprocessor design philosophies (multi-core design and many-core design) resulted in a 5-fold acceleration for the post-procedure module and a 40-fold acceleration for the real-time module. Some innovative software modules, which are still in the testing phase, have recently been created to exploit the power of multiple GPUs together.
APA, Harvard, Vancouver, ISO, and other styles
31

Loundagin, Justin. "Optimizing Harris Corner Detection on GPGPUs Using CUDA." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1348.

Full text
Abstract:
The objective of this thesis is to optimize the implementation of the Harris corner detection algorithm on NVIDIA GPGPUs using the CUDA software platform and to measure the performance benefit. The Harris corner detection algorithm, developed by C. Harris and M. Stephens, discovers well-defined corner points within an image. The corner detection implementation has proven to be computationally intensive, so real-time performance is difficult with a sequential software implementation. This thesis decomposes the algorithm into a set of parallel stages, each of which is implemented and optimized on the CUDA platform. The performance results show that, by applying strategic CUDA optimizations to the Harris corner detection implementation, real-time performance is feasible. The optimized CUDA implementation showed significant speedup over several platforms: standard C, MATLAB, and OpenCV. It was then applied to a feature-matching computer vision system, which showed significant speedup over the other platforms.
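For readers unfamiliar with the algorithm, the following NumPy sketch shows a textbook form of the Harris corner response, split into stages that each operate independently per pixel (which is what makes the algorithm amenable to GPU parallelization). It is not the thesis's CUDA implementation: the stage breakdown, the box-filter window (a Gaussian window is more common), the constant k = 0.04, and the function names are assumptions made for this illustration.

```python
import numpy as np


def box_blur(a: np.ndarray, radius: int = 1) -> np.ndarray:
    """Average over a (2r+1) x (2r+1) window, replicating edge pixels."""
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / size ** 2


def harris_response(img: np.ndarray, k: float = 0.04) -> np.ndarray:
    """Per-pixel Harris score R = det(M) - k * trace(M)^2."""
    img = img.astype(float)
    iy, ix = np.gradient(img)       # stage 1: image gradients
    sxx = box_blur(ix * ix)         # stage 2: windowed gradient products
    syy = box_blur(iy * iy)
    sxy = box_blur(ix * iy)
    det = sxx * syy - sxy ** 2      # stage 3: corner score per pixel
    trace = sxx + syy
    return det - k * trace ** 2


# Synthetic test image: a bright square on a dark background.
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
response = harris_response(image)
```

On this image the response is positive at the square's corners, negative along its straight edges, and zero in flat regions, which is the behavior the score is designed to have.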
APA, Harvard, Vancouver, ISO, and other styles
32

Ekstam, Ljusegren Hannes, and Hannes Jonsson. "Parallelizing Digital Signal Processing for GPU." Thesis, Linköpings universitet, Programvara och system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167189.

Full text
Abstract:
Because of the increasing importance of signal processing in today's society, there is a need to easily experiment with new ways to process signals. Usually, fast-performing digital signal processing is done with special-purpose hardware that is difficult to develop for. GPUs pose an alternative for fast-performing digital signal processing. The work in this thesis is an analysis and implementation of a GPU version of a digital signal processing chain provided by SAAB. Through an iterative process of development and testing, a final implementation was achieved. Two benchmarks, both comprising 4.2 M test samples, were made to compare the CPU implementation with the GPU implementation. The benchmarks were run on three different platforms: a desktop computer, an NVIDIA Jetson AGX Xavier and an NVIDIA Jetson TX2. The results show that the parallelized version can reach several orders of magnitude higher throughput than the CPU implementation.
APA, Harvard, Vancouver, ISO, and other styles
33

Fors, Martin. "Normal Mapping för Hårda Ytor : Photoshop och Maya Transfer Maps för Normal Mapping av icke-organiska geometri i datorspel." Thesis, University of Skövde, School of Humanities and Informatics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-3103.

Full text
Abstract:
In my degree project I investigated whether a manual method is suitable for creating normal maps for non-organic polygon models intended for computer games. I used Photoshop to paint normal maps, which I then applied to low-detail models I had created in order to raise the level of detail considerably.

Since non-organic modeling involves models that represent hard surfaces, and that are therefore not animated with deformation, I assumed that this method would be very well suited to these surfaces, which often have extremely smooth shapes and precise, sharper edges.

My method was to study the literature on Normal Mapping and on how Photoshop is used for it. I then carried out practical work to evaluate how effective the method is and what advantages it offers. I go through the theory of normal mapping, supported by reference texts and instructional DVDs on the subject, and then present the method I used in my work. I conclude with a discussion of my results and report what I found through my experiments.

I conclude that Normal Mapping with Photoshop is very well suited to hard surfaces and also brings workflow improvements in organization, time spent, and control over the result. In addition, suggestions are given for improving the plugin's functionality to increase user-friendliness.
APA, Harvard, Vancouver, ISO, and other styles
34

Chehaimi, Omar. "Parallelizzazione dell'algoritmo di ricostruzione di Feldkamp-Davis-Kress per architetture Low-Power di tipo System-On-Chip." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13918/.

Full text
Abstract:
This thesis, carried out at CNAF, presents the results of parallelizing the Feldkamp-Davis-Kress (FDK) tomographic reconstruction algorithm in CUDA, starting from the software developed at the laboratories of the X-ray Imaging Group in both a sequential and a parallel MPI version. The work has two main objectives: to reduce the execution time of the FDK reconstruction algorithm substantially by parallelizing it on a Graphics Processing Unit (GPU), and to evaluate energy consumption on different types of architecture. The platforms examined are low-power SoC (System-on-Chip) architectures, which have low energy consumption but limited computing power, and High Performance Computing (HPC) architectures, which have high computing power but substantial energy consumption. The aim is to highlight the difference in performance in relation to the type of architecture and to its energy consumption. Being able to replace HPC nodes with low-power SoC boards has the advantage of reducing consumption and hardware complexity and of making it possible to obtain results directly on site. The results show that parallelizing FDK on a GPU is the most efficient choice. On every architecture tested it always outperforms the MPI version, even though the MPI version parallelizes the whole algorithm whereas the CUDA version parallelizes only the reconstruction phase. In addition, a GPU utilization efficiency of 100% was achieved. Energy efficiency relative to time performance is better for the SoC architectures than for the HPC ones. Finally, a hybrid approach combining MPI and CUDA is proposed that further improves execution performance. Because filtering and reconstruction are independent operations, the most efficient implementation is used for each: filtering in MPI and reconstruction in CUDA.
APA, Harvard, Vancouver, ISO, and other styles
35

Torcolacci, Veronica. "Implementation of Machine Learning Algorithms on Hardware Accelerators." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text
Abstract:
Nowadays, cutting-edge technology, innovation and efficiency are the cornerstones on which industries are based. Prognosis and health management have therefore started to play a key role in the prevention of crucial faults and failures. Recognizing malfunctions in a system in advance is fundamental in both economic and safety terms. This requires processing a lot of data, mainly information from sensors or machine control, and it is in this scenario that Machine Learning comes to the aid. This thesis aims to apply these methodologies to prognosis in automatic machines and was carried out at the LIAM lab (Laboratorio Industriale Automazione Macchine per il packaging), an industrial research laboratory born from the experience of leading companies in the sector. Machine learning techniques such as neural networks are used to solve the classification problems that arise from the system under examination. These algorithms are combined with system identification techniques that estimate the plant parameters and reduce the number of features by compressing the data. This makes it easier for the neural networks to distinguish the different operating conditions and to perform good prognosis. In practice, the algorithms are developed in Python and then implemented on two hardware accelerators, whose performance is evaluated.
APA, Harvard, Vancouver, ISO, and other styles
36

Ng, Robin. "Efficient Implementation of Histogram Dimension Reduction using Deep Learning : The project focuses on implementing deep learning algorithms on the state of the art Nvidia Drive PX GPU platform to achieve high performance." Thesis, Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-34862.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Mintěl, Tomáš. "Interpolace obrazových bodů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236736.

Full text
Abstract:
This master's thesis deals with the acceleration of pixel interpolation methods using the GPU and the NVIDIA® CUDA™ architecture. The graphical output is a demonstration application for geometric image transforms using a chosen interpolation method. Time-critical parts of the code are moved to the GPU and executed in parallel. Highly optimized routines from the OpenCV library, made by Intel for image and video processing, are used.
APA, Harvard, Vancouver, ISO, and other styles
38

Němeček, Petr. "Geometrické transformace obrazu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236764.

Full text
Abstract:
This master's thesis deals with the acceleration of geometric image transforms using the GPU and the NVIDIA® CUDA™ architecture. Time-critical parts of the code are moved to the GPU and executed in parallel. One of the results is a demonstration application for comparing the performance of the two architectures: the CPU alone, and the GPU in combination with the CPU. Highly optimized routines from the OpenCV library, made by Intel, are used as the reference implementation.
APA, Harvard, Vancouver, ISO, and other styles
39

Hlavoň, David. "Detekce a klasifikace dopravních prostředků v obraze pomocí hlubokých neuronových sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-386014.

Full text
Abstract:
This master's thesis deals with a vehicle detector based on a convolutional neural network, applied to scenes captured by a drone. The dataset is described first, because the main aim of the thesis is to create a practically usable detector. The next chapter describes the architectures of the feed-forward neural networks from which the detector was built. This is followed by techniques for building a detector, from naive methods to the currently most successful meta-architectures. The implementation of the detector is described in the second part of the thesis. The final detector was built on the Faster R-CNN meta-architecture and the PVA neural network, with which it achieved a score of over 90 % and 45 full HD frames per second.
APA, Harvard, Vancouver, ISO, and other styles
40

Hordemann, Glen J. "Exploring High Performance SQL Databases with Graphics Processing Units." Bowling Green State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1380125703.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Dočkal, Jiří. "Fyzikální simulace v počítačových hrách." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237215.

Full text
Abstract:
The thesis is concerned with modern game engines, focusing on physical simulation and particle systems. It gives an overview of architectures usable for game engine development. The thesis characterizes the most essential logical modules of a game engine, such as the scene graph, resource management, and rendering. Today's tools for physical simulation in games are also described. The main part of the thesis concentrates on the design and implementation of its own C3D game engine, which exploits the capabilities of the NVIDIA PhysX physics engine. The thesis includes modern techniques arising from the author's experience.
APA, Harvard, Vancouver, ISO, and other styles
42

Music, Sani. "Grafikkort till parallella beräkningar." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20150.

Full text
Abstract:
This study describes how graphics cards can be used for general-purpose computing, beyond multimedia, the field where they are most commonly used. It describes and discusses the main present-day alternatives for using graphics cards for general operations; the study itself uses the Nvidia CUDA architecture. It describes how graphics cards are used for custom operations in practice, assuming that the reader can already program in a high-level language and has a basic understanding of how a computer works. Accelerated libraries on the graphics card (Thrust and cuBLAS) are used to achieve the goals, which are software development and benchmarking. The results are programs that use the GPU to solve several problems (matrix multiplication, sorting, binary search, and vector inversion), together with the execution times and speedups of these programs. The graphics card is compared with the processor running serially and in parallel. The results show a speedup of up to approximately 50 times compared with serial implementations on the processor.
APA, Harvard, Vancouver, ISO, and other styles
43

Farabegoli, Nicolas. "Implementazione ottimizata dell'operatore di Dirac su GPGPU." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20356/.

Full text
Abstract:
In Lattice QCD applications the Dirac operator is one of the main operations, so optimizing its efficiency improves the overall performance of the algorithm. Tensor Cores offer a way to speed up the computation of the Dirac operator, in particular by optimizing matrix-vector multiplication. The architecture of the Tensor Cores was analyzed in detail, including the execution model and the memory layout. Several solutions that use Tensor Cores to accelerate the Dirac operator were then formulated and analyzed in detail.
APA, Harvard, Vancouver, ISO, and other styles
44

Macenauer, Pavel. "Detekce objektů na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234942.

Full text
Abstract:
This thesis addresses the topic of object detection on graphics processing units. As a part of it, a system for object detection using NVIDIA CUDA was designed and implemented, allowing for realtime video object detection and bulk processing. Its contribution is mainly to study the options of NVIDIA CUDA technology and current graphics processing units for object detection acceleration. Also parallel algorithms for object detection are discussed and suggested.
APA, Harvard, Vancouver, ISO, and other styles
45

Polášek, Tomáš. "Hybridní raytracing v rozhraní DXR." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403161.

Full text
Abstract:
The goal of this thesis is to evaluate the usability of hardware accelerated ray tracing in near-future rendering engines. Specifically, DirectX Ray Tracing API and Nvidia Turing architecture are being examined. Design and implementation of a hybrid rendering engine with support for hardware accelerated ray tracing is included and used in implementation of frequently used graphical effects -- hard and soft shadows, reflections, and Ambient Occlusion. The assessment is made in terms of difficulty of integration into a rendering engine, performance of the resulting system and suitability of implementation of chosen graphical effects. Performance parameters -- including number of rays cast per second, time to build acceleration structures and computation time on the GPU -- are tested and discussed.
APA, Harvard, Vancouver, ISO, and other styles
46

Míček, Vojtěch. "Neuronové sítě pro klasifikaci typu a kvality průmyslových výrobků." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413276.

Full text
Abstract:
The aim of this master's thesis is to enable evaluation of the quality or type of a product in industrial applications using artificial neural networks, especially in applications where the classical machine vision approach is too complicated. The system designed in this way is implemented on a specific hardware platform and then undergoes final optimization for that platform to achieve the best system performance.
APA, Harvard, Vancouver, ISO, and other styles
47

Bokhari, Saniyah S. "Parallel Solution of the Subset-sum Problem: An Empirical Study." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Straňák, Marek. "Raytracing na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237020.

Full text
Abstract:
Raytracing is a basic technique for displaying 3D objects. The goal of this thesis is to demonstrate the possibility of implementing a raytracer on a programmable GPU. The algorithm and its modified version, implemented in the "C for CUDA" language, are described. The raytracer focuses on displaying dynamic scenes. For this purpose, a KD-tree structure, bounding volume hierarchies, and PBO transfer are used. To achieve realistic output, photon mapping was implemented.
APA, Harvard, Vancouver, ISO, and other styles
49

Artico, Fausto. "Performance Optimization Of GPU ELF-Codes." Doctoral thesis, Università degli studi di Padova, 2014. http://hdl.handle.net/11577/3424532.

Full text
Abstract:
GPUs (Graphics Processing Units) are of interest for their favorable ratio $\frac{GF/s}{price}$. Compared to the beginning - the early 1980s - today's GPU architectures are more similar to general purpose architectures but with (much) larger numbers of cores - the GF100 architecture released by NVIDIA in 2009-2010, for example, has a true hardware cache hierarchy, a unified memory address space, double precision performance and a maximum of 512 cores. Exploiting the computational power of GPUs for non-graphics applications - past or present - has, however, always been hard. Initially, in the early 2000s, the only way to program GPUs was through graphics library APIs, which made writing non-graphics code non-trivial and tedious at best, and virtually impossible in the worst case. In 2003, the Brook compiler and runtime system was introduced, giving users the ability to generate GPU code from a high-level programming language. In 2006 NVIDIA introduced CUDA (Compute Unified Device Architecture). CUDA, a parallel computing platform and programming model specifically developed by NVIDIA for its GPUs, attempts to further facilitate general purpose programming of GPUs. Code written using CUDA is portable between different NVIDIA GPU architectures, and this is one of the reasons why NVIDIA claims that the user's productivity is much higher than with previous solutions. However, optimizing GPU code for utmost performance remains very hard, especially for NVIDIA GPUs using the GF100 architecture - e.g., Fermi GPUs and some Tesla GPUs - because a) the real instruction set architecture (ISA) is not publicly available, b) the code of the NVIDIA compiler - nvcc - is not open and c) users cannot edit code using the real assembly - ELF in NVIDIA parlance.
Compilers, while enabling immense increases in programmer productivity by eliminating the need to code at the (tedious) assembly level, have to date been incapable of achieving performance similar to that of an expert assembly programmer with good knowledge of the underlying architecture. In fact, it is widely accepted that high-level language programming and compiling, even with state-of-the-art compilers, lose on average a factor of 3 in performance - and sometimes much more - over what a good assembly programmer could achieve, and that even on a conventional, simple, single-core machine. Compilers for more complex machines, such as NVIDIA GPUs, are likely to do much worse because, among other things, they face (even more) complex trade-offs between often undecidable and NP-hard problems. However, because NVIDIA a) makes it virtually impossible to gain access to the actual assembly language used by its GF100 architecture, b) does not publicly explain many of the internal mechanisms implemented in its compiler - nvcc - and c) makes it virtually impossible to learn the details of its very complex GF100 architecture in sufficient detail to be able to exploit them, obtaining an estimate of the performance difference between CUDA programming and machine-level programming for NVIDIA GPUs using the GF100 architecture - let alone achieving some a priori performance guarantees of shortest execution time - has been, prior to this current work, impossible. To optimize GPU code, users have to use CUDA or PTX (Parallel Thread Execution) - a virtual instruction set architecture. The CUDA or PTX files are given as input to nvcc, which produces fatbin files as output. The fatbin files are produced for the target GPU architecture selected by the user - this is done by setting a flag used by nvcc.
In a fatbin file, zero or more parts will be executed by the CPU (think of these as the C/C++ parts), while the remaining parts (think of these as the ELF parts) will be executed by the specific GPU model for which the CUDA or PTX file has been compiled. The fatbin files are usually very different from the corresponding CUDA or PTX files, and this lack of control can completely ruin any effort made at the CUDA or PTX level to optimize the ELF part or parts of the fatbin file that will be executed by the target GPU. We therefore reverse engineer the real ISA used by the GF100 architecture and generate a set of coding guidelines that force nvcc to generate fatbin files with at least the minimum number of resources later needed to modify them and obtain the desired ELF algorithmic implementations; this gives control over the ELF code that is executed by any GPU using the GF100 architecture. During the reverse engineering process we also discover all the correspondences between PTX instructions and ELF instructions (a single PTX instruction can be transformed into one or more ELF instructions) and the correspondences between PTX registers and ELF registers. Our procedure is completely repeatable for any NVIDIA Kepler GPU; we do not need to rewrite our code. Being able to obtain the desired ELF algorithmic implementations is not enough to optimize the ELF code of a fatbin file: we also need to discover, understand, and quantify some undisclosed GPU behaviors that could slow down the execution of ELF code.
This is necessary to understand how to carry out the optimization process. While we cannot report here all the results we have obtained, we explain to the reader a) how to force even distributions of the GPU thread blocks to the streaming multiprocessors, b) how we discovered and quantified several warp scheduling phenomena, c) how to avoid warp scheduling load-imbalance phenomena, which cannot be controlled, in the streaming multiprocessors, d) how we determined, for each ELF instruction, the minimum time that must elapse before a warp scheduler can schedule a warp again (yes, this time can differ between ELF instructions), e) how we determined the time that must elapse before the data in a register that was previously read or written can be read again (this too can differ between ELF instructions, and depending on whether the data was previously read or written), and f) how we discovered an overhead for warp management that does not grow linearly with a linear increase in the number of resident warps in a streaming multiprocessor.
Next we explain a) the transformation procedures that must be applied to the ELF code of a fatbin file to optimize it and make its execution time as short as possible, b) why we need to classify the fatbin files generated from the original fatbin file during the optimization process, and how we do this using several criteria that ultimately determine the position of each generated fatbin file in a taxonomy we have created, c) how we use the position of a fatbin file in the taxonomy to determine whether it is eligible for an empirical analysis (which we explain), a theoretical analysis, or both, and d) how, if the fatbin file is eligible for a theoretical analysis and satisfies all of its requirements, we carry out the theoretical analysis we have devised and give an a priori guarantee, without any previous execution of the fatbin file, of shortest execution time for the ELF code that will be executed by the target GPU for which the fatbin file has been compiled.
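To see why an even distribution of thread blocks over the streaming multiprocessors (SMs) matters, consider a deliberately simplified toy model. It is an illustration under loud assumptions and not the method of the thesis: blocks are dealt to SMs round-robin (an assumed policy, not NVIDIA's documented scheduler), every block takes the same time, and each SM runs one block at a time. The figure of 16 SMs assumes a full GF100 chip with 32 cores per SM (512 / 32).

```python
from typing import List


def blocks_per_sm(num_blocks: int, num_sms: int) -> List[int]:
    """Count how many blocks each SM receives under round-robin assignment
    (an assumed policy, used only for illustration)."""
    counts = [0] * num_sms
    for block in range(num_blocks):
        counts[block % num_sms] += 1
    return counts


def utilization(num_blocks: int, num_sms: int) -> float:
    """Fraction of SM time spent doing work when all blocks take equal time
    and the kernel finishes only when the busiest SM is done."""
    busiest = max(blocks_per_sm(num_blocks, num_sms))
    if busiest == 0:
        return 0.0
    return num_blocks / (num_sms * busiest)


if __name__ == "__main__":
    NUM_SMS = 16  # assumption: full GF100, 512 cores / 32 cores per SM
    for blocks in (32, 33, 48):
        print(blocks, blocks_per_sm(blocks, NUM_SMS), utilization(blocks, NUM_SMS))
```

In this model a single extra block (33 instead of 32) leaves 15 of the 16 SMs idle during the last round, which is the kind of imbalance an even distribution avoids.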
50

Adeboye, Taiyelolu. "Robot Goalkeeper : A robotic goalkeeper based on machine vision and motor control." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-27561.

Abstract:
This report presents a robust and efficient implementation of a speed-optimized algorithm for real-time object recognition, 3D real-world localization, and tracking. It details a design focused on detecting and following objects in flight, applied to a football in motion. The overall goal was to develop a system capable of recognizing an object, estimating its present and near-future location, and actuating a robotic arm in response to the motion of the ball in flight. The implementation made use of image processing functions in C++, an NVIDIA Jetson TX1, and Stereolabs' ZED stereoscopic camera setup, connected to an embedded system controller for the robot arm. Image processing was done against a textured background, and the 3D location coordinates were used in the correction step of a Kalman filter model that estimated and predicted the ball location. A capture and processing speed of 59.4 frames per second was obtained, with good accuracy in depth detection, and the ball was well tracked in the tests carried out.
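The predict and correct cycle of a Kalman filter, as used above for the ball location, can be sketched for a single coordinate axis. This is a minimal sketch under assumptions, not the thesis's model: it assumes a constant-velocity motion model (a real ball in flight also accelerates under gravity), and the class name `AxisKalman` and all noise values are hypothetical. Only the frame interval is taken from the reported 59.4 frames per second.

```python
from dataclasses import dataclass

FRAME_DT = 1.0 / 59.4  # seconds between frames at the reported capture rate


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis (assumed model)."""
    pos: float = 0.0     # estimated position (m)
    vel: float = 0.0     # estimated velocity (m/s)
    p00: float = 1.0     # position variance
    p01: float = 0.0     # position/velocity covariance
    p11: float = 100.0   # velocity variance
    q_pos: float = 1e-6  # assumed process noise on position
    q_vel: float = 1e-3  # assumed process noise on velocity
    r: float = 1e-4      # assumed measurement noise variance (m^2)

    def predict(self, dt: float = FRAME_DT) -> None:
        """Propagate state and covariance one step: x = F x, P = F P F^T + Q."""
        self.pos += self.vel * dt
        self.p00 += dt * (2.0 * self.p01 + dt * self.p11) + self.q_pos
        self.p01 += dt * self.p11
        self.p11 += self.q_vel

    def correct(self, measured_pos: float) -> None:
        """Blend in a measured position (for example one 3D coordinate)."""
        innovation = measured_pos - self.pos
        s = self.p00 + self.r  # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s  # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1.0 - k0) * p00  # P = (I - K H) P with H = [1, 0]
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01

    def predict_ahead(self, horizon: float) -> float:
        """Extrapolate the position `horizon` seconds into the future."""
        return self.pos + self.vel * horizon


if __name__ == "__main__":
    kf = AxisKalman()
    for frame in range(1, 121):  # synthetic object moving at 2 m/s
        kf.predict()
        kf.correct(2.0 * frame * FRAME_DT)
    print(kf.pos, kf.vel, kf.predict_ahead(0.1))
```

Running one such filter per axis gives a simple 3D tracker, and `predict_ahead` provides the near-future position a controller would need to move an arm in time.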