Academic literature on the topic 'Cuda (Compute unified device architecture)'

Author: Grafiati

Published: 4 June 2021

Last updated: 1 February 2022


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Cuda (Compute unified device architecture).'


Dissertations / Theses on the topic "Cuda (Compute unified device architecture)"

1

Ringaby, Erik. "Optical Flow Computation on Compute Unified Device Architecture." Thesis, Linköping University, Department of Electrical Engineering, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15426.


Abstract:

The graphics processor has progressed rapidly in recent years, largely because of the demands computer games place on speed and image quality. Because of its special architecture, the graphics processor is much faster at solving parallel problems than a conventional processor. Its increasing programmability makes it possible to use it for tasks other than those it was originally designed for.

Even though graphics processors have been programmable for some time, it has been quite difficult to learn how to use them. CUDA enables the programmer to use C code, with a few extensions, to program NVIDIA's graphics processors and completely skip the traditional programming models. This thesis investigates whether the graphics processor can be used for calculations without knowledge of how the hardware mechanisms work. An image processing algorithm that calculates optical flow has been implemented. The results show that it is rather easy to implement programs using CUDA, but some knowledge of how the graphics processor works is required to achieve high performance.


2

Bardella, Tiago Ungaro. "Otimização de multidões em jogos digitais utilizando CUDA." Universidade Presbiteriana Mackenzie, 2015. http://tede.mackenzie.br/jspui/handle/tede/1468.


Abstract:

Since their beginnings, digital games have featured many types of enemy models to confront and many types of characters to control, as in Real-Time Strategy games. These large numbers of models making up an important scene are called crowds. Crowds require high computational power and specific algorithms to handle them, to avoid the loss of immersion that can occur in a game when crowds are not handled properly. With the emergence of graphics card languages such as NVIDIA CUDA, new algorithms were created to better handle the performance of crowds in digital games, and their superiority over methods based on sequential programming has been demonstrated in several studies. The goal of this work is to build on these GPU techniques to implement a new API using CUDA that aims to improve on existing crowd algorithms for digital games in terms of performance and simplicity of implementation. At the conclusion of the project, the resulting API made crowd handling easier for digital game developers using the Unity3D game engine integrated with the TBX crowd simulation API, who now only need to include a DLL in their project instead of writing their own crowd-handling algorithm from scratch, which takes up development time.


3

Rocha, Lindomar José. "Determinação de autovalores e autovetores de matrizes tridiagonais simétricas usando CUDA." Repositório Institucional da UnB, 2015. http://repositorio.unb.br/handle/10482/19625.


Abstract:

Dissertation (master's), Universidade de Brasília, Universidade UnB de Planaltina, Programa de Pós-Graduação em Ciência de Materiais, 2015.

Several branches of human knowledge make use of eigenvalues and eigenvectors, among them physics, engineering, and economics. These eigenvalues and eigenvectors can be determined using various computational routines, some faster than others. Where speed gains are sought, parallel computing is an option; more specifically, Nvidia's CUDA offers a significant speed gain, since in this model the routines run on the GPU, which has many processing cores. Given the importance of eigenvalues and eigenvectors, the objective of this work is to determine routines that can compute them for real symmetric tridiagonal matrices more quickly and reliably through parallel computing with CUDA. This objective was achieved by combining several numerical methods to obtain the eigenvalues and by modifying the inverse iteration method used to determine the eigenvectors. LAPACK routines were used for comparison with the routines developed in CUDA. According to the results, the routine developed in CUDA has a clear speed advantage, in both single and double precision, over the state-of-the-art CPU routines from the LAPACK library.


4

Ling, Cheng. "High performance bioinformatics and computational biology on general-purpose graphics processing units." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6260.


Abstract:

Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of the fields of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes these more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to answer biological questions conveniently. Recent years have seen an explosion of the size of biological data at a rate which outpaces the rate of increases in the computational power of mainstream computer technologies, namely general purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high performance and efficient implementation of BCB applications in order to meet the demands of biological data increases at affordable cost. The thesis presents detailed design and implementations of GPU solutions for a number of BCB algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to determine the potential information about a newly discovered biological sequence from other well-known sequences through similarity comparison. On the other hand, phylogenetic analysis is concerned with the investigation of the evolution and relationships among organisms, and has many uses in the fields of system biology and comparative genomics. In molecular-based phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes and then phylogenetic trees are constructed to illustrate evolutionary relationships among genes and organisms. 
However, both biological sequence alignment and phylogenetic analysis are computationally expensive applications, as their computing and memory requirements grow polynomially or even worse with the size of sequence databases. The thesis first presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to overcome the restriction on the length of the query sequence in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between the two main task parallelization approaches (inter-task and intra-task parallelization) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible sequence lengths in real-world applications. It also outperforms an equivalent GPP-based implementation by 15x-20x. After this, the thesis presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically. This achieved up to a 3x speed-up compared to the most optimised GPP implementations. The thesis then presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA). This achieves an 8x-20x speed-up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method, however, only gives one possible tree, which strongly depends on the evolutionary model used. A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference. 
The latter was the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved 4x-8x speed up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Arrays (FPGA) technology.
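
For orientation, the Smith-Waterman recurrence that this abstract refers to can be sketched in a few lines. This is a minimal serial sketch and not the thesis's GPU design: the function name `smith_waterman`, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the linear gap penalty are all assumptions chosen for illustration.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Illustrative only: scoring values and the linear gap penalty are
    assumptions, not taken from the thesis.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] is the best score of a local alignment ending at
    # query[i-1] and target[j-1]; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == target[j - 1]
            diag = H[i - 1][j - 1] + (match if same else mismatch)
            up = H[i - 1][j] + gap      # gap in the target
            left = H[i][j - 1] + gap    # gap in the query
            # Clamping at zero is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# Scoring one query against several database sequences. Each call is
# independent of the others, which is the property that inter-task
# parallelization, as the term is usually used, relies on.
database = ["ACGT", "TTTT", "TTACGTAA"]
scores = [smith_waterman("ACGT", seq) for seq in database]
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal can be computed at the same time; intra-task parallelization is commonly described as exploiting that dependency pattern within a single alignment.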


5

Góes, Josecley Fialho. "Resolução numérica de escoamentos compressíveis empregando um método de partículas livre de malhas e o processamento em paralelo (CUDA)." Universidade do Estado do Rio de Janeiro, 2011. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=3082.


Abstract:

Conventional mesh-based numerical methods have been widely applied to problems in Computational Fluid Dynamics. However, in fluid flow problems involving free surfaces, large explosions, large deformations, discontinuities, shock waves, etc., these methods suffer from inherent difficulties that limit their application. Meshfree particle methods have emerged as a viable alternative to conventional grid-based methods. This work introduces Smoothed Particle Hydrodynamics (SPH), a meshfree Lagrangian particle method, for the numerical simulation of compressible and quasi-incompressible Newtonian fluid flows. Two numerical codes have been developed, a serial version and a parallel version, using the C/C++ programming language and the Compute Unified Device Architecture (CUDA). CUDA is NVIDIA's parallel computing architecture, which enables dramatic increases in computing performance by harnessing the power of Graphics Processing Units (GPUs). The numerical results were validated and the speedup evaluated for the one-dimensional Shock Tube and Blast Wave problems and the two-dimensional Shear Driven Cavity Problem.


6

Góes, Marciana Lima. "Desenvolvimento de um simulador numérico empregando o método Smoothed Particle Hydrodynamics para a resolução de escoamentos incompressíveis. Implementação computacional em paralelo (CUDA)." Universidade do Estado do Rio de Janeiro, 2012. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=4029.


Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

In this work, a numerical simulator based on the mesh-free Smoothed Particle Hydrodynamics (SPH) method was developed to solve incompressible Newtonian fluid flows. Unlike most existing versions of this method, the numerical code uses an iterative technique to determine the pressure field. This approach employs the differential form of an equation of state for a compressible fluid, together with the continuity equation, to calculate the pressure correction. A parallel version of the numerical code was implemented using the C/C++ programming language and the Compute Unified Device Architecture (CUDA) from the NVIDIA Corporation. The numerical results were validated and the speed-up evaluated for three problems: the one-dimensional Couette flow and the two-dimensional Shear Driven Cavity and Dambreak problems.


7

Kanzaki, Cabrera Takeichi. "Numerical modeling of anisotropic granular media." Doctoral thesis, Universitat de Girona, 2013. http://hdl.handle.net/10803/133834.


Abstract:

Granular materials are multi-particle systems involved in many industrial processes and in everyday life. Understanding the mechanical behavior of granular media such as sand, coffee beans, planetary rings, and powders remains a challenging task. In recent years, these systems have been widely examined experimentally, analytically, and numerically, and they continue to produce relevant and unexpected results. Despite the fact that granular media are often composed of grains with anisotropic shapes, like rice, lentils, or pills, most experimental and theoretical studies have concerned spherical particles. The aim of this thesis has been to examine numerically the behavior of granular media composed of spherical and non-spherical particles. Our numerical implementations have permitted the description of the macroscopic properties of mechanically stable granular assemblies, which have been examined experimentally in the framework of the projects "Estabilidad y dinámica de medios granulares anisótropos" (FIS2008-06034-C02-02), University of Girona, and "Interacciones entre partículas y emergencia de propiedades macroscópicas en medios granulares" (FIS2008-06034-C02-01), University of Navarra.


8

Mendes, Sérgio Alexandre Alves. "Reconstrução de imagem médica de mamografia por emissão de positrões (PEM) com GPU." Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/7916.


Abstract:

Dissertation submitted for the degree of Master in Biomedical Engineering

Breast cancer is one of the leading causes of death among women, and early detection of this disease has been one of the areas of greatest interest and development in recent years. The Clear-PEM project developed a scanner based on a nuclear medicine tomographic technique known as positron emission mammography (PEM). The scanner consists of two detector heads that rotate around the breast, allowing detection of the radiation emitted from inside the patient's body. This work developed two iterative image reconstruction algorithms: Maximum Likelihood Expectation Maximization (MLEM) and Ordered Subsets Expectation Maximization (OSEM). The aim was to exploit the advantages of parallel computing by programming graphics cards (GPU, Graphics Processing Unit), as opposed to more traditional programming for the computer's processor (CPU, Central Processing Unit). In this way, the work seeks to minimize the main disadvantage of iterative image reconstruction algorithms compared with analytical solutions, namely their long computation time. An NVIDIA® graphics card (GPU) was used, with CUDA™ used to develop the two reconstruction algorithms. Both algorithms (MLEM and OSEM) show a significant improvement in computation time when programmed on the GPU, approximately 29 and 27 times shorter, respectively, than the time required on the CPU, with similar results in terms of image quality.


9

Azevedo, Bernardo Lopes de Sá. "Reconstrução/processamento de imagem médica com GPU em tomossíntese." Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/7503.


Abstract:

Dissertation submitted for the degree of Master in Biomedical Engineering

Digital Breast Tomosynthesis (DBT) is a recent three-dimensional medical imaging technique based on digital mammography that allows better observation of overlapping tissues, especially in dense breasts. The technique acquires multiple images (slices) of the volume to be reconstructed, enabling a more effective diagnosis because the various tissues are not superimposed as they are in a 2D image. The image reconstruction algorithms used in DBT are quite similar to those used in Computed Tomography (CT). There are two classes of image reconstruction algorithms: analytical and iterative. In this work, two iterative reconstruction algorithms were implemented: Maximum Likelihood Expectation Maximization (ML-EM) and Ordered Subsets Expectation Maximization (OS-EM). Iterative algorithms give better results but are computationally very heavy, so analytical algorithms have been preferred in clinical practice. With technological advances in computing, it is now possible to considerably reduce the time needed to reconstruct an image with an iterative algorithm. The algorithms were implemented using graphics card programming, known as General-Purpose computing on Graphics Processing Units (GPGPU). This technique allows a graphics card (GPU, Graphics Processing Unit) to process tasks usually assigned to a computer's processor (CPU, Central Processing Unit), rather than the graphics processing usually associated with GPUs. For this project an NVIDIA® GPU was used, with the Compute Unified Device Architecture (CUDA™) used to code the reconstruction algorithms. The results showed that implementing the algorithms on the GPU reduced the reconstruction time by a factor of approximately 6.2 relative to the time obtained on the CPU. Regarding image quality, the GPU achieved a level of detail similar to the CPU images, with only minor differences.


10

Brown, Dane. "Faster upper body pose recognition and estimation using compute unified device architecture." Thesis, University of Western Cape, 2013. http://hdl.handle.net/11394/3455.


Abstract:

Magister Scientiae - MSc

The SASL project is developing a machine translation system that can translate fully-fledged phrases between SASL and English in real time. To date, the project has developed several systems focusing on recognition and estimation of facial expression, hand shape, hand motion, hand orientation, and hand location. Achmed developed a highly accurate upper body pose recognition and estimation system. The system is capable of recognizing and estimating the location of the arms from a two-dimensional video captured from a monocular view at an accuracy of 88%. The system operates at well below real-time speeds. This research investigates the use of optimizations and parallel processing techniques using the CUDA framework on Achmed's algorithm to achieve real-time upper body pose recognition and estimation. A detailed analysis of Achmed's algorithm identified potential improvements to the algorithm. A re-implementation of Achmed's algorithm on the CUDA framework, coupled with these improvements, culminated in an enhanced upper body pose recognition and estimation system that operates in real time with increased accuracy.
