Dissertations / Theses on the topic 'NVIDIA'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'NVIDIA.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Gameiro, Pedro Miguel Rodrigues. "Equity research - NVIDIA Corporation." Master's thesis, Instituto Superior de Economia e Gestão, 2018. http://hdl.handle.net/10400.5/16970.
Full text
This report values NVIDIA Corporation, a semiconductor company, as the final work project of ISEG's Master in Finance, and was written following the recommendations of the CFA Institute. NVIDIA is facing a very singular moment compared to its peers, with 40% annual revenue growth and a 334.46% increase in its share price over the last two years. Not only is NVIDIA delivering an interesting financial performance, it is also entering emerging markets such as autonomous cars and cryptocurrencies, which makes it a very interesting case study; a fascination with technology, and gaming in particular, was another reason this company was chosen. The report was developed from public information available up to June 30th, 2018; no information or event subsequent to that date has been considered. The price target of $303.67 was obtained with the Discounted Cash Flow method. A relative valuation was also attempted, but given NVIDIA's unique situation there are no close peers under the criteria used. The valuation leads to a BUY recommendation, albeit with medium risk: NVIDIA is consolidated in its main market, gaming, but some uncertainty remains in the cryptocurrency and autonomous-car markets.
Zajíc, Jiří. "Překladač jazyka C# do jazyka Nvidia CUDA." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236439.
Full text
Santos, Paulo Carlos Ferreira dos. "Extração de informações de desempenho em GPUs NVIDIA." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-02042013-090806/.
Full text
The recent growth in the use of performance-tuned Graphics Processing Units (GPUs) in scientific applications has created a need to optimize GPU-targeted programs. Performance models are the suitable tools for this task, and they benefit from tools that extract performance information from existing GPUs. This work covers the creation of a microbenchmark generator based on PTX instructions that also retrieves the GPU's hardware characteristics. The microbenchmark results were validated against a simplified model, with error rates between 6.11% and 16.32% over five different GPU kernels; the sources of imprecision in the results are also explained. The tool was used to analyze the performance profile of the instruction set, identifying groups of instructions with similar behavior, to evaluate the correlation between GPU pipeline performance and instruction execution order, and to verify the compiler's optimization capabilities in this setting. We conclude that microbenchmarking with PTX instructions is a feasible approach and an effective way to build performance models and to produce detailed analyses of instruction behavior.
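The overhead-subtraction idea behind such instruction microbenchmarks can be illustrated on a CPU in a few lines of Python (the thesis targets PTX instructions and GPU clock registers instead; this is only an analogue, with made-up operations):

```python
import time

def measure(op, n=1_000_000):
    """Estimate the per-call cost of op(x, y) in nanoseconds by timing
    n repetitions and subtracting the cost of an empty loop -- the same
    idea the PTX microbenchmarks apply at instruction granularity."""
    t0 = time.perf_counter()
    for _ in range(n):
        pass
    overhead = time.perf_counter() - t0
    t0 = time.perf_counter()
    for _ in range(n):
        op(3.0, 7.0)
    total = time.perf_counter() - t0
    return (total - overhead) / n * 1e9

add_ns = measure(lambda a, b: a + b)
div_ns = measure(lambda a, b: a / b)
print(f"add: {add_ns:.1f} ns  div: {div_ns:.1f} ns")
```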
Krivoklatský, Filip. "Návrh vestavaného systému inteligentného vidění na platformě NVIDIA." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400627.
Full text
Savioli, Nicolo'. "Parallelization of the algorithm WHAM with NVIDIA CUDA." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/6377/.
Full text
Ikeda, Patricia Akemi. "Um estudo do uso eficiente de programas em placas gráficas." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25042012-212956/.
Full text
Initially designed for graphics processing, graphics cards (GPUs) have evolved into high-performance, general-purpose parallel coprocessors. Given the huge potential GPUs offer to several research and commercial areas, NVIDIA pioneered the field by launching the CUDA architecture (compatible with many of its cards), an environment that combines this computational power with easier programming. To exploit the full capacity of a GPU, certain practices must be followed, one of which is maximizing hardware utilization. This work proposes a practical and extensible tool that helps the programmer choose the best kernel launch configuration and achieve this goal.
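As an illustration of what such a configuration-choosing tool has to compute, here is a toy occupancy estimate. The per-SM hardware limits below are assumed round numbers for the sketch, not the datasheet values of any specific NVIDIA GPU:

```python
def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads=2048, max_blocks=32,
              regs_per_sm=65536, smem_per_sm=98304, warp_size=32):
    """Illustrative occupancy estimate: resident blocks per streaming
    multiprocessor are limited by threads, block slots, registers and
    shared memory; occupancy is resident warps over the maximum."""
    limits = [
        max_threads // threads_per_block,
        max_blocks,
        regs_per_sm // (regs_per_thread * threads_per_block),
        (smem_per_sm // smem_per_block) if smem_per_block else max_blocks,
    ]
    blocks = min(limits)
    warps = blocks * threads_per_block // warp_size
    return warps / (max_threads // warp_size)

print(occupancy(256, 32, 0))  # 1.0: nothing limits this configuration
print(occupancy(256, 64, 0))  # 0.5: register pressure halves occupancy
```

Doubling per-thread register use from 32 to 64 halves the number of resident blocks here, which is exactly the kind of trade-off such a tool surfaces before any kernel is run.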
Rivera-Polanco, Diego Alejandro. "COLLECTIVE COMMUNICATION AND BARRIER SYNCHRONIZATION ON NVIDIA CUDA GPU." Lexington, Ky. : [University of Kentucky Libraries], 2009. http://hdl.handle.net/10225/1158.
Full text
Title from document title page (viewed on May 18, 2010). Document formatted into pages; contains: ix, 88 p. : ill. Includes abstract and vita. Includes bibliographical references (p. 86-87).
Harvey, Jesse Patrick. "GPU acceleration of object classification algorithms using NVIDIA CUDA /." Online version of thesis, 2009. http://hdl.handle.net/1850/10894.
Full text
Lerchundi Osa, Gorka. "Fast Implementation of Two Hash Algorithms on nVidia CUDA GPU." Thesis, Norwegian University of Science and Technology, Department of Telematics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9817.
Full text
User needs grow as time passes. We started with room-sized computers, where punched cards played the role that machine code plays today, and we have now reached a point where the number of processors in our graphics device is still not enough for our requirements. A change in the evolution of computing is looming: we are in a transition in which sequential computation is losing ground to distributed computation. This trend is not new with the arrival of easily accessible GPUs; long before, it was used in projects such as SETI@Home, FightAIDS@Home and ClimatePrediction, under the formal name of grid computing. Until now the term was linked only to systems distributed over a network, but as the technology evolves it is taking on a different meaning. With CUDA, NVIDIA has been one of the first companies to make this kind of software package noteworthy: instead of a proof of concept, it is a real tool, where the true artist is the programmer who uses it and achieves performance increases. As with many innovations, a community distributed worldwide has grown behind this package, each member doing their bit; notably, soon after CUDA's release many software developments appeared, such as the cracking of the hitherto insurmountable WPA. The same can be said of the Sony-Toshiba-IBM (STI) alliance: it has a great community and great software, with IBM in charge of maintenance. It is not as accessible as CUDA, but IBM is powerful enough to enter the home supercomputing market; after IBM released the PS3 SDK, a notable application named Folding@Home was created to exploit the benefits of parallel computing, whose purpose is, among other things, to find a cure for cancer.
To sum up, this is only the beginning, and this thesis sizes up the possibility of using this technology to accelerate cryptographic hash algorithms. BLUE MIDNIGHT WISH, the hash algorithm under examination, undergoes an environment change, adapting it to parallel-capable code so as to produce empirical measurements that can be compared with the current sequential implementations, answering questions that until now had no answer. BLUE MIDNIGHT WISH is a candidate hash function for the next NIST standard, SHA-3, designed by professor Danilo Gligoroski from NTNU and Vlastimil Klima, an independent cryptographer from the Czech Republic. So far, from a speed point of view, BLUE MIDNIGHT WISH is at the top of the charts (generally in second place, right behind EDON-R, another hash function from professor Danilo Gligoroski). Part of the work in this thesis was to investigate whether it is possible to achieve faster processing of Blue Midnight Wish when the computations are distributed among the cores of a CUDA device card. My numerous experiments give a clear answer: no. Although the answer is negative, it still has significant scientific value: the work supports the viewpoint of the part of the cryptographic community that doubts cryptographic primitives will benefit from parallel execution across many cores. Indeed, my experiments show that the communication costs between cores in CUDA outweigh by a big margin the computational costs incurred inside one core (processor) unit.
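The negative result admits a simple back-of-the-envelope model: splitting the work over more cores divides the compute time but adds per-core communication cost, so past some point more cores are slower. The constants below are invented purely for illustration, not measured CUDA figures:

```python
def parallel_time(work_ns, cores, comm_ns_per_core):
    """Toy cost model for the thesis' observation: dividing a hash
    computation over `cores` units shrinks the compute term but adds a
    communication/synchronization term that grows with core count."""
    return work_ns / cores + comm_ns_per_core * cores

serial = parallel_time(10_000, 1, 0)
parallel = parallel_time(10_000, 64, 500)   # heavy inter-core traffic
print(serial, parallel)  # the parallel version loses when comm dominates
```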
Virk, Bikram. "Implementing method of moments on a GPGPU using Nvidia CUDA." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33980.
Full text
Subramoniapillai, Ajeetha Saktheesh. "Architectural Analysis and Performance Characterization of NVIDIA GPUs using Microbenchmarking." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1344623484.
Full text
Sreenibha, Reddy Byreddy. "Performance Metrics Analysis of GamingAnywhere with GPU accelerated Nvidia CUDA." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16846.
Full text
Nejadfard, Kian. "Context-aware automated refactoring for unified memory allocation in NVIDIA CUDA programs." Cleveland State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=csu1624622944458295.
Full text
Zaahid, Mohammed. "Performance Metrics Analysis of GamingAnywhere with GPU accelerated NVIDIA CUDA using gVirtuS." Thesis, Blekinge Tekniska Högskola, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-16852.
Full text
Graves, Russell Edward. "High performance password cracking by implementing rainbow tables on nVidia graphics cards (IseCrack)." [Ames, Iowa : Iowa State University], 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1461850.
Full text
Ясменко, В. Р., and К. В. Донець. "Аналіз можливостей візуалізатора NVIDIA Iray у порівнянні з візуалізатором V-Ray програми 3DS Max" [An analysis of the capabilities of the NVIDIA Iray renderer compared with the V-Ray renderer of 3DS Max]. Thesis, КНУТД, 2016. https://er.knutd.edu.ua/handle/123456789/4360.
Full text
Bourque, Donald. "CUDA-Accelerated ORB-SLAM for UAVs." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-theses/882.
Full text
Shaker, Alfred M. "COMPARISON OF THE PERFORMANCE OF NVIDIA ACCELERATORS WITH SIMD AND ASSOCIATIVE PROCESSORS ON REAL-TIME APPLICATIONS." Kent State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=kent1501084051233453.
Full text
Buenaflor, Jr Romeo C. "Using State-of-the-Art GPGPU's for Molecular Simulation : Optimizing Massively Parrallelized N-Body programs using NVIDIA Tesla." Thesis, Uppsala University, Department of Information Technology, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-110895.
Full text
Computation and simulation as tools for discovering new knowledge are still marred by problems that are intractable, combinatoric, or simply plagued by the so-called curse of dimensionality. The algorithm used for the molecular simulation of polyelectrolytes is one of those areas of computational chemistry and science suffering from this curse. Much of the problem, though, has been claimed to be solvable with the advent of more sophisticated and powerful computers and related technologies; this paper attempts to substantiate that claim. A state-of-the-art NVIDIA Tesla C870 is used to massively parallelize the algorithm for the molecular simulation of polyelectrolytes, and in particular to optimize the portion of the code that computes the electrostatic interactions. It is shown that tapping this new line of technology offers a great advantage in winning the war against the curse of dimensionality.
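The electrostatic kernel being offloaded is, at heart, a direct pairwise sum. A minimal serial reference, with arbitrary units and hypothetical inputs (the thesis' actual simulation code is not reproduced here), looks like:

```python
import math

def coulomb_energy(charges, positions, k=1.0):
    """Direct O(N^2) electrostatic pair sum -- the hot loop that this
    kind of work offloads to the GPU. Units are arbitrary (k = 1)."""
    e = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            e += k * charges[i] * charges[j] / r
    return e

# Two opposite unit charges separated by distance 2:
print(coulomb_energy([1, -1], [(0, 0, 0), (0, 0, 2)]))  # -0.5
```

Because every pair (i, j) is independent, the double loop maps naturally onto thousands of GPU threads, which is what makes this kernel such a good parallelization target.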
Surineni, Sruthikesh. "Performance/Accuracy Trade-offs of Floating-point Arithmetic on Nvidia GPUs: From a Characterization to an Auto-tuner." Thesis, University of Missouri - Columbia, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13850754.
Full text
Floating-point computations produce approximate results, possibly leading to inaccuracy and reproducibility problems. Existing work addresses two issues: the design of high-precision floating-point representations, and methods to trade accuracy for performance in central processing unit (CPU) applications. A comprehensive study of such trade-offs on modern graphics processing units (GPUs), however, has been missing. This thesis covers the use of different floating-point precisions (single and double precision of the IEEE 754 standard, the GNU Multiple Precision Arithmetic Library (GMP), and composite floating-point precision) on a GPU, using a variety of synthetic and real-world benchmark applications. First, we analyze the support for single- and double-precision floating-point arithmetic on the considered GPU architectures, and we characterize the latencies of all floating-point instructions on the GPU. Second, we study the performance/accuracy trade-offs related to the use of different arithmetic precisions in addition, multiplication, division, and the natural exponential function. Third, we analyze the combined use of different arithmetic operations in three benchmark applications characterized by different instruction mixes and arithmetic intensities. Building on this analysis, we designed a novel auto-tuner that selects the arithmetic precision of a GPU program so as to reach a better performance/accuracy trade-off, depending on the arithmetic operations and math functions used in the program and on its degree of multithreading.
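The single-versus-double trade-off characterized above can be reproduced without a GPU by rounding every intermediate result to IEEE 754 binary32. This is a CPU-side illustration of the accuracy cost, not the thesis' benchmark code:

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def sum_f32(xs):
    """Accumulate with every intermediate rounded to single precision,
    as a GPU kernel compiled for float would do."""
    s = 0.0
    for x in xs:
        s = to_f32(s + x)
    return s

xs = [0.1] * 100_000
print(sum_f32(xs), sum(xs))  # single precision drifts visibly from double
```

Single precision is typically much faster on GPUs, so the auto-tuner's job is precisely to decide when drift like this is acceptable.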
Bronda, Samuel. "Hluboké neuronové sítě pro prostředí superpočítače." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400885.
Full text
Araújo, João Manuel da Silva. "Paralelização de algoritmos de Filtragem baseados em XPATH/XML com recurso a GPUs." Master's thesis, FCT - UNL, 2009. http://hdl.handle.net/10362/2530.
Full text
This dissertation studies the feasibility of using GPUs for the parallel processing of notification-filtering algorithms in a publish/subscribe system. To that end, it compares experimental results between the sequential (CPU) version and a parallel version of a filtering algorithm chosen as a reference, seeking evidence of whether the eventual gains from exploiting GPUs are enough to compensate for the greater complexity of the process.
Shi, Bobo. "Implementation and Performance Analysis of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor and NVIDIA GPU Accelerator." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1462793739.
Full text
Cammareri, Costantino Davide. "Sistema di misura dei consumi di unità di calcolo low-power per applicazioni scientifiche presso il centro INFN-CNAF." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15718/.
Full text
Nottingham, Alastair. "GPF : a framework for general packet classification on GPU co-processors." Thesis, Rhodes University, 2012. http://hdl.handle.net/10962/d1006662.
Full text
Fuksa, Tomáš. "Paralelizace výpočtů pro zpracování obrazu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-219371.
Full text
Mašek, Jan. "Dynamický částicový systém jako účinný nástroj pro statistické vzorkování." Doctoral thesis, Vysoké učení technické v Brně. Fakulta stavební, 2018. http://www.nusl.cz/ntk/nusl-390276.
Full text
Senthil, Kumar Nithin. "Designing optimized MPI+NCCL hybrid collective communication routines for dense many-GPU clusters." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1619132252608831.
Full text
Bartosch, Nadine. "Correspondence-based pairwise depth estimation with parallel acceleration." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-34372.
Full text
Karri, Venkata Praveen. "Effective and Accelerated Informative Frame Filtering in Colonoscopy Videos Using Graphic Processing Units." Thesis, University of North Texas, 2010. https://digital.library.unt.edu/ark:/67531/metadc31536/.
Full text
Loundagin, Justin. "Optimizing Harris Corner Detection on GPGPUs Using CUDA." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1348.
Full text
Ekstam Ljusegren, Hannes, and Hannes Jonsson. "Parallelizing Digital Signal Processing for GPU." Thesis, Linköpings universitet, Programvara och system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167189.
Full text
Fors, Martin. "Normal Mapping för Hårda Ytor : Photoshop och Maya Transfer Maps för Normal Mapping av icke-organiska geometri i datorspel." Thesis, University of Skövde, School of Humanities and Informatics, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-3103.
Full text
In my degree project I investigated whether a manual method is suitable for creating normal maps for non-organic polygon models intended for computer games. I used Photoshop to paint normal maps that I then applied to low-detail models I had created, raising their level of detail considerably.
Since non-organic modelling involves models that represent hard surfaces, and thus are not animated with deformation, I assumed that this method would suit such surfaces very well, as they often have extremely smooth shapes and precise, sharper edges.
My method was to study the literature on normal mapping and on how Photoshop is used for it. I then carried out practical work to evaluate how effective the method is and what advantages it brings. I go through the theory of normal mapping, supported by reference texts and instructional DVDs on the subject, and then present the method used in my work. I close with a discussion of my results and report what I concluded from my experiments.
I conclude that normal mapping with Photoshop is very well suited to hard surfaces and also brings workflow optimizations in terms of organization, time required and control over the result. Suggestions are also given for improvements to the plugin's functionality to increase its usability.
Chehaimi, Omar. "Parallelizzazione dell'algoritmo di ricostruzione di Feldkamp-Davis-Kress per architetture Low-Power di tipo System-On-Chip." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13918/.
Full text
Torcolacci, Veronica. "Implementation of Machine Learning Algorithms on Hardware Accelerators." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.
Find full text
Ng, Robin. "Efficient Implementation of Histogram Dimension Reduction using Deep Learning : The project focuses on implementing deep learning algorithms on the state of the art Nvidia Drive PX GPU platform to achieve high performance." Thesis, Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-34862.
Full text
Mintěl, Tomáš. "Interpolace obrazových bodů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236736.
Full text
Němeček, Petr. "Geometrické transformace obrazu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236764.
Full text
Hlavoň, David. "Detekce a klasifikace dopravních prostředků v obraze pomocí hlubokých neuronových sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-386014.
Full text
Hordemann, Glen J. "Exploring High Performance SQL Databases with Graphics Processing Units." Bowling Green State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1380125703.
Full text
Dočkal, Jiří. "Fyzikální simulace v počítačových hrách." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237215.
Full text
Music, Sani. "Grafikkort till parallella beräkningar." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20150.
Full text
This study describes how we can use graphics cards for general-purpose computing, which differs from the most usual field where graphics cards are used, multimedia. It describes and discusses present-day alternatives for using graphics cards for general operations, and specifically uses and describes the Nvidia CUDA architecture. The study approaches the subject from the point of view of a reader with programming knowledge in some high-level language and an understanding of how a computer works. We use accelerated libraries (Thrust and cuBLAS) to achieve our goals on the graphics card, which are software development and benchmarking. The results are programs attacking particular problems (matrix multiplication, sorting, binary search, vector inversion), together with the execution times and speedups of these programs; the graphics card is compared to the processor both in serial and in parallel. The results show speedups of up to approximately 50 times compared to serial implementations on the processor.
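The CPU baseline in such comparisons can be sketched as a timing harness plus a naive kernel; matrix multiply is shown here because it is one of the problems listed. The GPU side via Thrust/cuBLAS is not reproduced, so the figures printed are CPU times only:

```python
import time

def bench(fn, *args, reps=3):
    """Minimal timing harness in the spirit of the study's CPU-vs-GPU
    comparisons: best-of-reps wall time for fn(*args), in seconds."""
    best = float('inf')
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def matmul(a, b):
    """Naive O(n^3) serial matrix multiply -- the CPU reference point
    against which a cuBLAS call would be compared."""
    m, p = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(m)) for j in range(p)]
            for row in a]

a = [[1.0] * 64 for _ in range(64)]
t = bench(matmul, a, a)
print(f"64x64 matmul: {t * 1e3:.2f} ms")  # speedup = t_serial / t_gpu
```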
Farabegoli, Nicolas. "Implementazione ottimizata dell'operatore di Dirac su GPGPU." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20356/.
Full text
Macenauer, Pavel. "Detekce objektů na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234942.
Full text
Polášek, Tomáš. "Hybridní raytracing v rozhraní DXR." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403161.
Full text
Míček, Vojtěch. "Neuronové sítě pro klasifikaci typu a kvality průmyslových výrobků." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413276.
Full text
Bokhari, Saniyah S. "Parallel Solution of the Subset-sum Problem: An Empirical Study." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281.
Full text
Straňák, Marek. "Raytracing na GPU." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-237020.
Full text
Artico, Fausto. "Performance Optimization Of GPU ELF-Codes." Doctoral thesis, Università degli studi di Padova, 2014. http://hdl.handle.net/11577/3424532.
Full text
GPUs (Graphics Processing Units) are of interest for their favorable GF/s-per-price ratio. Compared with their beginnings in the early 1970s, today's GPU architectures are closer to general-purpose architectures but have a (much) larger number of cores; the GF100 architecture released by NVIDIA in 2009-2010, for example, has a true cache-memory hierarchy, a unified memory address space, double-precision support and up to 512 cores. Exploiting the computational power of GPUs for non-graphics applications has, however, always been difficult. Initially, in the early 2000s, GPU programming was done (exclusively) through graphics libraries, which made writing non-graphics code non-trivial and tedious at best, and virtually impossible at worst. In 2003 the Brook compiler and runtime system were introduced, giving users the ability to generate GPU code from a high-level programming language. In 2006 NVIDIA introduced CUDA (Compute Unified Device Architecture), a parallel programming and computing model developed specifically by NVIDIA for its GPUs, which attempts to further ease general-purpose GPU programming.
Code written in CUDA is portable across different NVIDIA GPU architectures, which is one reason NVIDIA claims that user productivity is much higher than with previous solutions. Nevertheless, optimizing GPU code for maximum performance remains very difficult, especially for NVIDIA GPUs using the GF100 architecture (for example, Fermi GPUs and some Tesla GPUs), because (a) the true instruction set architecture (ISA) is not publicly available, (b) the code of the NVIDIA compiler, nvcc, is not open, and (c) users cannot write code in the true assembly (ELF, in NVIDIA jargon). Compilers, while enabling an immense increase in programmer productivity by eliminating the need to code at the (tedious) assembly level, are still unable to match the performance of an assembly expert with a good knowledge of the underlying architecture. It is widely accepted that high-level programming and compilation, even with state-of-the-art compilers, lose on average a factor of 3 in performance (and sometimes much more) against what a good assembly programmer could obtain, even on a conventional, simple, single-core machine. Compilers for more complex machines, such as NVIDIA GPUs, are likely to do much worse because, among other things, they must weigh (even more) complex trade-offs while searching for solutions to often undecidable or NP-hard problems.
Moreover, because NVIDIA (a) makes it virtually impossible to gain access to the actual assembly language used by the GF100 architecture, (b) does not publicly explain many of the internal mechanisms implemented in its compiler, nvcc, and (c) makes it virtually impossible to learn the details of the very complex GF100 architecture at a level of detail sufficient to exploit them, estimating the performance gap between CUDA programming and machine-level programming for NVIDIA GPUs using the GF100 architecture (let alone obtaining a priori guarantees of shortest execution time) was, before the present work, impossible. To optimize GPU code, users must use CUDA or PTX (Parallel Thread Execution), a virtual instruction set architecture. CUDA or PTX files are given as input to nvcc, which produces fatbin files as output, generated for the GPU architecture selected by the user through an nvcc flag. In a fatbin file, zero or more parts will be executed by the CPU (think of these as the C/C++ parts), while the remaining parts (the ELF parts) will be executed by the specific GPU model for which the CUDA or PTX files were compiled. Fatbin files are normally very different from the corresponding CUDA or PTX files, and this absence of control can completely derail any effort made at the CUDA or PTX level to optimize the ELF parts of the fatbin file that the target GPU will execute.
We therefore discover the true ISA used by the GF100 architecture and derive a set of guidelines for writing code so as to force nvcc to generate fatbin files with at least the minimum number of resources subsequently needed to modify them into the desired algorithmic implementations in ELF; this gives control over the ELF code executed by any GPU using the GF100 architecture. While uncovering the true ISA we also discover the correspondences between PTX instructions and ELF instructions (a single PTX instruction can be transformed into one or more ELF instructions) and between PTX registers and ELF registers. Our procedure is fully repeatable for any NVIDIA Kepler GPU, without rewriting our code. Being able to obtain the desired algorithmic implementations in ELF is not enough to optimize the ELF code of a fatbin file; we also need to discover, understand and quantify undisclosed GPU behaviors that could slow down the execution of ELF code.
This is necessary to understand how to carry out the optimization process, and while we cannot report here all the results we obtained, we explain (a) how to force a uniform distribution of GPU thread blocks across the streaming multiprocessors; (b) how we discovered and quantified several phenomena concerning warp scheduling; (c) how to avoid warp-scheduling load imbalance, which cannot be controlled, in the streaming multiprocessors; (d) how we determined, for each ELF instruction, the minimum waiting time before a warp scheduler can schedule a warp again (yes, this time can differ between ELF instructions); (e) how we determined the waiting time before a datum in a register previously read or written can be read again (this too can differ between ELF instructions, and differ depending on whether the datum was previously read or written); and (f) how we discovered a warp-management overhead time that does not grow linearly with a linear increase in the number of warps resident on a streaming multiprocessor.
Finally, we explain (a) the transformation procedures that must be applied to the ELF code of a fatbin file to optimize it and thus make its execution time as short as possible; (b) why the fatbin files generated from the original fatbin file during the optimization process need to be classified, and how we do this using several criteria whose end result is the position each generated fatbin file occupies in a taxonomy we created; (c) how the position of a fatbin file in the taxonomy determines whether it qualifies for an empirical analysis (which we explain), for a theoretical analysis, or for both; and (d) how, supposing a fatbin file qualifies for the theoretical analysis we devised, we give an a priori guarantee, without any previous execution of the fatbin file and provided it satisfies all the requirements of the theoretical analysis, that the execution of its ELF code, when run on the GPU architecture for which it was generated, will be the shortest possible.
Adeboye, Taiyelolu. "Robot Goalkeeper : A robotic goalkeeper based on machine vision and motor control." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-27561.
Full text