Dissertations / Theses on the topic 'Parallel programming techniques'
Consult the top 28 dissertations / theses for your research on the topic 'Parallel programming techniques.'
Pereira, Marcio Machado 1959. "Scheduling and serialization techniques for transactional memories." [s.n.], 2015. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275547.
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: In the last few years, Transactional Memories (TMs) have been shown to be a parallel programming model that can effectively combine performance improvement with ease of programming. Moreover, the recent introduction of (H)TM-based ISA extensions by major microprocessor manufacturers also seems to endorse TM as a programming model for today's parallel applications. One of the central issues in designing Software TM (STM) systems is to identify mechanisms or heuristics that can minimize contention arising from conflicting transactions. Although a number of mechanisms have been proposed to tackle contention, such techniques have a limited scope, because conflict is avoided by either interrupting or serializing transaction execution, thus considerably impacting performance. This work explores a complementary approach to boost the performance of STM through the use of schedulers. A TM scheduler is a software component that decides when a particular transaction should be executed. Its effectiveness is very sensitive to the accuracy of the metrics used to predict transaction behaviour, particularly in high-contention scenarios. This work proposes a new Dynamic Transaction Scheduler (DTS) to select the transaction to execute next, based on a new policy that rewards success and an improved metric that measures the amount of effective work performed by a transaction. Hardware TMs (HTMs) are an interesting mechanism to implement TM as they integrate the support for transactions at the lowest, most efficient, architectural level. On the other hand, for some applications, HTMs can have their performance hindered by the lack of scalability and by limitations in cache store capacity. This work presents an extensive performance study of the implementation of HTM in the Haswell generation of Intel x86 core processors. It evaluates the strengths and weaknesses of this new architecture by exploring several dimensions in the space of TM application characteristics. This detailed performance study provides insights into the constraints imposed by Intel's Transactional Synchronization Extensions (Intel TSX) and introduces a simple but efficient serialization policy that guarantees forward progress on top of Intel's best-effort HTM, which was critical to achieving performance.
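The abstract above does not spell out the serialization policy. As a rough illustration only, one common pattern on best-effort HTM is to retry a transaction optimistically a bounded number of times and then fall back to a single global lock, which serializes execution and so guarantees forward progress. The Python sketch below simulates that pattern under stated assumptions: `TransactionAborted`, `run_with_fallback`, and `max_retries` are hypothetical names, and the abort is simulated with an exception rather than produced by real hardware. It is not the policy evaluated in the thesis.

```python
import threading


class TransactionAborted(Exception):
    """Simulated abort (assumption: stands in for a hardware abort status)."""


# Hypothetical global lock used only on the serialized fallback path.
_fallback_lock = threading.Lock()


def run_with_fallback(attempt, serial, max_retries=3):
    """Try `attempt` optimistically; after `max_retries` aborts, run `serial` under the lock.

    Returns (result, path), where path is "optimistic" or "serialized".
    """
    for _ in range(max_retries):
        try:
            return attempt(), "optimistic"
        except TransactionAborted:
            continue  # conflict or capacity abort: retry optimistically
    with _fallback_lock:
        # Serialized execution cannot abort, so progress is guaranteed.
        return serial(), "serialized"


# Usage: a simulated transaction that aborts twice before succeeding.
aborts_left = [2]


def flaky_increment():
    if aborts_left[0] > 0:
        aborts_left[0] -= 1
        raise TransactionAborted()
    return 41 + 1


result, path = run_with_fallback(flaky_increment, lambda: 42)
```

The retry bound is the main tuning knob in such a scheme: a small bound serializes too eagerly, while a large one wastes work on transactions that are unlikely to commit.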
Doctorate
Computer Science
Doctor of Computer Science
Hind, Alan. "Parallel simulation techniques for telecommunication network modelling." Thesis, Durham University, 1994. http://etheses.dur.ac.uk/5520/.
Lu, Kang Hsin. "Modelling of saturated traffic flow using highly parallel systems." Thesis, University of Sheffield, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245726.
Papadopoulos, George Angelos. "Parallel implementation of concurrent logic languages using graph rewriting techniques." Thesis, University of East Anglia, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.329340.
Nautiyal, Sunil Datt. "Parallel computing techniques for investigating three dimensional collapse of a masonry arch." Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.320031.
Webb, Craig Jonathan. "Parallel computation techniques for virtual acoustics and physical modelling synthesis." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/15779.
Bayne, Ethan. "Accelerating digital forensic searching through GPGPU parallel processing techniques." Thesis, Abertay University, 2017. https://rke.abertay.ac.uk/en/studentTheses/702de12a-e10b-4daa-8baf-c2c57a501240.
Srivastava, Rohit Kumar. "Modeling Performance of Tensor Transpose using Regression Techniques." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524080824154753.
Titos, Gil José Rubén. "Hardware Techniques for High-Performance Transactional Memory in Many-Core Chip Multiprocessors." Doctoral thesis, Universidad de Murcia, 2011. http://hdl.handle.net/10803/51473.
This thesis focuses on the hardware mechanisms that provide optimistic concurrency control with guarantees of atomicity and isolation, with the intent of achieving high performance across a variety of workloads at a reasonable cost in terms of design complexity. It identifies key inefficiencies that impact the performance of several hardware implementations of TM and proposes mechanisms to overcome such limitations. The dissertation considers both eager and lazy approaches to HTM system design and addresses important sources of overhead that are inherent to each policy. It presents a hybrid-policy, adaptable HTM system that combines the advantages of both eager and lazy approaches in a low-complexity design. Furthermore, it investigates the overheads of the simpler, fixed-policy HTM designs that leverage a distributed directory-based coherence protocol to detect data races over a scalable interconnect, and develops solutions that address some performance-degrading factors.
Protze, Joachim [author]. "Modular Techniques and Interfaces for Data Race Detection in Multi-Paradigm Parallel Programming / Joachim Protze." Berlin : epubli, 2021. http://d-nb.info/1239488076/34.
Albassam, Bader. "Enforcing Security Policies On GPU Computing Through The Use Of Aspect-Oriented Programming Techniques." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6165.
Potgieter, Andrew. "A Parallel Multidimensional Weighted Histogram Analysis Method." Thesis, University of Cape Town, 2014. http://pubs.cs.uct.ac.za/archive/00000986/.
Tristram, Waide Barrington. "Investigating tools and techniques for improving software performance on multiprocessor computer systems." Thesis, Rhodes University, 2012. http://hdl.handle.net/10962/d1006651.
Gouin, Florian. "Méthodologie de placement d'algorithmes de traitement d'images sur architecture massivement parallèle." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEM075.
In industry, the race toward higher-definition image sensors increases the amount of data to be processed in the image processing domain. The algorithms concerned, when applied in embedded solutions, frequently also have to meet real-time constraints. The main issues are therefore to moderate power consumption and to attain high-performance computing and high memory bandwidth for data delivery. The massively parallel design of GPUs is especially well suited to this kind of task. However, this architecture is complex to handle, partly because of its multiple hierarchical levels of memory and computation and partly because the accelerator is used inside a global heterogeneous architecture. Mapping algorithms onto GPUs while exploiting the high-performance capabilities of this architecture is therefore not a trivial operation. In this thesis, we have developed a mapping methodology for sequential algorithms and designed it for GPUs. This methodology is made up of code analysis phases, mapping criteria verifications, code transformations and a final code generation phase. Some of the defined mapping criteria are designed to ensure that the mapping is legal with respect to GPU hardware specifics, whereas the others are used to improve runtimes. In addition, we have studied the performance of GPU memories and the capacity of GPUs to efficiently support coarse-grain parallelism. This complementary work is a foundation for further improvements in the exploitation of GPU resources within this mapping methodology. Lastly, the experimental results have revealed the functional reliability of the codes mapped onto GPUs and a runtime speedup for many C and C++ image processing applications used in industry.
Fernandez, Alonso Eduard. "Offloading Techniques to Improve Performance on MPI Applications in NoC-Based MPSoCs." Doctoral thesis, Universitat Autònoma de Barcelona, 2014. http://hdl.handle.net/10803/284889.
Future embedded Systems-on-Chip (SoCs) will probably be made up of tens or hundreds of heterogeneous Intellectual Property (IP) cores, which will execute one parallel application or even several applications running in parallel. These systems may become possible thanks to the constant evolution of technology that follows Moore's law, which will allow more transistors to be integrated on a single die, or the same number of transistors on a smaller die. In embedded MPSoC systems, NoCs can provide a flexible communication infrastructure in which several components such as microprocessor cores, MCUs, DSPs, GPUs, memories and other IP components can be interconnected. In this thesis, we first present a complete development process for MPSoCs on reconfigurable clusters, which complements the current SoC development process with additional steps to support parallel programming and software optimization. This work systematically explains the problems and solutions involved in building an FPGA-based MPSoC with our flow and offers tools and techniques for developing parallel applications for such systems. Additionally, we show several programming models for embedded MPSoCs, propose the adoption of MPI for such systems, and show some implementations created in this thesis over shared and distributed memory architectures. Finally, the focus is set on the overhead produced by the MPI library and on finding solutions that minimize this overhead and so accelerate the execution of the application, by offloading some parts of the software stack to the Network Interface Controller.
Menezes, Marlim Pereira. "Metodologia para paralelização e otimização de modelos matemáticos e computacionais, utilizando uma nova linguagem de programação." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/3/3143/tde-16122014-153819/.
This research project is expected to deliver an efficient methodology for assisting users in converting mathematical and computational models coded for sequential computers into parallel models that are optimized to run on modern personal computers, consisting of a CPU with multiple cores or hybrid (CPU + GPGPU) cores integrated into the same chip, with or without massively parallel graphics processors (GPGPU) installed. The methodology is intended to preserve the original quality of the results with respect to numerical accuracy while considerably reducing processing time. The emergence of these new hardware architectures in the mid-2000s increased the processing power of personal computers to the levels of mainframe computers from just a few years previously. This research work presents two methodologies: the first is composed of three parts and the second of two parts. Only the third part of the first methodology depends on hardware technologies.
Protze, Joachim [author], Matthias S. [academic supervisor] Müller, Jesper L. [academic supervisor] Träff, and Martin [academic supervisor] Schulz. "Modular techniques and interfaces for data race detection in multi-paradigm parallel programming / Joachim Protze ; Matthias S. Müller, Jesper L. Träff, Martin Schulz." Aachen : Universitätsbibliothek der RWTH Aachen, 2021. http://d-nb.info/1241683255/34.
Ewen, Stephan [author], Volker [academic supervisor] Markl, Peter [academic supervisor] Pepper, Odej [academic supervisor] Kao, and Michael [academic supervisor] Carey. "Programming abstractions, compilation, and execution techniques for massively parallel data analysis / Stephan Ewen. Reviewers: Peter Pepper ; Volker Markl ; Odej Kao ; Michael Carey. Supervisor: Volker Markl." Berlin : Technische Universität Berlin, 2015. http://d-nb.info/1070580619/34.
Full textTrevizan, Marcelo Porto. "Processamento paralelo na simulação de campos eletromagnéticos pelo método das diferenças finitas no domínio do tempo - FDTD." Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-14052007-173206/.
Research and projects involving electromagnetic problems are continuously increasing. In both, computer simulation is a resource for investigating the behavior of the electromagnetic phenomena in the situations at hand. In some cases, however, the problem is computationally large and demands more memory and long processing times, because of the given geometries or the high accuracy required. To address these issues, the development of parallel computation becomes attractive. One possible implementation of such a parallel system is a network of computers. Moreover, with free software, the implementation has almost no cost. The present work, which uses the FDTD method, aims at implementing this parallel system. During development, special attention was given to programming practices in order to guarantee the flexibility, modularity and expansibility of the program. In addition, a mathematical tool was developed to estimate the total processing time of the parallel simulation and to indicate parameter adjustments that reach the minimum possible time. The code, the parallel system and the mathematical tool are validated with some examples. Finally, a practical application of interest is studied with the developed tool.
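FDTD lends itself to the kind of parallel system described above because each field update reads only the other field from the previous half-step, so the grid can be split into blocks that are updated independently. The sketch below is a minimal 1D illustration in normalized units, not the code from the thesis: `fdtd_1d`, `chunks`, the Courant factor 0.5, and the Gaussian source parameters are all assumptions, and the blocks are processed one after another here, standing in for separate machines that would exchange one boundary value per step.

```python
import math


def fdtd_1d(n_cells=200, n_steps=150, chunks=1, source=100):
    """1D FDTD in free space, normalized units, Courant factor 0.5.

    The grid is updated in `chunks` contiguous blocks to mimic domain
    decomposition; the result does not depend on the number of blocks.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    size = math.ceil(n_cells / chunks)
    blocks = [(s, min(s + size, n_cells)) for s in range(0, n_cells, size)]
    for t in range(n_steps):
        # E update reads only H, so the blocks are independent of each other.
        # hy[lo - 1] is the single "halo" value owned by the neighbouring block.
        for lo, hi in blocks:
            for k in range(max(lo, 1), hi):
                ez[k] += 0.5 * (hy[k - 1] - hy[k])
        # Soft Gaussian source (hypothetical pulse centre and width).
        ez[source] += math.exp(-0.5 * ((t - 40) / 12.0) ** 2)
        # H update reads only E, so the blocks are again independent.
        for lo, hi in blocks:
            for k in range(lo, min(hi, n_cells - 1)):
                hy[k] += 0.5 * (ez[k] - ez[k + 1])
    return ez, hy


serial = fdtd_1d(chunks=1)
split = fdtd_1d(chunks=4)
```

Comparing `serial` with `split` is a cheap correctness check for a decomposition: the two runs perform the same floating-point operations, so the fields should match exactly.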
Pai, Satish. "Multiplexed pipelining : a cost effective loop transformation technique." PDXScholar, 1992. https://pdxscholar.library.pdx.edu/open_access_etds/4425.
Santos, Walter Meneghette dos. "Estudo e desenvolvimento de paralelismo de inversores para aplicação fotovoltaica conectados à rede elétrica." Universidade Tecnológica Federal do Paraná, 2013. http://repositorio.utfpr.edu.br/jspui/handle/1/883.
Photovoltaic systems have been spreading globally as a clean energy technology that can be used across most of the planet, which makes them very attractive for distributed generation. The key component for using photovoltaics in distributed generation is the grid-connected inverter. The efficiency of this equipment directly influences how much of the energy generated by the photovoltaic panels is used and, consequently, how long the system takes to pay for itself. Because power generation is seasonal and the inverter works most of the time between 10% and 90% of its capacity, especially in systems without tracking, the inverter should be evaluated not only by its efficiency at full load but by its full efficiency curve over the whole operating range. The method proposed to improve system efficiency at low power is to use low-power inverters connected in parallel to the grid and operating in a staggered fashion. At low power, the efficiency is then higher than if a single inverter were used. This work also evaluates the consequences of parallelism for the harmonic current distortion rate, and its benefits for extending equipment life and providing redundancy. We implemented four 300 W full-bridge inverters with a switching and sampling frequency of 21.6 kHz, each controlled by a Freescale 56F8014 DSC, and a device for monitoring the inverters based on a PIC18F4520 microcontroller. All devices have an isolated UART communication interface with the LIN protocol. The inverters were tested in continuous power-sharing mode, where all inverters operate with identical shares of the power, and in staggered mode, where inverters come into operation according to the power being processed. The results show a 3.7% improvement in the weighted efficiency of the system (IEC-61836) between the continuous and staggered power-sharing modes.
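The difference between the two sharing modes can be made concrete with a little arithmetic. The sketch below uses the four 300 W inverters from the abstract; everything else is an assumption for illustration, in particular the rule that staggered mode switches on the fewest inverters that cover the demand and splits the power equally among them. The function names are hypothetical and no efficiency curve is modelled.

```python
import math

RATED_W = 300.0   # per-inverter rating, from the abstract
N_INVERTERS = 4   # number of inverters, from the abstract


def continuous_sharing(p_total):
    """All inverters always run and carry identical shares of the power."""
    return [p_total / N_INVERTERS] * N_INVERTERS


def staggered_sharing(p_total):
    """Assumed rule: run the fewest inverters that cover the demand, shared equally."""
    active = min(N_INVERTERS, max(1, math.ceil(p_total / RATED_W)))
    return [p_total / active] * active + [0.0] * (N_INVERTERS - active)


def load_fractions(shares):
    """Fraction of rated power carried by each inverter."""
    return [s / RATED_W for s in shares]


low_continuous = load_fractions(continuous_sharing(240.0))
low_staggered = load_fractions(staggered_sharing(240.0))
```

At 240 W, continuous sharing leaves every inverter at 20% of its rating, while the staggered rule runs a single inverter at 80%. Inverters are typically less efficient at very light load, which is the effect the staggered mode is meant to avoid.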
Michael, Gavin Constantine. "Compilation techniques for multicomputers." Phd thesis, 1996. http://hdl.handle.net/1885/117301.
Huang, Zhigang. "A recursive algorithm for reliability assessment in water distribution networks with applications of parallel programming techniques." Thesis, 1994. https://figshare.com/articles/thesis/A_recursive_algorithm_for_reliability_assessment_in_water_distribution_networks_with_applications_of_parallel_programming_techniques/13425371.
Faraj, Ahmad A. "Automatic empirical techniques for developing efficient MPI collective communication routines." 2006. http://etd.lib.fsu.edu/theses/available/07072006-162046.
Advisor: Xin Yuan, Florida State University, College of Arts and Sciences, Dept. of Computer Science. Title and description from dissertation home page (viewed Sept. 19, 2006). Document formatted into pages; contains xiii, 162 pages. Includes bibliographical references.
Abell, Stephen W. "Parallel acceleration of deadlock detection and avoidance algorithms on GPUs." Thesis, 2013. http://hdl.handle.net/1805/3653.
Current mainstream computing systems have become increasingly complex. Most have Central Processing Units (CPUs) that invoke multiple threads for their computing tasks. The growing issue with these systems is resource contention, and with resource contention comes the risk of deadlock. Various software and hardware approaches exist that implement deadlock detection/avoidance techniques; however, they lack either the speed or the problem-size capability needed for real-time systems. The research conducted for this thesis aims to resolve issues present in past approaches by converging the two platforms (software and hardware) by means of the Graphics Processing Unit (GPU). Presented in this thesis are two GPU-based deadlock detection algorithms and one GPU-based deadlock avoidance algorithm: (i) GPU-OSDDA, a GPU-based Single Unit Resource Deadlock Detection Algorithm; (ii) GPU-LMDDA, a GPU-based Multi-Unit Resource Deadlock Detection Algorithm; and (iii) GPU-PBA, a GPU-based Deadlock Avoidance Algorithm. Both GPU-OSDDA and GPU-LMDDA use the Resource Allocation Graph (RAG) to represent resource allocation status in the system, but the RAG is represented with integer-length bit-vectors. This approach has several advantages: (i) less memory is required for the algorithm matrices, (ii) 32 computations are performed per instruction (in most cases), and (iii) the algorithms can handle large numbers of processes and resources. The deadlock detection algorithms also require minimal interaction with the CPU, because matrix storage and algorithm computations are implemented on the GPU, which gives them an interactive-service type of behavior. As a result, both algorithms achieved speedups of up to more than two orders of magnitude over their serial CPU implementations (3.17-317.42x for GPU-OSDDA and 37.17-812.50x for GPU-LMDDA). Lastly, GPU-PBA is the first parallel deadlock avoidance algorithm implemented on the GPU. While it does not achieve two orders of magnitude speedup over its CPU implementation, it does provide a platform for future deadlock avoidance research for the GPU.
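To show why a bit-vector representation helps, the sketch below runs a textbook graph-reduction deadlock check on a wait-for relation stored as one integer bitmask per process, so that a single AND tests a process against all others at once. This is a CPU-side illustration under stated assumptions, not GPU-OSDDA or GPU-LMDDA: `find_deadlocked` and `waits_for` are hypothetical names, resources are single-unit, and Python integers are unbounded, whereas the thesis describes fixed integer-length words.

```python
def find_deadlocked(waits_for):
    """Return the set of process ids that can never proceed.

    waits_for[i] is an int bitmask: bit j is set when process i waits for a
    single-unit resource currently held by process j.
    """
    n = len(waits_for)
    alive = (1 << n) - 1  # bit i set: process i not yet shown able to finish
    changed = True
    while changed:
        changed = False
        for i in range(n):
            is_alive = (alive >> i) & 1
            # One AND checks process i against every remaining process.
            if is_alive and (waits_for[i] & alive) == 0:
                alive &= ~(1 << i)  # i can finish and release what it holds
                changed = True
    return {i for i in range(n) if (alive >> i) & 1}


# Usage: P0 and P1 wait for each other; P3 waits for P2, which waits for nobody.
blocked = find_deadlocked([0b0010, 0b0001, 0b0000, 0b0100])
```

In this example the reduction removes P2 and then P3, leaving P0 and P1 as the deadlocked pair. Processes that merely wait on a deadlocked process are also reported, since they cannot proceed either.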
Nagarakatte, Santosh G. "Spill Code Minimization And Buffer And Code Size Aware Instruction Scheduling Techniques." Thesis, 2007. http://hdl.handle.net/2005/507.
Iyer, Neeraj. "Machine Vision Assisted In Situ Ichthyoplankton Imaging System." 2013. http://hdl.handle.net/1805/3368.
Recently there has been considerable effort in developing systems for sampling and automatically classifying plankton from the oceans. Existing methods assume the specimens have already been precisely segmented, or aim at analyzing images containing a single specimen (extraction of its features and/or recognition of specimens as single in-focus targets in small images). The resolution of existing systems is a limiting factor. Our goal is to develop automated, very high resolution image sensing of critically important, yet under-sampled, components of the planktonic community by addressing both the physical sensing system (e.g. camera, lighting, depth of field) and the crucial image extraction and recognition routines. The objective of this thesis is to develop a framework that aims to (i) detect and segment all organisms of interest automatically, directly from the raw data, while filtering out noise and out-of-focus instances, (ii) extract the best features from the images, and (iii) identify and classify the plankton species. Our approach focuses on using the full computational power of a multicore system by implementing a parallel programming approach that can process large volumes of high resolution plankton images obtained from our newly designed imaging system, the In Situ Ichthyoplankton Imaging System (ISIIS). We compare some of the widely used segmentation methods, with emphasis on accuracy and speed, to find the one that works best on our data. We design a robust, scalable, fully automated system for high-throughput processing of the ISIIS imagery.
Kriske, Jeffery Edward Jr. "A scalable approach to processing adaptive optics optical coherence tomography data from multiple sensors using multiple graphics processing units." Thesis, 2014. http://hdl.handle.net/1805/6458.
Adaptive optics-optical coherence tomography (AO-OCT) is a non-invasive method of imaging the human retina in vivo. It can be used to visualize microscopic structures, making it incredibly useful for the early detection and diagnosis of retinal disease. The research group at Indiana University has a novel multi-camera AO-OCT system capable of 1 MHz acquisition rates. Until now, no method has existed that processes data from such a system quickly and accurately enough on a CPU or a GPU, or that scales to multiple GPUs automatically and efficiently. This is a barrier to using a MHz AO-OCT system in a clinical environment. A novel approach to processing AO-OCT data from the unique multi-camera optics system is tested on multiple graphics processing units (GPUs) in parallel with one, two, and four camera combinations. The design and results demonstrate a scalable, reusable, extensible method of computing AO-OCT output. This approach can either achieve real-time results with an AO-OCT system capable of 1 MHz acquisition rates or be scaled to a higher accuracy mode with a fast Fourier transform of 16,384 complex values.